Sunday, January 7, 2007

Interactive Search: A Case for Semantics

Are you happy with the quality of web search engines?


Try searching for "mouse" in your favorite search engine. You are most likely to get a good mix of hits that are about rodents, the computer mouse, a rock band and a few other senses of the word mouse. Instead of just throwing all those hits at you, wouldn't it be nice if the search engine came back to you and asked "Which kind of mouse do you mean?" Isn't this what a human assistant would do?


What does it take for a search engine to engage users in such interactive dialogues? First of all, it must know that there are different meanings for "mouse." It must not treat the "keyword" entered by the user merely as a juxtaposition of the characters "m" "o" "u" "s" "e." Rather, it must have a dictionary entry for the word "mouse" that maps it to multiple concepts in its ontology: one for the rodent mouse, another for the computer peripheral known as the mouse, and so on.


In other words, the search engine needs an ontology with concepts (or nodes) for the above kinds of mice. In addition, it needs a lexicon that maps words or phrases (like "mouse") to one or more concepts in the ontology. One of the challenges is deciding who is going to build this ontology. Can we get all human beings to agree on one ontology, one view of the world, and one way of classifying all concepts?
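To make the ontology and lexicon idea concrete, here is a minimal sketch in Python. The concept identifiers, their descriptions, and the function names are all invented for illustration; a real ontology would be far larger and would not be a pair of dictionaries.

```python
# Hypothetical ontology: concept identifier -> short description.
ONTOLOGY = {
    "mouse_rodent": "a small rodent",
    "mouse_device": "a computer pointing device",
    "mouse_band": "a rock band",
}

# Hypothetical lexicon: word or phrase -> one or more concept identifiers.
LEXICON = {
    "mouse": ["mouse_rodent", "mouse_device", "mouse_band"],
}


def senses(word):
    """Return the concepts a word may refer to (empty list if unknown)."""
    return LEXICON.get(word.lower(), [])


def clarifying_question(word):
    """Build a question for the user when a word has several senses."""
    concepts = senses(word)
    if len(concepts) < 2:
        return None  # nothing to disambiguate
    options = ", ".join(ONTOLOGY[c] for c in concepts)
    return f'Which kind of "{word}" do you mean: {options}?'
```

With these toy tables, clarifying_question("mouse") produces a question listing the three senses, while an unknown word yields no question at all.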


Even if we have an ontology and a lexicon, figuring out the intended sense of the word (a process called disambiguation) is not an easy task. In the case of a search engine where there is hardly any textual or conversational context available, we may have to exploit knowledge of the user's profile to guess the intended meaning.


Or else, ask the user.


In any case, as long as search engines continue to ignore meanings, they will keep throwing too many hits at users, many of which are irrelevant to what those users were looking for.


The problem of disambiguation is in fact more complex. For example, if a user searches for "dog" instead of "mouse," the search engine must not throw a dialogue box at the user asking "Which type of dog do you mean? Alsatian, poodle, dalmatian, ..." This would be seen as a silly thing to ask. In fact, the expectation is that pages about any of these or other types of dogs are relevant to the query. The reason is that these are all (sub)types of the same concept of a dog (other than "hot dog", etc.), whereas there is no single concept of mouse that subsumes rodents and computer mice. At least, most people do not view the world that way, with the shape of an entity making it a mouse, which is then classified into biological mice and computer mice.
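The dog and mouse contrast can be sketched as a subsumption check: a sense that is merely a subtype of another sense of the same word is folded into it, and the user is asked only if two or more unrelated senses remain. This is one possible rule under toy assumptions, not an established algorithm; the is-a links, the lexicon entries (which deliberately list breeds under "dog"), and the function names are all hypothetical.

```python
# Hypothetical is-a links: child concept -> parent concept.
PARENT = {
    "alsatian": "dog",
    "poodle": "dog",
    "dalmatian": "dog",
    "dog": "mammal",
    "mouse_rodent": "mammal",
    "mouse_device": "peripheral",
}

# Hypothetical lexicon: word -> candidate concepts.
LEXICON = {
    "dog": ["dog", "alsatian", "poodle", "dalmatian"],
    "mouse": ["mouse_rodent", "mouse_device"],
}


def ancestors(concept):
    """Return the set of all concepts that subsume `concept`."""
    found = set()
    while concept in PARENT:
        concept = PARENT[concept]
        found.add(concept)
    return found


def top_level_senses(word):
    """Drop every sense that is a subtype of another sense of the same word."""
    candidates = LEXICON.get(word, [])
    return [c for c in candidates if not ancestors(c) & set(candidates)]


def should_ask(word):
    """Ask the user only when two or more unrelated senses remain."""
    return len(top_level_senses(word)) > 1
```

Here should_ask("dog") is false because every breed collapses into the single concept "dog", while should_ask("mouse") is true because neither sense subsumes the other. Note that sharing a distant ancestor is not enough to merge senses in this sketch: what matters is whether one candidate sense subsumes the others.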


In summary, we need search engines that are capable of engaging the user in an effective dialogue to improve the precision and recall of a search. They should also be able to figure out when to throw a question at the user and when to simply return hits.
