One of my favorite topics (and I hope one I have not beaten to death quite yet) is the difference between “objects” and “referents”, and the problem that difference poses for search engines. Current search technology is not set up to recognize it, and the idea of “relevance” does not apply to it. In short, when I put the name of an individual into a search box, I might get a Wiki page, which singles out a particular individual in the world and tells me “who” that individual “is”, or I might get pages about any individual with that same name in all sorts of different contexts. These results may be relevant, but the point I made previously was that they often fail to be “useful”. That raises the question of what would be useful. This is where I left off, suggesting that solving the “object”/“referent” distinction may not really help, because of the many “properties” attributed to individuals. These “properties”, which in most cases amount to the “role” an individual plays in all kinds of situations, actually begin to define what the search engine needs in order to distinguish one individual from another. So even if we were to strip out all data (such as Wiki) that supplies definition-style information, and were left with all-and-only the pages relevant to the string “g-e-o-r-g-e b-u-s-h”, we would still not find that every equally relevant page was equally useful!

OK, now, readers who are emotionally attached to physical being will have to risk joining me in the parallel universe of the search engine for a moment. What I am saying is that these “properties” need to be recognized in order to satisfy utility and relevance at the same time. Why? Let’s take a closer look at the “George Bush” results. As an individual, Mr. Bush has many properties. He is a former president of the United States, and he is a member of the Republican Party. He also, evidently, is “funny”. This last property, “IS_FUNNY”, turns out to be quite a salient attribute to the search engine. Of course, the search engine doesn’t know that the properties “IS_FUNNY”, “IS_MEMBER_OF_THE_REPUBLICAN_PARTY” and “IS_FORMER_PRESIDENT_OF_U.S.” apply to the same individual in the world (technically, it does not know what a property is, but I’ll save that subject for another post). That may never really be knowable, or truly important for usability. What is really important is that it also does not know that these properties are different, which partly explains why it can’t determine the different roles an individual plays. Knowing, for example, that the latter two properties are relevant to a POLITICIAN would be useful, wouldn’t it? Wouldn’t it be even better to see results that separate George Bush “the Politician” from George Bush “the Comedian”?
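To make the idea concrete, here is a minimal Python sketch of what separating results by role might look like. It assumes the hard part is already done: every page has been tagged with properties, and someone has decided which properties indicate which role. The property names come from the discussion above; the role mapping, the example URLs and the function name group_by_role are hypothetical, for illustration only.

```python
from collections import defaultdict

# Hypothetical mapping from properties to the roles they suggest.
# The property names come from the post; the role assignments are assumptions.
PROPERTY_TO_ROLE = {
    "IS_FORMER_PRESIDENT_OF_U.S.": "POLITICIAN",
    "IS_MEMBER_OF_THE_REPUBLICAN_PARTY": "POLITICIAN",
    "IS_FUNNY": "COMEDIAN",
}

def group_by_role(results):
    """Group (url, properties) results under every role their properties suggest.

    A page can land under more than one role; pages whose properties map to
    no known role are collected under "UNKNOWN".
    """
    groups = defaultdict(list)
    for url, properties in results:
        roles = {PROPERTY_TO_ROLE[p] for p in properties if p in PROPERTY_TO_ROLE}
        for role in sorted(roles) or ["UNKNOWN"]:
            groups[role].append(url)
    return dict(groups)

# Hypothetical, already-tagged results for the query "george bush".
results = [
    ("https://example.com/inauguration", {"IS_FORMER_PRESIDENT_OF_U.S."}),
    ("https://example.com/convention", {"IS_MEMBER_OF_THE_REPUBLICAN_PARTY"}),
    ("https://example.com/bloopers", {"IS_FUNNY"}),
    ("https://example.com/roast", {"IS_FUNNY", "IS_FORMER_PRESIDENT_OF_U.S."}),
    ("https://example.com/namesake", set()),
]

for role, urls in group_by_role(results).items():
    print(role, urls)
```

Note that the “roast” page shows up under both roles, which is exactly the kind of overlap that makes roles harder to separate than names.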

Well, the good news is that this is actually possible with some of the newer techniques used in today’s information retrieval products, and it is being put into practice. There are, however, some daunting challenges inherent in such efforts, and they don’t necessarily lie in the details of implementing latent semantic indexing. In fact, you can find them easily by playing 20 Questions. Go ahead: try playing 20 Questions when the thing you are thinking of is a person instead of an object or a place. My guess is that those who win this game are very good at narrowing down and identifying the salient properties of people. I encourage all readers not only to try this but, once you have, to read about the Frame Problem in the Stanford Encyclopedia of Philosophy: http://plato.stanford.edu/entries/frame-problem/

The reason 20 Questions is a good test is that it was originally conceived to find the information necessary to identify an arbitrary object. The game suggests that this information approximates a limit of 20 bits: assuming each question lets the questioner eliminate half the objects in his information universe, 20 questions are enough to distinguish among 2^20, or 1,048,576, subjects. The best strategy for 20 Questions, of course, is to ask the kinds of questions that do in fact split the information field in half. That is not so simple, and it is even harder when trying to split the data relevant to people this way.
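The arithmetic and the halving strategy can be sketched in a few lines of Python. This is a toy model under stated assumptions: each candidate is described by a hypothetical set of yes/no properties, and the questioner greedily asks about whichever property splits the remaining candidates most evenly. The helper names (bits_needed, imbalance, play) and both toy universes are made up for illustration.

```python
def bits_needed(n):
    """Fewest yes/no questions that can always single out one of n candidates: ceil(log2 n)."""
    return (n - 1).bit_length() if n > 1 else 0

def imbalance(prop, remaining):
    """How far a question about `prop` is from splitting the remaining candidates in half."""
    yes = sum(1 for props in remaining.values() if prop in props)
    return abs(2 * yes - len(remaining))

def play(candidates, target):
    """Greedy 20 Questions: always ask about the most evenly splitting property.

    `candidates` maps a name to its set of properties. Returns
    (number of questions asked, sorted list of surviving candidates).
    """
    remaining = dict(candidates)
    all_props = sorted(set().union(*candidates.values()))
    asked = 0
    while len(remaining) > 1:
        # A property tells us something only if some, but not all, survivors have it.
        useful = [p for p in all_props if imbalance(p, remaining) < len(remaining)]
        if not useful:
            break  # survivors cannot be told apart with the properties we know
        question = min(useful, key=lambda p: imbalance(p, remaining))
        answer = question in candidates[target]
        remaining = {name: props for name, props in remaining.items()
                     if (question in props) == answer}
        asked += 1
    return asked, sorted(remaining)

# Hypothetical balanced universe: 8 people, 3 properties that each split it in half.
traits = ["IS_ATHLETE", "IS_FUNNY", "IS_POLITICIAN"]
balanced = {f"person_{i}": {t for j, t in enumerate(traits) if i >> j & 1}
            for i in range(8)}

# Hypothetical lopsided universe: each property pins down exactly one person.
lopsided = {"A": {"P1"}, "B": {"P2"}, "C": {"P3"}, "D": {"P4"}}

print(bits_needed(2 ** 20), bits_needed(len(balanced)))  # 20 3
print(play(balanced, "person_5"))                        # (3, ['person_5'])
print(bits_needed(len(lopsided)), play(lopsided, "D"))   # 2 (3, ['D'])
```

In the balanced universe every question halves the field, so three questions always suffice for eight people, exactly log2(8). In the lopsided universe, where each property picks out only one person, the same strategy needs three questions to find D, one more than log2(4) = 2.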

What I would like to explore in my next post (and have you help me explore) is why some searches are so much more difficult than others. It may not be just about the sheer number of properties associated with any given entity; it may be about finding and articulating those properties. Some properties just seem to be easier to “pin down” than others.
More next time...
