
Since I am speaking at the Sentiment Analysis Symposium next week, I have had sentiment analysis on my mind, as you might imagine. What I find interesting is that, like so many other areas of natural language processing technology, it tends to have its own little niche of practitioners who are completely shut off from the other communities under the NLP umbrella. Very few who associate themselves with it have given much thought to the interaction of SA with information retrieval, machine translation or even document classification, of which it is a proper part. The latter is especially surprising to me, since considering the semantic nature of the three (aha, or should there be two?) “traditional” sentiment classes (positive, negative and neutral) raises some important issues in the general semantics of “opposition”.

Let’s start with the paradox of having more than two sentiment classes. The effect of that idea is to move sentiment analysis out of the semantic “bucket” of polarity altogether. Is that something that makes sense for usability and information quality? Does it open the door to making sentiment gradable in general? What would that mean? One thing it would mean is that any hope of alignment with human judgments, already shaky, would be gone. It would also hurt usability, because each of the (theoretically infinite) number of sentiment classes would carry little semantic substance. As an added distraction, each vendor of the technology could have a different proprietary scale, making product comparisons impossible.

OK, well, maybe I’m getting a bit extreme... and we haven’t seen sentiment scales above three coming out in products. But on the other hand, is dividing the world of thought up by applying binary sentiment over popular opinion a reasonable classification goal as an alternative? Or is that too limiting? I actually believe that opening sentiment analysis up to explore the greater world of semantic “opposites” is the way to push the technology into a future of greater usability and profit. I suppose we’ll see what people think when I float that idea at my talk...

My last post left off by asking readers to play 20 Questions using people as the intended objects and then, reflecting on how that unfolded, to read about the Frame Problem, a much-discussed and debated issue in both computer science and contemporary philosophy.

Before I get into what I believe to be the applications of the Frame Problem to today’s search technology paradigm, I will go back to the thread of “properties” to which I promised you I would return.

Remember the “properties” of George Bush that we discussed – properties such as “IS_FUNNY”, and “IS_FORMER_PRESIDENT_OF_U.S.” – were things that the search engine did not understand and could not use to help the user find more “useful” results despite finding results that were, technically, “relevant” to “George Bush”.
To show the importance of properties in general information retrieval (and now I am going far beyond just search technology), try playing 20 Questions again as if you were a typical search engine. Someone would start the game with a person in mind. You would be tempted to say something like “Is this person in the news?” or “Is this person female?”. But things like “HAS_GENDER” and “IS_FAMOUS” are properties, aren’t they? So you can’t do that. If you were a search engine, all you could do is blindly throw out contexts where you had encountered a “person” in the past: definitions, lists of synonyms, etc. You could only distinguish on the basis of frequency (or, more precisely, features) of occurrence. Now, you are never going to get anywhere in 20 Questions this way, are you? And this is why search engines that can’t distinguish properties don’t get you useful results, even though what they produce may be relevant or “popular”.
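
To make the contrast concrete, here is a minimal sketch in Python of what a player who does have access to properties can do. Everything in it is an assumption made up for illustration: the four placeholder people, the property values assigned to them, and the helper names `best_question` and `play`. It is not how any real search engine works; it only shows that once properties are available, each yes/no question can cut the candidate set roughly in half.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical toy knowledge base: placeholder people mapped to boolean properties.
PEOPLE: Dict[str, Dict[str, bool]] = {
    "person_a": {"IS_FEMALE": False, "IS_FORMER_PRESIDENT_OF_U.S.": True, "IS_FUNNY": True},
    "person_b": {"IS_FEMALE": True, "IS_FORMER_PRESIDENT_OF_U.S.": False, "IS_FUNNY": False},
    "person_c": {"IS_FEMALE": True, "IS_FORMER_PRESIDENT_OF_U.S.": False, "IS_FUNNY": True},
    "person_d": {"IS_FEMALE": False, "IS_FORMER_PRESIDENT_OF_U.S.": False, "IS_FUNNY": False},
}


def best_question(candidates: List[str], properties: List[str]) -> Optional[str]:
    """Pick the property whose yes/no split over the candidates is closest to even."""
    best, best_gap = None, None
    for prop in properties:
        yes = sum(1 for c in candidates if PEOPLE[c][prop])
        no = len(candidates) - yes
        if yes == 0 or no == 0:
            continue  # asking this would not distinguish any candidates
        gap = abs(yes - no)
        if best_gap is None or gap < best_gap:
            best, best_gap = prop, gap
    return best


def play(secret: str, max_questions: int = 20) -> Tuple[List[str], List[str]]:
    """Narrow the candidates by asking property questions about the secret person."""
    candidates = list(PEOPLE)
    properties = sorted({p for props in PEOPLE.values() for p in props})
    asked: List[str] = []
    while len(candidates) > 1 and len(asked) < max_questions:
        prop = best_question(candidates, properties)
        if prop is None:
            break  # no remaining property separates the candidates
        answer = PEOPLE[secret][prop]
        candidates = [c for c in candidates if PEOPLE[c][prop] == answer]
        asked.append(prop)
    return candidates, asked


if __name__ == "__main__":
    remaining, questions = play("person_a")
    print(remaining, questions)
```

Without the property table, the same player would have nothing to ask about and could only list the candidates in order of how often it had seen them, which is the search engine's predicament described above.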

All of this is to tie in with the notion of the Frame Problem. This problem, as I mentioned before, is a long-discussed and disputed problem related to artificial intelligence and philosophy. But really it is very relevant not just to search technology, but to the very activity of search in general, and to the idea of task completion. So, your “task” in 20 Questions is to guess the identity of a person, place or thing within a certain number of tries, and to complete this task as efficiently as possible (and “win” the “game”) you must have a strategy. The importance of a “strategy” in completing any task, from supplying search engine users with good results to winning 20 Questions, cannot be overlooked. In fact, if you read Daniel C. Dennett’s seminal work on the Frame Problem (see Dennett, D.C. 1984. Cognitive wheels: The frame problem in artificial intelligence. In C. Hookway (ed.), Minds, Machines and Evolution, 129-151), you will quickly learn how much knowledge is required just to make a turkey sandwich! The Frame Problem is really about “framing” the knowledge required for task completion so that it does not involve either too much or too little data. For example, there are all kinds of data points that a human being processes when making a turkey sandwich, but only a subset of them are relevant to the completion of the task. You maintain the knowledge that refrigerators keep things cold, but you don’t really need to draw on that knowledge to make your sandwich, do you? So effective task completion involves not just knowing how to do something but using the right knowledge at the right time.
I will leave off with a Google search for Toyota, which has at least three possible referents: an organization, a product manufactured by that organization, and a place. Google is able to separate genre pretty well; that is, it has news separated from Wiki pages, separated from Twitter feeds. So while genre recognition is indeed getting closer to notions of “utility” and salient contextual knowledge in our search technology, it falls short of truly recognizing properties of entities.
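
The difference between separating by genre and separating by referent can be sketched in a few lines of Python. The result titles, the `genre` and `referent` labels, and the `group_by` helper below are all hypothetical, invented for illustration; the sketch simply assumes that something upstream has already tagged each result with the kind of entity it is about.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical search results for "Toyota"; titles and labels are made up.
RESULTS: List[Dict[str, str]] = [
    {"title": "Toyota reports quarterly earnings", "genre": "news", "referent": "ORGANIZATION"},
    {"title": "Toyota (company) overview", "genre": "wiki", "referent": "ORGANIZATION"},
    {"title": "Review of a Toyota sedan", "genre": "news", "referent": "PRODUCT"},
    {"title": "Toyota, a city in Japan", "genre": "wiki", "referent": "PLACE"},
]


def group_by(results: List[Dict[str, str]], key: str) -> Dict[str, List[str]]:
    """Group result titles by the value stored under `key`."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for result in results:
        groups[result[key]].append(result["title"])
    return dict(groups)


if __name__ == "__main__":
    print(group_by(RESULTS, "genre"))     # mixes company, product and place in each bucket
    print(group_by(RESULTS, "referent"))  # one bucket per thing "Toyota" can refer to
```

Grouping by genre leaves the company, the car and the city mixed together inside each bucket, while grouping by referent pulls them apart. The hard part, of course, is producing the `referent` label in the first place.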

More next time... Until then, check out Dennett 1984, and this time think about how to program a robot to be good at 20 Questions!