
It’s been a while since my last post, but I am looking forward to the upcoming Sentiment Analysis Symposium in May. One new thing to think about with sentiment is how it is expressed in different genres. Until recently, linguists did not pay much attention to how genre might influence the lexical and grammatical features used for classification and other tasks. That is certainly starting to change, and I think its impact on sentiment analysis in particular deserves deeper investigation going forward.

For one thing, the microblog document (OK, Twitter) has many more ways to express sentiment than other document types. It is not just emoticons that are of interest but all kinds of textual manifestations of emotion, including representations of sound (ugh! Eeew!) that are fairly rare elsewhere, even in email.

I am also excited by how the concept of “sentiment” has become intertwined with “reputation”. Why is that exciting? Because the traditional polarity expectations change when something as subjective as a reputation becomes the topic. When sentiment analysis was applied mostly to product reviews or news snippets, what happened? “Bad” news events like earthquakes, and people complaining about products, were tagged negative, while product raves and good news were tagged positive. Sure, once in a while a sarcastic review will stump the classifier, as will phrases like “plummeting” inflation, where a normally negative word describes good news. But reputation is a different animal. Many people do not want certain things exposed simply because of their position relative to other things. For example, a Republican may not want certain quotes exposed, even if they are genuine and popular with the public, simply because they hurt his reputation *as a Republican* in the media. Reputations may indeed have polarity; it just is not as invariant as the polarity inherent in events in other contexts.
I admit I don’t quite have my head around this yet, but I am thinking it over ahead of the symposium and wonder whether others are thinking about it too. I did not originally imagine that online communications would vary so much in the way content is structured, but I have been surprised….


Well, I came away from the Sentiment Analysis Symposium very excited about all the applications of sentiment that were presented. One of the best talks discussed stock price fluctuations as a function of document-level sentiment (from media sources) viewed over time. This is of course a very powerful application, depending on its reliability. Even before hearing this talk, I had no doubt that consumers of sentiment analysis would have a large appetite for this kind of application. One thing that struck me in particular, though, was how the “semantic scope” of sentiment might be expanded: what could anyone add to the financial analysis of unstructured data in this area that would be interesting? Or is it all and only what “people” (I include pundits in this designation) “think” of a company?

While it might seem superficially that only “positive” and “negative” apply to sentiment analysis in finance, I believe this is not the case. If we expand to other types of oppositions, we see some interesting document analyses that could be very useful. Take a simple event opposition like “buy” vs. “sell”, or its dispositional relative “long” vs. “short”. What if we applied that to topics at the document level or to entities at the sentence level? Wouldn’t it be nice to see the groups that are long on gold and short on T-bills, or to turn that into a grouping of “contrarian” vs. “conventional” positions? OK, so maybe things like this are already covered in structured data and available on Bloomberg, but what about more subtle oppositions like “expanding”/“growing” vs. “retracting”/“shrinking”? These oppositions are much more likely to be discussed in quotes from corporate leaders and reported in the media, and they are often echoed in the “forward-looking statements” of annual reports. This sort of thing is harder for the quants to pick up without language analysis and would, I would say, be a nice addition to the broader SA offering. Now, I am sure some colleagues will laugh at my giddy insistence on putting the cart before the horse (after all, accuracy testing of traditional sentiment analysis is still controversial), but, hey, it’s my blog and I’ll dream if I want to.
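To make the sentence-level idea a little more concrete, here is a minimal Python sketch, assuming a tiny hand-built opposition lexicon. The `OPPOSITIONS` table, `split_sentences`, and `tag_entity` are hypothetical names made up for illustration, not an existing tool or anyone’s production method. For a given entity, it counts how many sentences mentioning that entity carry cue words for each pole of an opposition such as long/short or expanding/shrinking.

```python
import re
from collections import Counter

# Hypothetical opposition lexicon (illustration only): each axis has two poles,
# each with a small hand-picked set of cue words. A real system would need a far
# larger, validated lexicon plus handling for negation, tense, and sarcasm.
OPPOSITIONS = {
    "position": {
        "long": {"buy", "buying", "bought", "long"},
        "short": {"sell", "selling", "sold", "short"},
    },
    "trajectory": {
        "expanding": {"expand", "expanding", "grow", "growing"},
        "shrinking": {"retract", "retracting", "shrink", "shrinking"},
    },
}


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tag_entity(text, entity, axis):
    """Count, per pole of `axis`, the sentences that mention `entity`
    (as a single lowercase token) and contain at least one cue word for that pole.
    Note: cues like "long"/"short" will also fire on "long-term"/"short-term"."""
    poles = OPPOSITIONS[axis]
    counts = Counter({pole: 0 for pole in poles})
    for sentence in split_sentences(text):
        tokens = set(re.findall(r"[a-z]+", sentence.lower()))
        if entity.lower() not in tokens:
            continue  # sentence does not mention the entity
        for pole, cues in poles.items():
            if tokens & cues:
                counts[pole] += 1
    return dict(counts)
```

On a toy text such as "Analysts are buying gold. Some funds sold gold last week. Acme is expanding its plant. Acme expects margins to keep growing.", `tag_entity(text, "gold", "position")` returns `{'long': 1, 'short': 1}` and `tag_entity(text, "Acme", "trajectory")` returns `{'expanding': 2, 'shrinking': 0}`. Aggregating such counts per source or per group is one possible way to start sorting “contrarian” from “conventional” positions, though the lexicon is where all the hard linguistic work would lie.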