Dr. Stephen Jeffares
University of Birmingham
The other day I spotted a blog post by Joe Senior of the customer insight company Clarabridge. In the post, entitled “Work the Stock Market with Sentiment and Text Analysis – We’ll Show You How at GAIM USA 2013”, Joe reported that he was about to talk to hedge fund managers about the potential of “using technology to analyze Twitter data to figure out what is going on in the market – and offering trade-related insight into whether you should buy or sell”. He continued: “we’ll be analyzing Tweets about some major organizations to try to ferret out sentiment and correlate it with stock prices and issues, all in real time. This is really game-changing stuff for traders and hedge funders”.
I bring this up for two reasons. The first is that it captures the excitement around analysing social media sentiment, and the insight and even predictive potential it brings. The second is that I think it sums up where much of the innovation in social media and sentiment analysis is at the moment: it is driven by the potential for making money. The innovation, it seems, is following the money.
This was highlighted in a recent National Centre for Research Methods research funding call. The call suggested that the innovation and growth of social media analysis was “driven by the demands of the commercial sector” and that academic capacity was “some way behind” (NCRM 2012: p.6).
As a social scientist working in a university, with a background in policy analysis, I am excited by the fast-moving developments in the worlds of text analytics, social data analysis and sentiment analysis, but I don’t share the profit motivation. If all we use this stuff for is to be one step ahead of the market, then we are missing a trick. But with so much at stake, much of the innovation will take place behind commercial smokescreens and intellectual firewalls.
In search of what else is going on, I did a rapid review of peer-reviewed journal articles that were gaining insight from Twitter. I was relieved to find that, alongside the burgeoning commercial literature, three other kinds of literature are emerging. The democratic literature focuses on social movements and the potential of Twitter in social change. The political literature seeks to predict electoral patterns and turnout, spurred on by the recent social media frenzy of Obama versus Romney. And then there are the practice literatures, around how this is changing public service engagement with publics, journalism, policing and research itself, particularly for mapping and the possibilities of big data. So not all social scientists are dedicated to the study of Twitter for the purposes of profit and commercial gain.
Although the sentiment, meaning and subjectivity of tweets are acknowledged, the data is increasingly massified. There is a drive to speed up the analysis and show the weight of opinion as it shifts in almost real time. David Cameron, up 3%; the big society, down 2%. But as political scientist Stu Shulman said on a screencast recently, “It is very hard to know what one ambiguous tweet means, much less what they all add up to”. He takes issue with what he calls hollow sentiment meters and the rise of the misleading infographic. His antidote is a tool called DiscoverText. It allows the user to import tweets by word mentioned or by hashtag. You can then visualise the set as a word cloud, remove duplicates and cluster near-duplicate tweets. Unlike a lot of tools designed to map social networks, the emphasis is on interpreting meaning. You can code the tweets either alone or by allocating batches to peers and colleagues to code in the cloud. You can use this coding to train a classifier to automatically code up batches of 1,000, 10,000 or, I suppose, a million tweets. The point is that the classifier is contextualised and able to detect the nuance and idiosyncratic nature of a tweet.
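To make the duplicate and near-duplicate step a little more concrete, here is a minimal Python sketch. It is not how DiscoverText works internally, which is not described here. It simply assumes that two tweets are "near duplicates" when their word sets overlap heavily (Jaccard similarity), and the function names, the 0.8 threshold and the sample tweets are all invented for illustration.

```python
import re


def normalise(tweet):
    """Lower-case, drop URLs and a leading retweet marker, collapse whitespace."""
    text = tweet.lower()
    text = re.sub(r"https?://\S+", "", text)
    text = re.sub(r"^rt\s+@\w+:?\s*", "", text)
    return " ".join(text.split())


def tokens_of(text):
    """Split into word tokens, keeping a leading # or @ attached."""
    return set(re.findall(r"[#@]?\w+", text))


def jaccard(a, b):
    """Share of tokens two sets have in common (1.0 means identical sets)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster_near_duplicates(tweets, threshold=0.8):
    """Greedy one-pass grouping (an assumed, simple approach).

    Exact duplicates (after normalisation) are dropped; every other tweet
    joins the first cluster whose first member is similar enough, or
    starts a new cluster.
    """
    seen = set()
    clusters = []  # list of (token set of first member, member tweets)
    for tweet in tweets:
        norm = normalise(tweet)
        if norm in seen:
            continue  # exact duplicate, e.g. a plain retweet
        seen.add(norm)
        tokens = tokens_of(norm)
        for rep_tokens, members in clusters:
            if jaccard(tokens, rep_tokens) >= threshold:
                members.append(tweet)
                break
        else:
            clusters.append((tokens, [tweet]))
    return [members for _, members in clusters]


# Invented sample tweets, for illustration only.
sample = [
    "Vote in the #PCC elections today",
    "RT @someone: Vote in the #PCC elections today",
    "Please vote in the #PCC elections today",
    "Turnout for #MyPCC looks very low",
]
for group in cluster_near_duplicates(sample):
    print(group)
```

Grouping like this only reduces the pile of tweets to read. The interpretive work of deciding what each cluster means still falls to the human coder.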
I’m using DiscoverText to look at how policy ideas, ideas like total place, flourishing neighbourhoods and big society, live and die in the world of social media. Most government departments and local councils have Twitter feeds, and on Twitter policy ideas take the form of hashtags. During the recent Police and Crime Commissioner elections I collected 100,000 tweets that referenced #PCC and #MyPCC. I am currently designing a classifier in DiscoverText that can distinguish between tweets that express sentiment towards the policy and those that report facts, promote external weblinks or are just plain spam. From this I then use Q methodology to map the inter-subjective viewpoints that emerge around the policy idea. My aim is to show the emergence of subjectivity around policy ideas from the point of launch and how these micro-concourses evolve daily.
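For readers curious what "training a classifier on hand-coded tweets" can look like in principle, here is a small Python sketch. It uses a plain multinomial Naive Bayes model as a stand-in. The technique DiscoverText actually uses is not stated here, and the class name, the labels and the six hand-coded tweets are all made up for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def tokenise(text):
    """Lower-case word tokens, keeping a leading # or @ attached."""
    return re.findall(r"[#@]?\w+", text.lower())


class NaiveBayes:
    """Tiny multinomial Naive Bayes with add-one smoothing (illustrative stand-in)."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> token counts
        self.label_counts = Counter()            # label -> number of coded tweets
        self.vocab = set()

    def train(self, coded):
        """coded is an iterable of (tweet text, human-assigned label) pairs."""
        for text, label in coded:
            self.label_counts[label] += 1
            for tok in tokenise(text):
                self.word_counts[label][tok] += 1
                self.vocab.add(tok)

    def classify(self, text):
        """Return the label with the highest log-probability for this text."""
        total = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, n in self.label_counts.items():
            score = math.log(n / total)
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for tok in tokenise(text):
                score += math.log((self.word_counts[label][tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented hand-coded examples; a real coding frame would need far more.
coded = [
    ("I think the #PCC policy is a waste of money", "sentiment"),
    ("Really pleased we finally get a say with #MyPCC", "sentiment"),
    ("Polls for the #PCC elections close at 10pm", "fact"),
    ("The #PCC elections take place on Thursday", "fact"),
    ("Win a free phone now click here #PCC", "spam"),
    ("Free followers click here now #MyPCC", "spam"),
]
model = NaiveBayes()
model.train(coded)
print(model.classify("click here for a free prize"))
```

A bag-of-words model like this knows nothing of irony or context, which is exactly why the quality of the human coding that feeds it matters so much.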
There’s more work to do, but I hope this is an example of how you can use large Twitter datasets for things other than deriving quantities. It is the subjectivity and shape of ideas that matter. With big sets of social data, the motive needs to be more than predicting stock market performance. We can do so much more.