Dr. Stephen Jeffares, University of Birmingham.
This blog post describes an ongoing research project sponsored by the British Academy called “The Shape of Ideas to Come”.
This project studies Tweets that express opinion about policy ideas. By Tweets I mean the 140-character messages that people send over Twitter. By policy idea I mean anything from ‘climate change’ to ‘big society’; in most cases these are deliberate policies devised by governments, policy makers or organisations. What makes policy ideas interesting is that they tend to end up discredited, usually within three years. This research asks how Twitter users express opinion about such ideas.
The first job is to capture discussion around a particular idea. Let me give you an example. In November 2012 the Home Office were responsible for the election of 41 Police and Crime Commissioners (PCCs). Voter turnout was very low, but the election nevertheless went ahead, and there are now serving PCCs in every police area of England and Wales outside London. The part that interests this project is how the Home Office created the hashtag #MyPCC to focus Twitter discussion of the election. Within a few hours Twitter users were using the hashtag to criticise the election and the rationale for having elected commissioners. For this project we collected 100,000 Tweets that included either #MyPCC or #PCC, starting three weeks before the election. As you would imagine, most of the discussion came in the last few days around the election, with almost half of the Tweets posted on the day after polling, while the results were being announced and the issue was prominent in the news cycle.
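The post does not say how the collection was filtered. As a hedged illustration only, a minimal Python sketch of keeping just the Tweets that carry either hashtag might look like this; the function names and the matching rule (whole tag, any letter case) are assumptions, not the project's actual tooling.

```python
import re

# Assumed matching rule: the tag must be exactly #MyPCC or #PCC (any letter case),
# so longer tags such as #PCCs or #MyPCCvote are not counted.
PCC_TAGS = re.compile(r"#(?:MyPCC|PCC)\b", re.IGNORECASE)


def has_pcc_hashtag(text: str) -> bool:
    """Return True if the Tweet text contains #MyPCC or #PCC."""
    return PCC_TAGS.search(text) is not None


def filter_tweets(tweets: list[str]) -> list[str]:
    """Keep only the Tweets that carry one of the two hashtags."""
    return [t for t in tweets if has_pcc_hashtag(t)]


if __name__ == "__main__":
    sample = [
        "Polling stations open today #MyPCC",
        "Who is standing in my area? #pcc",
        "Read my manifesto on the blog",
        "Meet the #PCCs in your region",
    ]
    print(filter_tweets(sample))
    # → ['Polling stations open today #MyPCC', 'Who is standing in my area? #pcc']
```

Whether variants such as #PCCs should also be collected is a judgement call for the researchers; widening the pattern is a one-line change.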
When we sat down to examine the 100,000 Tweets, the first thing we noticed was that many of them expressed opinion about the policy idea, but, importantly, not all. Many turned out to be conversations between Twitter users, or factual broadcasts in which candidates and bloggers publicised meetings or directed users to their websites. But alongside all of this conversation and broadcasting were relatively clear expressions of opinion. Opinionated Tweets included phrases such as: “I think this policy is a waste of time”; “In my opinion this is privatisation by the back door”; “I imagine this will end up costing more than the previous approach”; “It is clear that nobody has a firm grip of what needs to be done”; “I think this is an important step forward and we need to embrace it”.
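The example phrases share recognisable opinion markers. As a rough sketch, and not a method the project describes, a first pass could flag Tweets containing such markers. The marker list below is a hypothetical one drawn from the quoted phrases; it will miss many opinions and flag some non-opinions, which is exactly why human coding is still needed.

```python
import re

# Hypothetical marker list, taken from the example phrases quoted above.
OPINION_MARKERS = [
    r"\bI think\b",
    r"\bin my opinion\b",
    r"\bI imagine\b",
    r"\bit is clear\b",
    r"\bwaste of time\b",
]
MARKER_PATTERN = re.compile("|".join(OPINION_MARKERS), re.IGNORECASE)


def looks_opinionated(text: str) -> bool:
    """Crude heuristic: True if the Tweet contains any assumed opinion marker."""
    return MARKER_PATTERN.search(text) is not None


if __name__ == "__main__":
    for tweet in [
        "In my opinion this is privatisation by the back door",
        "Hustings at the town hall, 7pm tonight",
    ]:
        print(looks_opinionated(tweet), tweet)
```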
Although these opinionated Tweets vary, initial categorisation reveals overlapping themes, repeated phrases and the use of metaphor and cliché. Much can be learnt by isolating the opinionated Tweets from the rest, but separating them out for analysis is a major challenge for this project. Thankfully some software tools can automate much of the process, but because we are dealing with subjectivity it also needs human intervention. It requires analysts.
It works like this. The analyst signs in to a secure website and is given a coding scheme, usually something simple such as “1. Opinion” and “2. Not”, together with a batch of Tweets. Once underway, the first Tweet flashes up full screen: hit “1” for Opinion and “2” for Not. The remaining Tweets then flash up one by one until all items are coded or the analyst presses the Stop button. Because everybody signs in from their own device, several analysts can work on the same set of Tweets at once.
Not everybody will agree on the categorisations, but it is through discussing these disagreements that we clarify our working definitions. Armed with clearer definitions, we can code new batches of Tweets with greater accuracy. Throughout the process the software is learning the nuanced distinctions between opinionated and non-opinionated Tweets. After further rounds of coding and review, classification can be handed over to the machine. This automation makes it possible to classify thousands of Tweets in a matter of seconds.
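The post does not name the software or the learning method behind it. As a hedged sketch of the two steps described (measuring how far two analysts agree, then training a classifier on the coded Tweets), the Python below computes Cohen's kappa, a standard chance-corrected agreement statistic, and trains a small multinomial naive Bayes classifier from scratch. The labels "opinion"/"not", the function and class names, and the choice of naive Bayes are all assumptions for illustration; a real project would use far more training data and, most likely, an established library.

```python
import math
import re
from collections import Counter


def cohens_kappa(coder_a: list[str], coder_b: list[str]) -> float:
    """Agreement between two analysts' codes, corrected for chance agreement."""
    n = len(coder_a)
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    counts_a, counts_b = Counter(coder_a), Counter(coder_b)
    expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    if expected == 1.0:  # both analysts used one identical label for every item
        return 1.0
    return (observed - expected) / (1 - expected)


def tokenize(text: str) -> list[str]:
    """Lower-case word tokens (letters and apostrophes only)."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts: list[str], labels: list[str]) -> "NaiveBayes":
        self.label_counts = Counter(labels)
        self.word_counts = {label: Counter() for label in self.label_counts}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text: str) -> str:
        total = sum(self.label_counts.values())
        best_label, best_score = "", -math.inf
        for label, count in self.label_counts.items():
            words = self.word_counts[label]
            denom = sum(words.values()) + len(self.vocab)
            score = math.log(count / total)  # log prior
            for w in tokenize(text):
                if w in self.vocab:  # ignore words never seen in training
                    score += math.log((words[w] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


if __name__ == "__main__":
    analyst_1 = ["opinion", "opinion", "not", "not", "opinion", "not"]
    analyst_2 = ["opinion", "not", "not", "not", "opinion", "not"]
    print(round(cohens_kappa(analyst_1, analyst_2), 2))  # → 0.67

    coded_texts = [
        "I think this policy is a waste of time",
        "In my opinion this is privatisation by the back door",
        "It is clear nobody has a grip on this",
        "Hustings at the town hall tonight at 7pm",
        "Read my manifesto on my website",
        "Count starts at 9am in the leisure centre",
    ]
    coded_labels = ["opinion"] * 3 + ["not"] * 3
    model = NaiveBayes().fit(coded_texts, coded_labels)
    print(model.predict("I think this is a waste of money"))  # → opinion
```

Low kappa on a batch would be the cue for the kind of discussion the post describes, before those codes are used as training data.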
Once the software is trained, the role of the analysts becomes one of devising a coding scheme to categorise the opinionated Tweets. This is an iterative process: the aim is to identify key themes and overlaps, remove duplicates and so represent the range and diversity of the debate.
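Removing duplicates on Twitter usually means catching retweets and near-identical copies as well as exact repeats. The post does not say how this is done; the sketch below assumes a simple rule of normalising each Tweet (dropping a leading "RT @user:" marker, links and @mentions, lower-casing and collapsing whitespace) and keeping the first Tweet for each normalised text. These normalisation rules and names are hypothetical.

```python
import re

# Assumed normalisation rules for spotting duplicates.
RT_PREFIX = re.compile(r"^RT @\w+:\s*", re.IGNORECASE)
URL = re.compile(r"https?://\S+")
MENTION = re.compile(r"@\w+")


def normalise(text: str) -> str:
    """Reduce a Tweet to a comparison key."""
    text = RT_PREFIX.sub("", text)
    text = URL.sub("", text)
    text = MENTION.sub("", text)
    return " ".join(text.lower().split())


def remove_duplicates(tweets: list[str]) -> list[str]:
    """Keep the first Tweet seen for each distinct normalised text."""
    seen: set[str] = set()
    unique = []
    for tweet in tweets:
        key = normalise(tweet)
        if key not in seen:
            seen.add(key)
            unique.append(tweet)
    return unique


if __name__ == "__main__":
    tweets = [
        "I think this is a waste of time #MyPCC",
        "RT @voter42: I think this is a waste of time #MyPCC",
        "I think  this is a waste of time #mypcc http://example.com/x",
        "Turnout looks very low #PCC",
    ]
    print(remove_duplicates(tweets))
```

With these rules the retweet and the copy with a link collapse into the first Tweet, leaving two distinct items.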
If you are interested in getting involved in coding and classifying Tweets about policy ideas, please contact the Principal Investigator, Dr Stephen Jeffares, University of Birmingham.