Now collecting…

For the last few months we have been using DiscoverText to collect social media discussion of public policy, with the aim of understanding how public policy is discussed online. To date we have collected just under half a million tweets, Facebook posts and YouTube comments on 26 different policies and issues.

Here is a list of the kinds of things we have in the archive.

* GayMarriage (2462 tweets, hashtag)
* Community budgets (1800 tweets)
* 10p Tax rate (43924 de-duplicated tweets)
* Bedroom Tax (8155 tweets)
* Big society (2247 tweets, hashtag)
* Bristolmayor (8847 de-duplicated tweets, hashtag)
* Nick Clegg Sorry video (1030 YouTube comments)
* Councillors for Hire (4092 de-duplicated tweets)
* Lansley Rap YouTube song (1037 YouTube comments)
* EUSpeech, David Cameron (40,000 tweets and 54,000 Facebook page posts)
* FakeXmas Twitter campaign (57 tweets)
* 2012 Floods (1498 tweets)
* Hospital food (1974 de-duplicated tweets)
* HS2 (13685 tweets, 100 Disqus posts)
* Local elections (16926 tweets, ongoing)
* Mansion Tax (6025 tweets and 188 Google+ posts)
* Minimum unit price England (12,000 tweets)
* Neighbourhood planning (262 tweets, ongoing)
* PCC (45,000 in build up and 50,000 post election, 1211 Facebook page posts)
* Birmingham Cuts announcement (1528 tweets)
* Rotherham, UKIP foster parents decision (6970 tweets)
* Thatcherism, post-death (23700 tweets)
* Thatcher Funeral, morning of 17th (100,000 tweets)
* Transforming Rehabilitation, probation service reform (350 Facebook posts, 2900 tweets)
* Troubled families (4000 tweets)
* Work Programme (10343 tweets)

The work to understand the shape of the debate starts with removing exact and near duplicates. We then check that the tweets are on-topic and not just opportunist hashtag spam, identify those that express an opinion about the topic, and divide them up by theme. We draw on a dispersed team of real-life human coders who code portions of the datasets, and we check for inter-coder agreement and validity. We use the human coding to train custom machine classifiers that classify large portions of the datasets, reducing the need for human coding. One further way of getting a sense of the emerging shape of the discussion is to ask a group of people to Q sort a diverse sample of items. The analysis identifies shared viewpoints and informs further rounds of coding.
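As a rough illustration only, and not a description of how DiscoverText itself works, the de-duplication and inter-coder agreement steps could be sketched in Python along the following lines. The word-shingle Jaccard similarity, the 0.8 threshold, the choice of Cohen's kappa as the agreement measure, and every function name here are assumptions made for the example.

```python
import re
from collections import Counter


def normalise(text):
    """Lower-case, drop URLs and punctuation so trivially different copies compare equal."""
    text = re.sub(r"https?://\S+", "", text.lower())
    return " ".join(re.findall(r"[a-z0-9#@']+", text))


def shingles(text, n=3):
    """Return the set of overlapping n-word sequences in a normalised text."""
    words = text.split()
    if len(words) < n:
        return {tuple(words)}
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets: size of intersection over size of union."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def deduplicate(texts, threshold=0.8):
    """Keep the first copy of each item; drop exact and near duplicates.

    The threshold is an assumed value; it would need tuning on real data.
    This compares every item with every kept item, so it suits small samples only.
    """
    kept = []  # pairs of (original text, shingle set)
    for text in texts:
        candidate = shingles(normalise(text))
        if any(jaccard(candidate, seen) >= threshold for _, seen in kept):
            continue
        kept.append((text, candidate))
    return [text for text, _ in kept]


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two coders who labelled the same items in the same order."""
    n = len(labels_a)
    if n == 0 or n != len(labels_b):
        raise ValueError("both coders must label the same non-empty set of items")
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    # Agreement expected by chance, from each coder's label frequencies.
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1:
        return 1.0
    return (observed - expected) / (1 - expected)
```

Here `deduplicate` would be run on a collected set before coding, and `cohen_kappa` on two coders' labels for the same sample. A kappa close to 1 suggests the coders agree well beyond chance, while a value near 0 suggests the coding scheme needs revisiting before it is used to train a classifier.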

More to follow…
