UK Web Focus

Innovation and best practices for the Web

A Few Days Left to Download a Structured Archive of Tweets

Posted by Brian Kelly on 17 March 2011

On 21 February 2011 John O’Brien, developer of the Twapper Keeper Twitter archiving service, announced the “Removal of Export and Download / API Capabilities”. In a subsequent video interview John explained the reasons for the removal of these capabilities, which followed Twitter’s announcement that it was enforcing its policy that third-party services are not allowed to syndicate or redistribute tweets. Following Twitter’s ‘cease and desist’ email, the removal of Twapper Keeper’s export capabilities and APIs will take place on 20 March, in a few days’ time.

The popularity of the Twapper Keeper service (which has a total of 2,410,061,623 tweets across 21,475 archives) has demonstrated a clear need for Twitter archiving, and it seems that Twitter wishes to be able to exploit such popularity commercially. I would guess that other services, such as Martin Hawksey’s iTitle Twitter captioning service, are further examples of innovative approaches which Twitter will be seeking to exploit commercially.

Last year’s JISC-funded developments to the Twapper Keeper service included making the software available under a Creative Commons licence. If you visit the Your.TwapperKeeper.com site you will be able to download the software which can be run on your own server. Clearly you would not be able to simply replicate a public Twapper Keeper service, but if Twitter’s terms and conditions are aimed at stopping public redistribution of tweets it would appear possible to install the software on an institutional Intranet – although I should admit that IANAL.

It should be pointed out that the Twapper Keeper service will continue to archive tweets, which can be accessed via the HTML interface. What is being lost is API access and the ability to download a structured archive of tweets in, for example, MS Excel format, with columns for the tweets, Twitter userid, date and time information, geo-location information, etc. Such structured information is, as Twitter is well aware, valuable for developers who wish to carry out richer data analysis or provide additional value-added services on top of the conventional Web-based display of tweets.
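To illustrate why a structured export is more useful than an HTML page, here is a minimal Python sketch of the kind of analysis it allows: counting tweets per user. It assumes the archive has been saved as CSV, and the column names (`text`, `from_user`, `created_at`, `geo`) are hypothetical; a real Twapper Keeper export may label its columns differently. The sample rows are invented for illustration only.

```python
import csv
import io
from collections import Counter

# Invented sample data standing in for a downloaded archive.
# The column names are assumptions, not the actual export format.
SAMPLE_EXPORT = """text,from_user,created_at,geo
Looking forward to the seminar #udgamp10,user_a,2010-09-01 09:15:00,
Slides are now online #udgamp10,user_b,2010-09-01 10:02:00,
Good question from the audience #udgamp10,user_a,2010-09-01 10:30:00,
"""


def tweets_per_user(csv_file):
    """Count how many tweets each user contributed to an archive."""
    reader = csv.DictReader(csv_file)
    return Counter(row["from_user"] for row in reader)


counts = tweets_per_user(io.StringIO(SAMPLE_EXPORT))
for user, total in counts.most_common():
    print(user, total)
```

With a real file, `io.StringIO(SAMPLE_EXPORT)` would be replaced by an opened file object, for example `open("archive.csv", newline="", encoding="utf-8")`.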

It is still possible for a few days to download such structured archives from Twapper Keeper. I have recently looked at the details of my Twapper Keeper archives and have decided to keep a local archive of tweets associated with a number of talks I have given. However, I don’t intend to keep structured archives which are primarily of interest to event organisers (such as those for the ALT-C, JISC and CETIS conferences). I have also decided to keep a record, in the list below, of the decisions I have made. Note that an example of a local archive can be seen for the seminar I gave last year at the University of Girona.

| Archive Type | Name | Description | Policy | # of Tweets | Create Date (MM-DD-YY) |
|---|---|---|---|---|---|
| #Hashtag | #a11y | Accessibility (a11y) | Archive not kept as this subject-based archive is not directly related to my key areas of work. | 42427 | 04-25-10 |
| #Hashtag | #accbc | CETIS/BSI Accessibility SIG meeting. | Local archive not kept as I was a speaker at this recent event. | 154 | 02-28-11 |
| #Hashtag | #altc2009 | The ALTC 2009 conference | Archive not kept as this event-based archive will primarily be relevant to the event organisers. | 4737 | 08-28-09 |
| #Hashtag | #altmetrics | New approaches for developing metrics for scholarly research | Archive not kept as this subject-based archive will primarily be relevant to others with an interest in the subject area. | 158 | 01-15-11 |
| #Hashtag | #Ariadne | The Ariadne hashtag, which may be used for UKOLN’s Ariadne ejournal. | Archive not kept as this subject-based archive will primarily be about topics other than UKOLN’s Ariadne ejournal. | 11897 | 09-21-10 |
| Keyword | Ariadne | Archive of tweets containing the string ‘Ariadne’ | Archive not kept as this subject-based archive will primarily be about topics other than UKOLN’s Ariadne ejournal. | 25598 | 09-21-10 |
| @Person | ariadne_ukoln | Tweets about the Ariadne web magazine. | Local archive kept. | 882 | 05-28-10 |
| @Person | briankelly | Tweets about Brian Kelly | Personal archive kept. | 6471 | 03-19-10 |
| #Hashtag | #CETIS | The CETIS service, based at the University of Bolton. | Archive not kept as this organisational archive will primarily be of relevance to the host institution. | 2836 | 09-24-10 |
| #Hashtag | #CILIP | CILIP, the Chartered Institute of Library and Information Professionals. | Archive not kept as this organisational archive will primarily be of relevance to the host institution. | 4494 | 09-24-10 |
| #Hashtag | #CILIP1 | Campaign on future of CILIP organisation based on CILIP’s 1-minute messages. | Archive not kept as this campaign-based archive will primarily be of relevance to the host institution. | 357 | 06-13-10 |
| #Hashtag | #CSR | Comprehensive Spending Review | Archive not kept as this subject archive will primarily be of relevance to others. | 79799 | 10-15-10 |
| #Hashtag | #falt09 | ALTC Fringe | Archive not kept as this event-based archive will primarily be of relevance to others. | 219 | 08-28-09 |
| #Hashtag | #heweb10 | Tag for the HigherEdWeb 2010 conference | Archive not kept as this event-based archive will primarily be of relevance to others. | 8723 | 09-28-10 |
| #Hashtag | #ipres10 | Tweets for the iPres10 conference, Vienna, 19-24 Sept 2010. | Archive not kept as this event-based archive will primarily be of relevance to others. | 2 | 08-27-10 |
| #Hashtag | #ipres2010 | Archive for the IPres 2010 conference to be held in Vienna on 19-25 Sept 2010. | Archive not kept as this event-based archive will primarily be of relevance to others. | 1397 | 08-27-10 |
| @Person | iwmwlive | IWMW live blogging account | Local archive kept. | 1373 | 04-30-10 |
| #Hashtag | #jisc10 | JISC 2010 conference | Archive not kept as this event-based archive will primarily be of relevance to others. | 2059 | 04-02-10 |
| #Hashtag | #jiscpowr | Archive of tweets related to the JISC PoWR project provided by UKOLN and ULCC | Archive not kept due to low numbers of tweets. | 6 | 07-09-10 |
| #Hashtag | #jiscpowrguide | Archive of tweets about the Guide to Web Preservation published by the JISC-funded PoWR project and launched on 12 July 2010. | Archive not kept due to low numbers of tweets. | 2 | 07-09-10 |
| #Hashtag | #ldow2010 | Linked Data on the Web 2010 conference | Archive not kept as this event-based archive will primarily be of relevance to others. | 524 | 04-25-10 |
| #Hashtag | #loveHE | Times Higher Education campaign to support Higher Education in UK. | Archive not kept as this campaign-based archive will primarily be of relevance to others. | 12066 | 06-12-10 |
| #Hashtag | #mdforum | UKOLN’s Metadata Forum | Local archive planned. | 119 | 12-10-10 |
| #Hashtag | #morris | Tweets about Morris dancing | Archive not kept as this social archive will primarily be of relevance to others. | 17813 | 10-16-10 |
| #Hashtag | #oxsmc09 | socialmediaconference | Archive not kept as this event-based archive will primarily be of relevance to others. | 1063 | 09-18-09 |
| #Hashtag | #PhD | Tweets for researchers using the #PhD tag | Archive not kept as this subject-based archive will primarily be of relevance to others. | 28527 | 09-24-10 |
| #Hashtag | #s113 | Workshop session at ALTC 2009. | Local archive kept (will be edited to remove irrelevant tweets posted after event had taken place). | 227 | 09-03-09 |
| #Hashtag | #scl2010 | Scholarly Communication Landscape (SCL): Opportunities and challenges symposium, held at Manchester Conference Centre on 30 November 2010. | Archive not kept as this event-based archive will primarily be of relevance to others. | 39 | 12-02-10 |
| #Hashtag | #ucassm | Social Media Marketing Conference organised by UCAS. | Archive not kept as this event-based archive will primarily be of relevance to others. | 223 | 10-18-10 |
| #Hashtag | #udgamp10 | What Can We Learn From Amplified Events seminar, given by Brian Kelly, UKOLN at the University of Girona. Local archive available. | Local archive kept. | 395 | 09-01-10 |
| #Hashtag | #ukmw09 | UKMuseumsandtheWeb | Archive not kept as this event-based archive will primarily be of relevance to others. | 750 | 12-05-09 |
| Keyword | ukoln | Tweets about UKOLN | Local archive kept. | 1948 | 03-19-10 |
| #Hashtag | #ukolneim | UKOLN’s Evidence, Impact, Metric work | Archive not kept due to low numbers of tweets. | 45 | 11-05-10 |
| #Hashtag | #w3ctrack | W3C Track at WWW 2010 conference | Archive not kept as this event-based archive will primarily be of relevance to others. | 179 | 04-30-10 |
| #Hashtag | #ww2010 | Misspelling of WWW2010 hashtag | Archive not kept as this event-based archive will primarily be of relevance to others. | 833 | 04-29-10 |

It should be noted that this list is based on the Twapper Keeper archives which I created. There are a number of other archives of interest to me and to colleagues at UKOLN which may also be archived locally.

Also note that a number of event-based Twitter archives (such as the #s113 archive of a workshop session at the ALT-C 2009 conference) will contain irrelevant tweets due to the hashtag being used for other purposes. Such irrelevant tweets may be deleted from the archive.
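One possible way of doing this clean-up on a local archive is to keep only the tweets posted within the dates of the event. The Python sketch below assumes each tweet has already been loaded as a dictionary with a `created_at` timestamp (a hypothetical field name). The sample tweets and the date window are invented for illustration and are not taken from the real #s113 archive.

```python
from datetime import datetime


def within_event(tweets, start, end):
    """Keep only tweets whose timestamp falls between start and end (inclusive)."""
    return [t for t in tweets if start <= t["created_at"] <= end]


# Invented sample tweets; the field names are assumptions.
tweets = [
    {"text": "Workshop starting now #s113",
     "created_at": datetime(2009, 9, 8, 10, 0)},
    {"text": "Unrelated later use of #s113",
     "created_at": datetime(2010, 3, 1, 12, 0)},
]

# Illustrative event window; adjust to the actual dates of the event.
event_start = datetime(2009, 9, 8, 0, 0)
event_end = datetime(2009, 9, 10, 23, 59, 59)

kept = within_event(tweets, event_start, event_end)
for tweet in kept:
    print(tweet["created_at"].isoformat(), tweet["text"])
```

A date filter of this kind is only a first pass: tweets posted during the event but unrelated to it would still need to be removed by hand.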
