A Few Days Left to Download a Structured Archive of Tweets
Posted by Brian Kelly on 17 March 2011
On 21 February 2011 John O’Brien, developer of the Twapper Keeper Twitter archiving service, announced the “Removal of Export and Download / API Capabilities”. In a subsequent video interview John explained the reasons for removing these capabilities. The removal followed Twitter’s announcement that it was enforcing its policy that third-party services are not allowed to syndicate or redistribute tweets. Following Twitter’s ‘cease and desist’ email, Twapper Keeper’s export capabilities and APIs will be removed on 20 March, in a few days’ time.
The popularity of the Twapper Keeper service (which holds a total of 2,410,061,623 tweets across 21,475 archives) has demonstrated a clear need for Twitter archiving, and it seems that Twitter wishes to exploit such popularity commercially. I would guess that other innovative services, such as Martin Hawksey’s iTitle Twitter captioning service, are further examples of approaches which Twitter will be seeking to exploit commercially.
Last year’s JISC-funded developments to the Twapper Keeper service included making the software available under a Creative Commons licence. If you visit the Your.TwapperKeeper.com site you can download the software and run it on your own server. Clearly you would not be able simply to replicate a public Twapper Keeper service, but if Twitter’s terms and conditions are aimed at stopping public redistribution of tweets, it would appear possible to install the software on an institutional intranet (although I should admit that IANAL: I am not a lawyer).
It should be pointed out that the Twapper Keeper service will continue to archive tweets, which can still be accessed via the HTML interface. What is being lost is API access and the ability to download a structured archive of tweets in, for example, MS Excel format, with columns for the tweet text, Twitter user ID, date and time, geo-location information, etc. As Twitter is well aware, such structured information is valuable for developers who wish to carry out richer data analysis or to provide value-added services on top of the conventional Web-based display of tweets.
For a few more days it is still possible to download such structured archives from Twapper Keeper. I have recently reviewed my Twapper Keeper archives and have decided to keep a local archive of tweets associated with a number of talks I have given. However, I don’t intend to keep structured archives which are primarily of interest to event organisers (such as those for the ALT-C, JISC and CETIS conferences). I have recorded the decisions I have made in the table below. Note that an example of a local archive can be seen for the seminar I gave last year at the University of Girona.
| Archive Type | Name | Description | Policy | No. of Tweets | Create Date (MM-DD-YY) |
|---|---|---|---|---|---|
| #Hashtag | #a11y | Accessibility (a11y) | Archive not kept as this subject-based archive is not directly related to my key areas of work. | 42427 | 04-25-10 |
| #Hashtag | #accbc | CETIS/BSI Accessibility SIG meeting. | Local archive kept as I was a speaker at this recent event. | 154 | 02-28-11 |
| #Hashtag | #altc2009 | The ALT-C 2009 conference | Archive not kept as this event-based archive will primarily be relevant to the event organisers. | 4737 | 08-28-09 |
| #Hashtag | #altmetrics | New approaches for developing metrics for scholarly research | Archive not kept as this subject-based archive will primarily be relevant to others with an interest in the subject area. | 158 | 01-15-11 |
| #Hashtag | #Ariadne | The Ariadne hashtag, which may be used for UKOLN’s Ariadne ejournal. | Archive not kept as this subject-based archive will primarily be about topics other than UKOLN’s Ariadne ejournal. | 11897 | 09-21-10 |
| Keyword | Ariadne | Tweets containing the string ‘Ariadne’ | Archive not kept as this subject-based archive will primarily be about topics other than UKOLN’s Ariadne ejournal. | 25598 | 09-21-10 |
| @Person | ariadne_ukoln | Tweets about the Ariadne web magazine. | Local archive kept. | 882 | 05-28-10 |
| @Person | briankelly | Tweets about Brian Kelly | Personal archive kept. | 6471 | 03-19-10 |
| #Hashtag | #CETIS | The CETIS service, based at the University of Bolton. | Archive not kept as this organisational archive will primarily be of relevance to the host institution. | 2836 | 09-24-10 |
| #Hashtag | #CILIP | CILIP, the Chartered Institute of Library and Information Professionals. | Archive not kept as this organisational archive will primarily be of relevance to the host institution. | 4494 | 09-24-10 |
| #Hashtag | #CILIP1 | Campaign on the future of CILIP, based on CILIP’s 1-minute messages. | Archive not kept as this campaign-based archive will primarily be of relevance to the host institution. | 357 | 06-13-10 |
| #Hashtag | #CSR | Comprehensive Spending Review | Archive not kept as this subject-based archive will primarily be of relevance to others. | 79799 | 10-15-10 |
| #Hashtag | #falt09 | ALT-C Fringe | Archive not kept as this event-based archive will primarily be of relevance to others. | 219 | 08-28-09 |
| #Hashtag | #heweb10 | Tag for the HigherEdWeb 2010 conference | Archive not kept as this event-based archive will primarily be of relevance to others. | 8723 | 09-28-10 |
| #Hashtag | #ipres10 | Tweets for the iPres10 conference, Vienna, 19-24 Sept 2010. | Archive not kept as this event-based archive will primarily be of relevance to others. | 2 | 08-27-10 |
| #Hashtag | #ipres2010 | Archive for the iPres 2010 conference to be held in Vienna on 19-25 Sept 2010. | Archive not kept as this event-based archive will primarily be of relevance to others. | 1397 | 08-27-10 |
| @Person | iwmwlive | IWMW live-blogging account | Local archive kept. | 1373 | 04-30-10 |
| #Hashtag | #jisc10 | JISC 2010 conference | Archive not kept as this event-based archive will primarily be of relevance to others. | 2059 | 04-02-10 |
| #Hashtag | #jiscpowr | Archive of tweets related to the JISC PoWR project provided by UKOLN and ULCC | Archive not kept due to the low number of tweets. | 6 | 07-09-10 |
| #Hashtag | #jiscpowrguide | Archive of tweets about the Guide to Web Preservation, published by the JISC-funded PoWR project and launched on 12 July 2010. | Archive not kept due to the low number of tweets. | 2 | 07-09-10 |
| #Hashtag | #ldow2010 | Linked Data on the Web 2010 conference | Archive not kept as this event-based archive will primarily be of relevance to others. | 524 | 04-25-10 |
| #Hashtag | #loveHE | Times Higher Education campaign to support Higher Education in the UK. | Archive not kept as this campaign-based archive will primarily be of relevance to others. | 12066 | 06-12-10 |
| #Hashtag | #mdforum | UKOLN’s Metadata Forum | Local archive planned. | 119 | 12-10-10 |
| #Hashtag | #morris | Tweets about Morris dancing | Archive not kept as this social archive will primarily be of relevance to others. | 17813 | 10-16-10 |
| #Hashtag | #oxsmc09 | socialmediaconference | Archive not kept as this event-based archive will primarily be of relevance to others. | 1063 | 09-18-09 |
| #Hashtag | #PhD | Tweets for researchers using the #PhD tag | Archive not kept as this subject-based archive will primarily be of relevance to others. | 28527 | 09-24-10 |
| #Hashtag | #s113 | Workshop session at ALT-C 2009. | Local archive kept (will be edited to remove irrelevant tweets posted after the event had taken place). | 227 | 09-03-09 |
| #Hashtag | #scl2010 | Scholarly Communication Landscape (SCL): Opportunities and challenges symposium, held at Manchester Conference Centre on 30 November 2010. | Archive not kept as this event-based archive will primarily be of relevance to others. | 39 | 12-02-10 |
| #Hashtag | #ucassm | Social Media Marketing Conference organised by UCAS. | Archive not kept as this event-based archive will primarily be of relevance to others. | 223 | 10-18-10 |
| #Hashtag | #udgamp10 | What Can We Learn From Amplified Events seminar, given by Brian Kelly, UKOLN, at the University of Girona. Local archive available. | Local archive kept. | 395 | 09-01-10 |
| #Hashtag | #ukmw09 | UKMuseumsandtheWeb | Archive not kept as this event-based archive will primarily be of relevance to others. | 750 | 12-05-09 |
| Keyword | ukoln | Tweets about UKOLN | Local archive kept. | 1948 | 03-19-10 |
| #Hashtag | #ukolneim | UKOLN’s Evidence, Impact, Metrics work | Archive not kept due to the low number of tweets. | 45 | 11-05-10 |
| #Hashtag | #w3ctrack | W3C Track at the WWW 2010 conference | Archive not kept as this event-based archive will primarily be of relevance to others. | 179 | 04-30-10 |
| #Hashtag | #ww2010 | Misspelling of the WWW2010 hashtag | Archive not kept as this event-based archive will primarily be of relevance to others. | 833 | 04-29-10 |
Note that this list covers only the Twapper Keeper archives which I created. There are a number of other archives of interest to me and my colleagues at UKOLN which may also be archived locally.
Also note that a number of event-based Twitter archives (such as the #s113 archive of a workshop session at the ALT-C 2009 conference) will contain irrelevant tweets because the hashtag has also been used for other purposes. Such irrelevant tweets may be deleted from the local archive.