UK Web Focus (Brian Kelly)

Innovation and best practices for the Web


    Creative Commons License
    This work is licensed under a Creative Commons Attribution 2.0 UK: England & Wales License. As described in a blog post this licence applies to textual content published by the author and (unless stated otherwise) guest bloggers. Also note that on 24 October 2011 the licence was changed from CC-BY-SA to CC-BY. Comments posted on this blog will also be deemed to have been published with this licence. Please note though, that images and other resources embedded in the blog may not be covered by this licence.


The Components of Twitter to be Archived

Posted by Brian Kelly on 30 Apr 2010

In a recent post on “Developments to Twapper Keeper” I described JISC-funded developments to the Twapper Keeper Twitter archiving service. I mentioned how the Twapper Keeper blog was initially being used to gather user requirements for developments in User Enhancements to Twapper Keeper and API Developments to Twapper Keeper. I’m pleased that we have received a number of suggestions. One of these, a request to allow tweets to be deleted from the archive and users to opt out of Twapper Keeper archiving, has been identified as an important feature, particularly for UK users given the uncertainties regarding Twitter and copyright following the recent passing of the Digital Economy Act.

It has recently occurred to me, though, that we haven’t properly defined what it is that will be archived to allow subsequent reuse (e.g. by tools such as Martin Hawksey’s Twitter capturing service) or analysis (e.g. the sentiment analysis which failed to identify the irony of the tweets posted with the #NickCleggsFault tag).

We will be able to archive the contents of a tweet contained within the 140 character limit. This will include the textual content, hypertext links to Web resources and Twitter pictures and videos, the Twitter ID of the recipient of public messages (or the subject of a message) as defined by the @ command, and the hashtag(s) used in a tweet. Are there any other structural elements of a tweet, I wonder?
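As a rough illustration of these structural elements, here is a minimal Python sketch which pulls the links, @ recipients and hashtags out of the text of a tweet. The function name `tweet_components` and the regular expressions are my own assumptions for illustration; they are not how Twapper Keeper or Twitter actually parse tweets, and real tweets will contain edge cases these simple patterns miss.

```python
import re

# 140 characters was the Twitter limit in 2010.
TWEET_LIMIT = 140

# Deliberately simple patterns (assumptions, not Twitter's own rules).
URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"(?<!\w)@(\w+)")
HASHTAG_RE = re.compile(r"(?<!\w)#(\w+)")


def tweet_components(text):
    """Return the structural elements found in the text of a tweet."""
    if len(text) > TWEET_LIMIT:
        raise ValueError("text is longer than the tweet character limit")
    links = URL_RE.findall(text)
    # Blank out the links before looking for @ and # so that a '#fragment'
    # or '@' inside a URL is not counted as a hashtag or a recipient.
    remainder = URL_RE.sub(" ", text)
    return {
        "text": text,
        "links": links,
        "mentions": MENTION_RE.findall(remainder),
        "hashtags": HASHTAG_RE.findall(remainder),
    }


example = tweet_components(
    "@ukwebfocus Notes on archiving http://example.org/post #twapperkeeper #jisc"
)
print(example["mentions"], example["links"], example["hashtags"])
```

Note that the link pattern simply takes everything up to the next space, so trailing punctuation would be captured as part of a link.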

As well as the content of a tweet which is created by the author, there will be a number of metadata attributes which will also be available. This will include the Twitter ID of the poster, the date and time, the name of the Twitter client used and, optionally, geo-location information (which I suspect will grow in importance). Again I wonder if there are additional metadata fields I may have missed.
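To make the list of metadata attributes concrete, here is a minimal sketch of what an archived record might look like. The class name `ArchivedTweet` and its field names are hypothetical and chosen only for illustration; they do not reflect the actual Twapper Keeper data model or the Twitter API.

```python
import json
from dataclasses import dataclass, asdict
from datetime import datetime, timezone
from typing import Optional, Tuple


@dataclass
class ArchivedTweet:
    """Hypothetical archive record: tweet content plus its metadata."""
    text: str                    # content created by the author
    poster: str                  # Twitter ID of the poster
    posted_at: datetime          # date and time of the tweet
    client: str                  # name of the Twitter client used
    geo: Optional[Tuple[float, float]] = None  # optional (latitude, longitude)

    def to_json(self) -> str:
        """Serialise the record, storing the timestamp as ISO 8601 text."""
        record = asdict(self)
        record["posted_at"] = self.posted_at.isoformat()
        return json.dumps(record, sort_keys=True)


tweet = ArchivedTweet(
    text="Thoughts on what should be archived #twapperkeeper",
    poster="briankelly",
    posted_at=datetime(2010, 4, 30, 9, 0, tzinfo=timezone.utc),
    client="web",
)
print(tweet.to_json())
```

Making the geo-location field optional reflects the fact that most tweets will not carry it; any further metadata fields could be added in the same way.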

In addition to this Twitter information there is also information related to the Twitter user’s community – the numbers of people they follow and who follow them. The ability to gather this (volatile) information could be useful for observing trends, identifying causes of viral Twitter posts, applying heuristics for spotting Twitter spammers (as Tony Hirst has described), etc.
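Because this community information is volatile, it would need to be captured as dated snapshots if trends are to be observed later. The following is a minimal sketch of that idea; the `CommunitySnapshot` class and the `follower_growth` function are hypothetical names of my own, and the figures in the example are made up for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class CommunitySnapshot:
    """Hypothetical dated record of the size of a user's community."""
    user: str
    taken_on: date
    following: int   # number of people the user follows
    followers: int   # number of people who follow the user


def follower_growth(snapshots):
    """Return (date, change in followers) for each consecutive pair."""
    ordered = sorted(snapshots, key=lambda s: s.taken_on)
    return [
        (later.taken_on, later.followers - earlier.followers)
        for earlier, later in zip(ordered, ordered[1:])
    ]


# Illustrative numbers only.
history = [
    CommunitySnapshot("example_user", date(2010, 5, 1), 210, 130),
    CommunitySnapshot("example_user", date(2010, 4, 1), 200, 100),
]
print(follower_growth(history))
```

A sudden jump between two snapshots might be a starting point for looking at a viral post, but any such interpretation would need further evidence.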

The systematic archiving of information related to a Twitterer’s community is probably out-of-scope for the current Twapper Keeper development work. But will, I wonder, such information be harvested as part of the Library of Congress’s Twitter archiving work?

One Response to “The Components of Twitter to be Archived”

  1. The components of a Tweet are neatly illustrated here:
