According to Summarizr there have been 6,927 tweets for the #altc2010 event hashtag, which compares with 4,735 tweets for the #altc2009 event. We can therefore conclude that there has been an increase of almost 50% in Twitter usage. Or can we? If we had carried out the analysis immediately after the event the numbers would probably have been different. And use of either of these hashtags now, when talking about a past event, will have a different context to using the hashtag during the event, when such tags provided some level of engagement with the Twitter community centred around the event’s Twitter stream.
In order to make meaningful comparisons there is a need to be able to filter the tweets in a consistent fashion. Fortunately the Twapper Keeper service allows tweets to be filtered by various parameters, including a date range. And since the Summarizr service uses Twapper Keeper to provide its statistics it is possible to use Summarizr’s metrics in a consistent fashion.
But what date range should be used? An initial suggestion might be the day(s) of the event. But this would fail to include discussions which take place immediately before and after an event. It could also exclude tweets from an international audience, such as tweets from an Australian audience which are posted on the following day. Such confusions over dates might apply particularly to events held in other countries, since the times used in Twitter are based on GMT.
In order to avoid such confusions when I cite statistics from Summarizr I now include tweets posted during the week of an event, typically starting on the Sunday and finishing on the following Saturday. For an event lasting for a day I start on the day before the event and finish on the following day.
The syntax for obtaining statistics from Twapper Keeper over a date range uses the following parameters:
- sm is the start month (from 1 to 12)
- sd is the start day (from 1 to 31)
- sy is the start year (e.g. 2010)
- em is the end month (from 1 to 12)
- ed is the end day (from 1 to 31)
- ey is the end year (e.g. 2010)
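As a minimal sketch of how these six parameters might be generated from a pair of dates, the Python below builds only the query-string portion of a request. The function name `date_range_query` is hypothetical, and it is an assumption that the parameters are passed as ordinary URL query parameters; the service address itself is deliberately left out.

```python
from datetime import date
from urllib.parse import urlencode


def date_range_query(start: date, end: date) -> str:
    """Return the six date-range parameters as a URL query string.

    Assumption: the service accepts sm/sd/sy/em/ed/ey as plain query
    parameters. The base URL is not part of this sketch.
    """
    if end < start:
        raise ValueError("end date must not be earlier than start date")
    params = {
        "sm": start.month,  # start month (1 to 12)
        "sd": start.day,    # start day (1 to 31)
        "sy": start.year,   # start year (e.g. 2010)
        "em": end.month,    # end month (1 to 12)
        "ed": end.day,      # end day (1 to 31)
        "ey": end.year,     # end year (e.g. 2010)
    }
    return urlencode(params)


# Date range of 6-11 September 2009
print(date_range_query(date(2009, 9, 6), date(2009, 9, 11)))
# prints: sm=9&sd=6&sy=2009&em=9&ed=11&ey=2009
```

Working from `date` objects rather than hand-typed numbers means an impossible date such as 31 September is rejected before any request is made.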
For example the following URL will give statistics for the #altc2009 hashtag between 6-11 September 2009:
and the following statistics for the #altc2010 hashtag between 5-12 September 2010:
This provides the following statistics:
| Metric | ALT-C 2009 | ALT-C 2010 |
|---|---|---|
| No. of tweets | 4,010 | 6,238 |
| No. of twitterers | 650 | 666 |
| No. of hashtags tweeted | 125 | 277 |
| No. of URLs tweeted | 554 | 683 |
| No. of geo-located tweets | 0 | 35 |
This indicates that there has been an increase of 56% in Twitter usage, as measured by the number of tweets, between comparable periods in 2009 and 2010.
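The year-on-year comparison is a simple relative change, and the same calculation can be applied to each row of the statistics. The following is only an illustrative sketch: the function name `percent_change` and the dictionary layout are my own, and the figures are those quoted for the two date-filtered periods.

```python
def percent_change(before, after):
    """Relative change from `before` to `after`, as a percentage.

    Returns None when `before` is zero, because a percentage increase
    from nothing is undefined (as with the 2009 geo-located tweets).
    """
    if before == 0:
        return None
    return (after - before) / before * 100


# (2009 value, 2010 value) for each metric over the filtered date ranges
stats = {
    "tweets": (4010, 6238),
    "twitterers": (650, 666),
    "hashtags tweeted": (125, 277),
    "URLs tweeted": (554, 683),
    "geo-located tweets": (0, 35),
}

for label, (y2009, y2010) in stats.items():
    change = percent_change(y2009, y2010)
    if change is None:
        print(f"{label}: no 2009 baseline")
    else:
        print(f"{label}: {change:+.1f}%")
```

For the number of tweets this gives a figure which rounds to 56%.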
Note that the statistics for the numbers of geo-located tweets demonstrate that in 2009 nobody was providing geo-located tweets for the event hashtag. Without a date filter this information could easily be lost: if Twitter users who now make use of geo-location started to refer to the 2009 event today, their tweets would be counted against the 2009 hashtag.
To sum up my proposal:
- The start date for a one-day event is the previous day and the end date is the following day. This will address internationalisation issues due to engagement for those in other time zones and cover discussions just before and just after the event.
- The start date for an event lasting longer than a single day is the previous Sunday and the end date is the following Saturday. This will address internationalisation issues due to engagement for those in other time zones and cover discussions just before and just after the event.
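The two rules above could be automated along the following lines. This is a sketch under stated assumptions rather than a definitive implementation: the function name `event_window` is hypothetical, "previous Sunday" is read as the Sunday on or before the first day of the event, "following Saturday" as the Saturday on or after the last day, and the example dates are invented for illustration.

```python
from datetime import date, timedelta


def event_window(first_day: date, last_day: date) -> tuple[date, date]:
    """Return (start, end) of the date range to use for an event."""
    if last_day < first_day:
        raise ValueError("last day must not be earlier than first day")

    if first_day == last_day:
        # One-day event: the day before to the day after.
        one_day = timedelta(days=1)
        return first_day - one_day, last_day + one_day

    # Longer event: Sunday on or before the start, Saturday on or after
    # the end. date.weekday() numbers Monday as 0 and Sunday as 6.
    back_to_sunday = (first_day.weekday() + 1) % 7
    forward_to_saturday = (5 - last_day.weekday()) % 7
    return (
        first_day - timedelta(days=back_to_sunday),
        last_day + timedelta(days=forward_to_saturday),
    )


# Hypothetical three-day event, Tuesday 2 to Thursday 4 March 2010
print(event_window(date(2010, 3, 2), date(2010, 3, 4)))

# Hypothetical one-day event on Tuesday 2 March 2010
print(event_window(date(2010, 3, 2), date(2010, 3, 2)))
```

An event that itself begins on a Sunday or ends on a Saturday keeps that day as the boundary under this reading; the rule could equally be tightened to always step back or forward a full week.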
Is this a convention we can agree on, to ensure that meaningful comparisons can be made?