OMG! I Didn’t Intend Everyone To Read That!
Posted by Brian Kelly on 29 September 2010
A Context To Archiving of Digital Content
We’ve probably all had the experience of creating digital content and, in retrospect, wishing we hadn’t said what we’d said, had rephrased our words or could delete all copies of the embarrassing content from hard drives around the world – and, if it were only possible, from people’s brain cells too! I still cringe at the memory of the time I sent a message to a former colleague of mine complaining about a third party – and got a phone call 2 minutes later asking if I was aware that the message had been cc’d to the third party. Since then, even if I don’t always spell check my messages, I do try to check the distribution list before pressing the Send key.
User Management of Archiving of Tweets
These issues are nothing new: they apply to messaging systems such as Usenet News, instant messaging and email, as well as to publishing systems such as the Web. In all of these environments digital content can easily be copied, forwarded to others and archived. But these concerns are being highlighted once again in the context of Twitter. Although the creator of a tweet can delete the tweet, once it has left the Twitter environment it can be difficult to retain management of the content.
The announcement in April 2010 that the Library of Congress will be archiving tweets caused the concerns over ownership of tweets to be revisited. According to the Law and Disorder blog:
After “long discussions with Twitter over this,” Anderson and other LoC officials agreed to take on the data with a few conditions: it would not be released as a single public file or exposed through a search engine, but offered as a set only to approved researchers.
It is not obvious what an “approved researcher” is, but it seems clear that this service won’t be available for general use, such as embedding hashtagged event tweets on a video (as the iTitle tool does) or providing statistics on usage of particular hashtags (as Summarizr does).
Whilst following the #ipres2010 tweets from the iPres 2010 conference, where my colleague Marieke Guy presented our joint paper on “Twitter Archiving Using Twapper Keeper: Technical And Policy Challenges”, I became aware of the #NoLoC service, which will prevent tweets from being archived by the Library of Congress. If you register with this service using your Twitter account, any of your tweets which contain the #noloc, #noindex or #n hashtag will be automatically deleted from Twitter after a period of 23 weeks – one week before they are archived by the Library of Congress.
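The rule described above can be sketched in a few lines of Python. This is only an illustration of the stated behaviour, not the #NoLoC service’s actual code: the function names are hypothetical, and case-insensitive matching of the hashtags is my assumption.

```python
import re
from datetime import datetime, timedelta

# Hashtags named in the #NoLoC description; matching case-insensitively is an assumption.
OPT_OUT_TAGS = {"noloc", "noindex", "n"}

# Tweets are deleted 23 weeks after posting, one week before the stated archiving point.
DELETE_AFTER = timedelta(weeks=23)

HASHTAG_RE = re.compile(r"#(\w+)")


def has_opt_out_tag(text):
    """Return True if the tweet text contains one of the opt-out hashtags."""
    return any(tag.lower() in OPT_OUT_TAGS for tag in HASHTAG_RE.findall(text))


def deletion_time(text, posted_at):
    """Return when the tweet would be deleted, or None if it carries no opt-out tag."""
    if has_opt_out_tag(text):
        return posted_at + DELETE_AFTER
    return None
```

Note that a tweet tagged #nothing would not match the #n tag here, since whole hashtags are compared rather than prefixes.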
This isn’t an approach which will help with those embarrassing tweets which have been posted – if you are alert enough to add the tag you will probably be thinking about what you are saying. It is also interesting to observe that the service appears to have been set up to prevent the government (should the Library of Congress be regarded as the US Government?) from keeping an archive of tweets: “Every single Twitter tweet will be archived forever by the US government” – it says nothing about Google having access to such tweets.
In addition I think it’s likely that users who use a #noloc tag on their tweets will draw attention to themselves and their attempts to stop the government from archiving their tweets. I wonder if the government is already archiving #noloc tweets, to say nothing of the tabloid newspapers which will have an interest in publishing embarrassing tweets from celebrities. It will be interesting to see if any politicians or civil servants, for example, use this approach in order to protect politically embarrassing comments which the public should have a right to know about.
What Is To Be Done?
This discussion does make me wonder if there is a need to engage in discussions with Twitter over ways in which privacy concerns can be addressed. Would it, for example, be possible to develop a no-index protocol along the lines of the robots exclusion protocol developed in 1993, which provided a mechanism for Web site administrators to specify areas of their Web sites which conformant search engine crawlers should not index? Might Twitter developments, such as Twitter annotations, provide an opportunity to develop a technical solution to address the privacy concerns?
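To make the analogy concrete, here is a rough Python sketch. The first half uses the standard library’s robots.txt parser to show how a conformant crawler decides whether it may index a page; the second half is a purely hypothetical tweet equivalent. The crawler name, the example URLs, the set of tags and the archiver_may_keep function are all assumptions made for illustration, and no such Twitter protocol exists as far as I know.

```python
from urllib.robotparser import RobotFileParser

# Web side: a conformant crawler consults the site's robots.txt rules before indexing.
rules = RobotFileParser()
rules.parse([
    "User-agent: *",
    "Disallow: /private/",
])


def crawler_may_index(url):
    """Return True if the (hypothetical) ExampleBot crawler is allowed to index the URL."""
    return rules.can_fetch("ExampleBot", url)


# Hypothetical Twitter analogue: the author signals an opt-out inside the tweet itself.
NO_ARCHIVE_TAGS = {"#noloc", "#noindex"}


def archiver_may_keep(tweet_text):
    """Return True if a conformant archiver would be allowed to keep this tweet."""
    words = {word.lower().rstrip(".,!?") for word in tweet_text.split()}
    return not (words & NO_ARCHIVE_TAGS)
```

As with robots.txt, this only works if the archiving service chooses to be conformant; nothing in the sketch can force a service to honour the request.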
Of course once an archive of tweets is exported to, say, an Excel spreadsheet, there will be nothing which can be done to restrict its usage. So just like use of Usenet News, chat rooms and mailing lists perhaps the simplest advice is to “think before you tweet” – or, as the Romans may have put it, “Caveat twitteror”.