Linked Data and the World Cup
A couple of months ago Kingsley Idehen (Founder & CEO of OpenLink Software and an Open Linked Data enthusiast according to his Twitter profile) mentioned on Twitter that he expected to see lots of interesting Linked Data developments taking place around the World Cup. This prediction seems to be coming true judging by the tweet I received from @AdamLeadbetter last night:
@briankelly RT @rlpow: Our #semweb #worldcup crazies are on a roll! @neumarcx @hekanibru @uogbuji
Looking at @neumarcx’s tweets I find a link to a DBpedia entry related to the World Cup:
well so far http://dbpedia.org/resource/France_national_football_team is doing a few things better I’d say #WC2010
Je suis désolé, mais je n’ai pas le choix. But my money is on http://dbpedia.org/resource/Uruguay_national_football_team
I think Kingsley is right – there’s now a great opportunity to see some Linked Data developments in an area which will be of interest to many people around the world. And let’s be honest, the bioinformatics Linked Data examples haven’t really had much public appeal! In addition, the public’s familiarity with football and the World Cup provides an opportunity to raise awareness of some of the complexities of machine-understandable data – we know, for example, what Americans mean when they talk about ‘soccer’ but software wouldn’t unless mappings between ‘football’ and ‘soccer’ are provided.
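As a minimal sketch of what such a mapping might look like, the following Python snippet resolves regional terms to a single shared concept. The synonym table and the normalise_term function are illustrative assumptions rather than part of any real vocabulary, and the DBpedia URI is used purely as an example identifier. A Linked Data approach would more likely express the equivalence as links between resources than as a hard-coded dictionary.

```python
# Hypothetical synonym table: each regional term maps to one shared concept.
# The URI below is used only as an illustrative identifier.
ASSOCIATION_FOOTBALL = "http://dbpedia.org/resource/Association_football"

SYNONYMS = {
    "football": ASSOCIATION_FOOTBALL,
    "soccer": ASSOCIATION_FOOTBALL,
}


def normalise_term(term):
    """Return the shared concept for a term, or None if no mapping exists."""
    return SYNONYMS.get(term.strip().lower())
```

With this table, normalise_term("Soccer") and normalise_term("football") return the same identifier, while an unmapped term such as "rugby" returns None. A flat table like this ignores the fact that ‘football’ means a different sport to an American audience, which is exactly the sort of complexity a real mapping would need to address.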
Ownership of the Data
There will also be an opportunity to raise awareness of the issues associated with ownership of data. As I described in a post entitled What’s The Score? And Whose Score Is It, Anyway?, according to an entry in Wikipedia the fixture list for the UK’s four professional football leagues (the Premier League, The Football League, the Scottish Premier League and the Scottish Football League) is owned by Football DataCo. And a year ago @ollieparsley, in his The FootyTweets “Cease and Desist” Story, described how he “received a Cease and Desist notice from a company that looks after the Premier League and Football Leagues copyright online”. He went on to add that he “checked that the company was legitimate and I am unhappy to say that they are legitimate”. Ollie subsequently also wrote about a MotorTweets Formula 1 Cease and Desist letter. This described how the “Formula One Administration Limited (“FOA”) has the exclusive right to commercially exploit the FIA Formula One World Championship (“the Championship”) including, but not limited to, all moving images, other audio/video content, timing data and results”.
So I hope that the Linked Data football-supporting geeks have good lawyers! Or perhaps we should regard this as an opportunity for civil disobedience, claiming that we, the public, have the right to do what we want with our sporting data – it’s part of our culture and shouldn’t be privatised. Peter Murray-Rust has made a similar argument about scientific outputs in a post where he argued that scientists (and librarians) should “Post ALL ACADEMIC OUTPUT publicly – IGNORE COPYRIGHT”.
What Can Linked Data Offer?
What Linked Data applications might appeal to the general public? I have tried DBpedia’s Relationship Finder, which depicts relationships between data provided in Wikipedia information boxes. The image shown below shows the relationships between the entries for the England national football team and the German national football team. As can be seen, the 1966 World Cup Final is shown as a significant relationship between these two countries :-)
As depicted in the graph, the relationship is actually between the England and West Germany national football teams, although there is a direct relationship between the West Germany and Germany teams. How, I wonder, would this relationship have been depicted if we had beaten the USSR – a country which, like West Germany, no longer exists? Seeing such relationships makes one aware of the complexities in interpreting data.
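To illustrate the general idea of finding a chain of relationships between two entries, here is a minimal Python sketch using a breadth-first search over a toy graph. This is not how the Relationship Finder itself is implemented. The graph contains only the relationships described above, and the node names are illustrative labels rather than real DBpedia identifiers.

```python
from collections import deque

# Toy undirected graph based on the relationships described in the text.
# Node names are illustrative labels, not real DBpedia identifiers.
GRAPH = {
    "England": ["1966 World Cup Final"],
    "1966 World Cup Final": ["England", "West Germany"],
    "West Germany": ["1966 World Cup Final", "Germany"],
    "Germany": ["West Germany"],
}


def find_path(graph, start, goal):
    """Breadth-first search returning the shortest path as a list of nodes."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbour in graph.get(node, []):
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append(path + [neighbour])
    return None  # no chain of relationships connects the two nodes
```

Calling find_path(GRAPH, "England", "Germany") walks from England through the 1966 World Cup Final and West Germany to Germany, which mirrors the indirect relationship shown in the graph. A node with no recorded links, such as a hypothetical "USSR" entry, would simply yield None.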
What About Twitter?
We saw with the #uksnow example how Twitter can be used to aggregate lightly-structured data. Might Twitter have a similar role to play during the World Cup? If World Cup tweets are to be analysed there will be a need to identify the relevant hashtags – and I have seen, from my Twitter followers, both the #wc2010 and the #worldcup tags being used. But will there be agreement on hashtags for the countries competing in #wc2010 (to use my preferred hashtag)? Last night I observed three-character country codes being used (#FRA and #URU). Assuming there is an agreed international standard for such country codes for national football teams, it might be possible to carry out some interesting sentiment analysis – although, as we learnt from the #nickcleggsfault story, automated analyses can misinterpret irony. We might also need to be aware that disgruntled Scottish fans may be inclined to tweet for #AnyTeamBarENG! As for me, I’ll be tweeting for #ENG erland, #ENG erland, #ENG erland!
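As a rough sketch of a first step towards such an analysis, the following Python snippet tallies three-letter team hashtags across a handful of tweets. The sample tweets are invented for illustration, and the assumption that team tags are exactly three uppercase letters is mine. It only counts mentions; it makes no attempt at sentiment analysis, let alone at detecting irony.

```python
import re
from collections import Counter

# Hypothetical sample tweets; a real analysis would read from an archive.
TWEETS = [
    "Great start for #URU tonight #worldcup",
    "Not much to cheer about #FRA #wc2010",
    "Come on #URU! #WC2010",
]

# Assumption: team tags are exactly three uppercase letters, e.g. #FRA.
# The trailing \b stops longer tags such as #ENGLAND from matching.
TEAM_TAG = re.compile(r"#([A-Z]{3})\b")


def count_team_tags(tweets):
    """Count how often each three-letter team tag appears in the tweets."""
    counts = Counter()
    for tweet in tweets:
        counts.update(TEAM_TAG.findall(tweet))
    return counts
```

Running count_team_tags(TWEETS) counts #URU twice and #FRA once, and ignores the event tags #worldcup, #wc2010 and #WC2010. Because the pattern is case-sensitive, a lowercase tag such as #fra would be missed, so a real analysis would need to decide how strictly to interpret the convention.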
If we want to analyse World Cup-related tweets we will need an archive of the tweets, ideally from a service which provides APIs. I have checked TwapperKeeper and found there are archives for both the #wc2010 and #worldcup hashtags – interestingly the latter is much more popular, with over 202,000 tweets compared with the 43,000+ tweets for the shorter variant. I also noticed that @jennifermjones, a researcher at Loughborough University whom I follow on Twitter, created these two archives – and herself seems to prefer the #worldcup tag.
What Else Is Happening?
Are there any examples of innovative uses of Linked Data and Social Media in the context of football that you are aware of? Or, indeed, ideas you would like to suggest which football-supporting geeks might be interested in implementing? But please provide suggestions before the quarter-finals – English developers tend to lose interest in the World Cup around that time! And Wimbledon doesn’t have the same appeal.