UK Web Focus (Brian Kelly)

Innovation and best practices for the Web

A Challenge To Linked Data Developers

Posted by Brian Kelly on 12 Feb 2010

Back in November, following the interest in Linked Data which had been discussed at the CETIS 2009 Conference, I wondered whether it was Time To Experiment With DBpedia?

The following month I attended the Online Information 2009 conference. As I described in a post on the Highlights of Online Information 2009: Semantic Web and Social Web, it was clear to me that “#semanticweb was the highlight & relevant for early mainstream”. A blog post which provided the LIS Research Coalition “review” of Online 2009 was in agreement: “sessions on the semantic web gave the impression that those in library and information science related roles are now beginning to consider the exploitation of data to data links”.

However, a concern I raised with Ian Davis, CTO of Talis UK, following his keynote talk on “The Reality of Linked Data” was the danger of overhyping expectations; something I feel is very relevant in light of the perceived failure of the Semantic Web to live up to the potential described by evangelists in the early years of the last decade. Has, for example, the “new form of Web content that is meaningful to computers will unleash a revolution of new possibilities” described in the Semantic Web article published in Scientific American in May 2001 arrived? I think not.

There is a danger, I fear, that the renewed enthusiasm felt by increasing numbers of developers will not be shared by managers and policy makers – leading to interesting pilots and prototypes which do not necessarily become deployed in a mainstream service environment.

A suggestion I made to a number of Linked Data experts at the Online Information 2009 conference was to demonstrate the value of Linked Data not by providing examples in niche subject areas (e.g. chemistry) but by taking an example which everyone can understand.

In my post Time To Experiment With DBpedia? I used the DBpedia Faceted Browser to search for information about UK Universities – in the example I searched for UK Universities which were founded in 1966. But this wasn’t demonstrating how Linked Data can be used to join information sources which have different underlying structures.

My challenge to Linked Data developers is to make use of the data stored in DBpedia (which is harvested from Wikipedia) to answer the query “Which town or city in the UK has the highest proportion of students?”. This would involve processing the set of UK Universities, finding all Universities from the same town or city, recording the total number of students and then, from the town/city entries in DBpedia, finding the total population in order to identify the town or city with the largest proportion of students.
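The processing steps described above can be sketched in a few lines of Python. This is only an illustration of the aggregation logic, not a solution to the challenge: it assumes the university and population records have already been retrieved from DBpedia, and the record layout, the function names and all of the figures are hypothetical placeholders rather than real data.

```python
from collections import defaultdict

# Hypothetical records standing in for data extracted from DBpedia.
# The names and numbers are made up for illustration only.
universities = [
    {"name": "University A", "town": "Town X", "students": 20000},
    {"name": "University B", "town": "Town X", "students": 10000},
    {"name": "University C", "town": "Town Y", "students": 50000},
]
populations = {"Town X": 100000, "Town Y": 500000}


def student_proportions(universities, populations):
    """Sum the student numbers per town and divide by the town's population."""
    totals = defaultdict(int)
    for uni in universities:
        totals[uni["town"]] += uni["students"]
    # Towns with no (or zero) recorded population are skipped.
    return {
        town: total / populations[town]
        for town, total in totals.items()
        if populations.get(town)
    }


def highest_proportion(universities, populations):
    """Return the (town, proportion) pair with the largest student share."""
    proportions = student_proportions(universities, populations)
    return max(proportions.items(), key=lambda item: item[1])


if __name__ == "__main__":
    town, share = highest_proportion(universities, populations)
    print(f"{town}: {share:.1%} of the population are students")
```

The arithmetic is the easy part. The difficult part of the challenge is whether the university and town entries in DBpedia can be joined reliably enough to produce inputs of this shape.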

I’m not too concerned about some of the edge cases (e.g. the differences between the City of London and Greater London or the Universities with campuses in several locations). Rather I want to know:

  • Can Linked Data solve this problem (from a theoretical perspective)?
  • Is DBpedia able to solve this problem (from a theoretical perspective)?
  • How difficult is it to solve the problem (is it a trivial one-line SPARQL query or would it require several months of work)?

Any takers? And note the answer must be provided using DBpedia – asking your friends on Twitter is cheating!


11 Responses to “A Challenge To Linked Data Developers”

  1. Social comments and analytics for this post…

    This post was mentioned on Twitter by briankelly: A Challenge To #LinkedData Developers: Use DBpedia to find place in UK with largest %age of students:

  2. […] example SPARQL queries too… I also note that no Linked Data folk appear to have picked up on Brian Kelly’s challenge? Brian, do I win if I can work out a solution using stuff from the Guardian Datastore and Google […]

  3. […] Challenge To Linked Data Developers [web link]UK Web Focus (12/Feb/2010)“…time to experiment with dbpedia the following month […]

  4. Tony Hirst said

    If you want authentic student population stats, HESA is probably the best place to go – eg

    The data is in XLS spreadsheets, which means it’s difficult to interrogate directly. There’s also the issue of doing a trivial mapping from the institution name to the location, although of course this will be misleading for your percentage stats if the reported student numbers represent total enrollment across multiple campuses in different towns/cities.

  5. […] A Challenge To Linked Data Developers […]

  6. […] A Challenge To Linked Data Developers […]

  7. […] was the background to my recent “Challenge To Linked Data Developers” in which I asked “Which town or city in the UK has the largest proportion of […]

  8. […] Data community baiting them to demonstrate some of the utility of the Linked Data approach (e.g. A Challenge To Linked Data Developers (followed up in Response To My Linked Data Challenge) and Linked Data: my challenge, with some […]

  9. […] You can keep track of his experiments with data at and he’ll be talking about some of these on the evening (including his work on the infamous Brian Kelly Linked Data challenge!) […]

  10. […] striking example for the difficulty of semantic interoperability is a Linked Data challenge which sought to answer the question: “Which town or city in the UK has the highest proportion of […]

  11. […] on Getting information about UK HE from Wikipedia which explores some of the ideas I discussed on A Challenge To Linked Data Developers. But rather than discussing how DBpedia might be used to analyse data about Universities in this […]
