A Challenge To Linked Data Developers
Posted by Brian Kelly on 12 February 2010
Back in November, following the interest in Linked Data shown at a CETIS 2009 Conference, I wondered whether it was Time To Experiment With DBpedia?
The following month I attended the Online Information 2009 conference. As I described in a post on the Highlights of Online Information 2009: Semantic Web and Social Web, it was clear to me that “#semanticweb was the highlight & relevant for early mainstream”. A blog post which provided the LIS Research Coalition “review” of Online 2009 was in agreement: “sessions on the semantic web gave the impression that those in library and information science related roles are now beginning to consider the exploitation of data to data links”.
However, a concern I raised with Ian Davis, CTO of Talis UK, following his keynote talk on “The Reality of Linked Data” was the danger of overhyping expectations; something I feel is very relevant in light of the perceived failure of the Semantic Web to live up to the promises made by its evangelists in the early years of the last decade. Has, for example, the “new form of Web content that is meaningful to computers [which] will unleash a revolution of new possibilities”, described in the Semantic Web article published in Scientific American (and also available from Ryerson.ca) in May 2001, arrived? I think not.
There is a danger, I fear, that the renewed enthusiasm felt by increasing numbers of developers will not be shared by managers and policy makers – leading to interesting pilots and prototypes which do not necessarily become deployed in a mainstream service environment.
A suggestion I made to a number of Linked Data experts at the Online Information 2009 conference was to demonstrate the value of Linked Data not by providing examples in niche subject areas (e.g. chemistry) but by taking an example which everyone can understand.
In my post Time To Experiment With DBpedia? I used the DBpedia Faceted Browser to search for information about UK Universities – in the example I searched for UK Universities which were founded in 1966. But this didn’t demonstrate how Linked Data can be used to join information held in sources which have different underlying structures.
My challenge to Linked Data developers is to make use of the data stored in DBpedia (which is harvested from Wikipedia) to answer the query “Which town or city in the UK has the highest proportion of students?”. This would involve processing the set of UK Universities, grouping together all Universities from the same town or city, recording the total number of students for each and then, from the town/city entries in DBpedia, finding the total population in order to identify the town or city with the largest proportion of students.
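The steps described above can be sketched in a few lines of Python. This is only an illustration of the aggregation logic, assuming the university and population data have already been extracted from DBpedia somehow; the records, place names and figures below are invented placeholders, and the function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical records standing in for data extracted from DBpedia.
# The names and numbers are invented for illustration only.
universities = [
    {"name": "University A", "city": "Town X", "students": 12000},
    {"name": "University B", "city": "Town X", "students": 8000},
    {"name": "University C", "city": "City Y", "students": 30000},
]
populations = {"Town X": 100000, "City Y": 500000}


def student_proportions(universities, populations):
    """Return {city: total students / total population} for each city."""
    totals = defaultdict(int)
    for uni in universities:
        # Sum the student numbers of all universities in the same city.
        totals[uni["city"]] += uni["students"]
    # Skip cities with a missing or zero population figure.
    return {
        city: total / populations[city]
        for city, total in totals.items()
        if populations.get(city)
    }


def highest_student_proportion(universities, populations):
    """Return the (city, proportion) pair with the largest proportion."""
    proportions = student_proportions(universities, populations)
    return max(proportions.items(), key=lambda item: item[1])


if __name__ == "__main__":
    city, proportion = highest_student_proportion(universities, populations)
    print(f"{city}: {proportion:.1%} of the population are students")
```

The arithmetic is the easy part; the open question is whether the underlying figures can be obtained reliably and consistently from DBpedia.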
I’m not too concerned about some of the edge cases (e.g. the differences between the City of London and Greater London or the Universities with campuses in several locations). Rather, I want to know:
- Can Linked Data solve this problem (from a theoretical perspective)?
- Is DBpedia able to solve this problem (from a theoretical perspective)?
- How difficult is it to solve the problem (is it a trivial one-line SPARQL query or would it require several months of work)?
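To give a feel for what a “trivial” query might look like, here is a hedged sketch of a Python helper that builds a SPARQL query string. It assumes that DBpedia exposes the properties `dbo:city`, `dbo:country`, `dbo:numberOfStudents` and `dbo:populationTotal` consistently for the relevant resources, and that the endpoint supports SPARQL 1.1 aggregates; neither assumption is guaranteed to hold. The function name `build_query` is hypothetical, and the query has not been run against any endpoint.

```python
def build_query(country_uri="http://dbpedia.org/resource/United_Kingdom"):
    """Build a SPARQL query for the city with the highest student proportion.

    The property names are assumptions about the DBpedia ontology and may
    not be populated for every university or city.
    """
    return f"""
PREFIX dbo: <http://dbpedia.org/ontology/>
SELECT ?city
       (SUM(?students) AS ?totalStudents)
       (SAMPLE(?population) AS ?cityPopulation)
WHERE {{
  ?university a dbo:University ;
              dbo:country <{country_uri}> ;
              dbo:city ?city ;
              dbo:numberOfStudents ?students .
  ?city dbo:populationTotal ?population .
}}
GROUP BY ?city
ORDER BY DESC(SUM(?students) / SAMPLE(?population))
LIMIT 1
""".strip()


if __name__ == "__main__":
    print(build_query())
```

Even if a query of this shape runs, its answer is only as good as the data: missing student numbers, cities with several population values (which would inflate the sum) and inconsistent use of properties could all distort the result.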
Any takers? And note the answer must be provided using DBpedia – asking your friends on Twitter is cheating!