UK Web Focus (Brian Kelly)

Innovation and best practices for the Web


Archive for the ‘Linked Data’ Category

Call for Use Cases: Social Uses and Other New Uses of Library Linked Data

Posted by Brian Kelly on 21 Jan 2011

The W3C’s Library Linked Data Incubator Group has issued a “Call for Use Cases: Social uses and other new uses of Library Linked Data“. The call begins:

Do you use library-related data — like reading lists, library materials (articles, books, videos, cultural heritage or archival materials, etc), bookmarks, or annotations — on the Web and mobile Web?

Are you currently using social features in library-related information systems or sites, or plan to do so in the near future? We are particularly interested in uses that are related to or could benefit from the use of linked data.

The W3C Library Linked Data Incubator Group is soliciting SOCIAL and EMERGENT use cases for library-related linked data:

  • What new or innovative uses do you see (or envision) integrating library and cultural heritage data into applications on the Web and in social media?
  • How are social features used in library-related information systems?
  • What are the emergent uses of library-related data on the Web and mobile Web?

How could linked data technology [1]:

  • enhance the use of library-related data in a social context?
  • contribute to systems for sharing, filtering, recommending, or machine reading?
  • support new uses we may not have envisioned or achieved yet?

Some examples have been discussed in this thread [4].

Please tell us more by filling in the questionnaire below and sending it back to us or to public-lld@w3.org, preferably before February 15th, 2011 (note the original email incorrectly had 2010).

The information you provide will be influential in guiding the activities the Library Linked Data Incubator Group will undertake to help increase global interoperability of library data on the Web. The information you provide will be curated and published on the group wikispace at [3].

We understand that your time is precious, so please don’t feel you have to answer every question. Some sections of the templates are clearly marked as optional. However, the more information you can provide, the easier it will be for the Incubator Group to understand your case. And, of course, please do not hesitate to contact us if you have any trouble answering our questions.

Editorial guidance on specific points is provided at [2], and examples are available at [3].

The message then goes on to provide the template for the use cases.

I would think that there would be a range of relevant examples of such use cases, based on institutional developments and JISC-funded project and service developments. It would be very useful, I feel, if the UK higher education sector were to contribute to this call, as this can help to ensure that W3C’s Linked Data work is informed by the experiences and requirements of our sector. I should add that I have come across examples of standardisation activities in the past which reflected US approaches and which were not easily implemented under UK working practices.

If you are involved in such Library-related Linked Data activities I would encourage you to read the original request and respond accordingly. Feel free to leave a comment here if you have contributed a use case.

Posted in Linked Data, W3C | Leave a Comment »

RDFa API Draft Published

Posted by Brian Kelly on 28 Sep 2010

The W3C have recently announced that the RDFa API draft has been published. As described in the announcement “RDFa enables authors to publish structured information that is both human- and machine-readable. Concepts that have traditionally been difficult for machines to detect, like people, places, events, music, movies, and recipes, are now easily marked up in Web documents“.

The RDFa API draft document itself helpfully provides several examples which illustrate the potential benefits of using RDFa:

Enhanced Browser Interfaces: Dave is writing a browser plugin that filters product offers in a web page and displays an icon to buy the product or save it to a public wishlist. The plugin searches for any mention of product names, thumbnails, and offered prices. The information is listed in the URL bar as an icon, and upon clicking the icon, displayed in a sidebar in the browser. He can then add each item to a list that is managed by the browser plugin and published on a wishlist website.

Data-based Web Page Modification: Dale has a site that contains a number of images, showcasing his photography. He has already used RDFa to add licensing information about the images to his pages, following the instructions provided by Creative Commons. Dale would like to display the correct Creative Commons icons for each image so that people will be able to quickly determine which licenses apply to each image.

Automatic Summaries: Mary is responsible for keeping the projects section of her company’s home page up-to-date. She wants to display info-boxes that summarize details about the members associated with each project. The information should appear when hovering the mouse over the link to each member’s homepage. Since each member’s homepage is annotated with RDFa, Mary writes a script that requests the page’s content and extracts necessary information via the RDFa API.

Data Visualisation: Richard has created a site that lists his favourite restaurants and their locations. He doesn’t want to generate code specific to the various mapping services on the Web. Instead of creating specific markup for Yahoo Maps, Google Maps, MapQuest, and Google Earth, he instead adds address information via RDFa to each restaurant entry. This enables him to build on top of the structured data in the page as well as letting visitors to the site use the same data to create innovative new applications based on the address information in the page.

Linked Data Mashups: Marie is a chemist, researching the effects of ethanol on the spatial orientation of animals. She writes about her research on her blog and often makes references to chemical compounds. She would like any reference to these compounds to automatically have a picture of the compound’s structure shown as a tooltip, and a link to the compound’s entry on the National Center for Biotechnology Information [NCBI] Web site. Similarly, she would like visitors to be able to visualize the chemical compound in the page using a new HTML5 canvas widget she has found on the web that combines data from different chemistry websites.

However the example I find most interesting is the following:

Importing Data: Amy has enriched her band’s web-site to include Google Rich Snippets event information. Google Rich Snippets are used to mark up information for the search engine to use when displaying enhanced search results. Amy also uses some ECMAScript code that she found on the web that automatically extracts the event information from a page and adds an entry into a personal calendar.

Brian finds Amy’s web-site through Google and opens the band’s page. He decides that he wants to go to the next concert. Brian is able to add the details to his calendar by clicking on the link that is automatically generated by the ECMAScript tool. The ECMAScript extracts the RDFa from the web page and places the event into Brian’s personal calendaring software – Google Calendar.

Although all of the use cases listed above provide sample RDFa markup, the final example makes use of Google Rich Snippets, for which there is a testing tool which illustrates the structure that is visible to Google. I have been using RDFa on my forthcoming events page for a while, so it was useful to run that page through the Rich Snippets testing tool and see how the structure provided on that page is processed by Google.

The testing tool does point out that “there is no guarantee that a Rich Snippet will be shown for this page on actual search results“. As described in the Rich Snippets FAQ: “Currently, review sites and social networking/people profile sites are eligible. We plan to expand Rich Snippets to other types of content in the future“.

So although there is no guarantee that use of RDFa embedded in HTML pages using Google Rich Snippets for, say, events will ensure that search results for an event hosted on your Web site will provide a structured display of the information like this:

the fact that Google’s Rich Snippets are explicitly mentioned in the RDFa API draft document does seem to suggest commitment from a leading player which has a vested interest in processing structured information in order to improve the searching process.

And of course the “ECMAScript code that [Amy] found on the web that automatically extracts the event information from a page and adds an entry into a personal calendar” suggests that such RDFa information can be processed today without the need for support from Google.  Now does anyone know where Amy found this ECMAScript code?

Posted in Linked Data, W3C | 3 Comments »

Linked Data for Events: the IWMW Case Study

Posted by Brian Kelly on 21 Sep 2010

Linked Data and Events

In a post entitled “Getting information about UK HE from Wikipedia” published in July on the Ancient Geek’s blog, Martin Poulter commented that “At IWMW 2010, last week, a lot of discussion centred around how, in an increasingly austere atmosphere, we can make more use of free stuff. One category of free stuff is linked data. In particular, I was intrigued by Thom Bunting (UKOLN)‘s presentation about extracting information from Wikipedia.”

Martin’s comment related to a Linked Data application developed by my colleague Thom Bunting which he demonstrated in the final session at the IWMW 2010 event.  In this post I would like to summarise this work.

Thom, along with UKOLN colleagues Adrian Stevenson and Mark Dewey, was a facilitator for a workshop session at the IWMW 2010 event. In the run-up to the event I suggested to Thom that it would be useful to exploit the historical data for UKOLN’s annual Institutional Web Management Workshop (IWMW) series of events. This event series was launched in 1997 and has been held annually ever since, with this year’s event, IWMW 2010, being the 14th in the series.

The Web sites for all 14 events continue to be hosted on the UKOLN Web site, and care has been taken to ensure that their URLs remain persistent. For the past five years or so, as well as providing a series of HTML pages, we have also provided RSS files for each event. These can be used not only to provide news for the events but also to enable key information sources, including speaker biographies and abstracts of the talks and workshop sessions, to be syndicated and reused by other applications. We have recently ensured that such RSS files are available for all of the workshops.

Our intention was to make use of this information in order to develop a Linked Data application which would demonstrate the potential of Linked Data in an application area (organising events) which is of relevance to all higher educational institutions.

Thom has written a post on Consuming and producing linked data in a content management system in which he describes the technical details of how he used Drupal to produce his Linked Data application, having first processed the various RSS feeds (which were available in the RSS 2.0 format, which is not suitable for Linked Data applications without further processing). In this post I want to highlight some of the benefits which his application provides to end users:

Information on the host institution of participants, speakers and facilitators at events: The RSS 2.0 files had previously been used to provide a Google Map showing the location of speakers and facilitators. This information has been extended to include the host institution of participants at recent events. In addition to the map, clicking on an icon will display information about the numbers of participants from the institution together with information about the institution itself. The important thing to note is that the institutional information is not held locally; rather it is gathered from the institution’s entry in DBpedia.

The screen image below (taken from the Locations map area of the IWMW Linked Data application) shows this information, providing details of the speakers and facilitators from the University of Southampton at recent events (this being public information) and a summary of the total numbers of participants from the institution. The image also shows the information taken from DBpedia, which includes the fact that the institution is a member of the Russell Group, together with details of, for example, the student numbers.
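To give a feel for how such a lookup works, here is a minimal sketch of a SPARQL query against the public DBpedia endpoint which retrieves an institution’s name and student numbers. Note that the dbpedia-owl:numberOfStudents property name is my assumption – the exact property used by Thom’s application will depend on the infobox mappings current at the time.

# hedged sketch: fetch basic institutional facts from DBpedia
prefix dbpedia-owl: <http://dbpedia.org/ontology/>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>

select ?name ?students where {
  <http://dbpedia.org/resource/University_of_Southampton>
      rdfs:label ?name ;
      dbpedia-owl:numberOfStudents ?students .
  filter (lang(?name) = "en")
}

A query of this form can be run at http://dbpedia.org/sparql; an application such as Thom’s could then display the results alongside the locally-held registration data.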

This illustrates access via a specific institution. It is also possible to view such information for all institutions which have participated at recent events. Illustrated below are the results of such a query sorted by the total number of registrations.

It should be noted that what is referred to as the ‘Loyalty rate’ is calculated as the total registrations divided by the total number of people registering; for example, 30 registrations made by 10 distinct people would give a loyalty rate of 3. This gives a general indication of how many annual IWMW events, on average, each person registering from a specified institution (or company) has attended.

What Next?

This work has provided UKOLN with a better understanding of the impact the IWMW series of events has had across the HE sector. We can now see the institutions which have contributed to or attended significant numbers of events and, conversely, those which haven’t. And although such information could be obtained through use of an internal database, the integration with the institutional data could not realistically have been achieved without use of a Linked Data approach.

Our next step will be to include information about the speakers and participants at the IWMW events and the topics of their sessions.  As described previously such information is available at stable URIs. However we are waiting for an upgrade to the Drupal software before we begin on this next step.

We hope this summary illustrates some of the benefits which use of Linked Data can provide.

Posted in Linked Data | 4 Comments »

DBPedia and the Relationships Between Technical Articles

Posted by Brian Kelly on 14 Sep 2010

Wikipedia is Popular!

I recently wrote a blog post in which I asked How Well-Read Are Technical Wikipedia Articles? The statistics I quoted seemed to suggest that content provided in Wikipedia is well-read. This, for me, suggests that we should be making greater efforts to enhance the content provided in Wikipedia – and avoid having valuable content being hidden away in large PDF reports which few people are likely to read.

Wikipedia Infobox for HTML5 entry

Wikipedia Infoboxes

But in addition to the content provided in Wikipedia, it also seems to me that we should be making more of an effort to exploit the potential of Wikipedia infoboxes.

An infobox is described as “a fixed-format table designed to be added to the top right-hand corner of articles to consistently present a summary of some unifying aspect that the articles share and to improve navigation to other interrelated articles“.

An example of an infobox for the HTML5 Wikipedia entry is shown. As suggested by the definition, it provides a summary of the key aspects of the HTML5 markup language. If you view the entry for HTML you will see similar information presented in a similar fashion.

The infoboxes provide consistency in the user interface for groups of related Wikipedia pages. A better example can be gained if you look at entries for countries or cities. For example view the entries for the UK and USA or Bath, Bristol and London to see how the infoboxes are being used in these contexts.

If the infoboxes were solely concerned with the user display I wouldn’t be too interested. However these sets of structured information form the basis of the content which is used in DBpedia. And the ability to process such information when it is provided as Linked Data is really interesting.
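As a simple illustration of the point, the minimal sketch below (which assumes nothing beyond the standard DBpedia resource URI and public endpoint) lists the first 25 triples which DBpedia has derived, largely from the infobox, for the HTML5 entry:

# list the structured data DBpedia holds for the HTML5 Wikipedia entry
select ?property ?value where {
  <http://dbpedia.org/resource/HTML5> ?property ?value .
}
limit 25

The results include machine-readable versions of the fields visible in the infobox, which is precisely the data that tools such as the RelFinder (discussed below) build upon.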

An example of the potential of DBpedia has been described by Martin Poulter in a post on Getting information about UK HE from Wikipedia, which explores some of the ideas I discussed in A Challenge To Linked Data Developers. But rather than discussing how DBpedia might be used to analyse data about universities, in this post I want to explore its potential for exploring information about technical standards.

DBpedia and Relationships Between Technical Standards

The DBpedia RelFinder illustrates how such structured and linked information can be processed. I used this service to explore the relationships between the Wikipedia infobox entries for XML, XSLT and the World Wide Web Consortium. The output is illustrated below.

Relationships between W3C, XML and XSLT
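Underneath a visualisation like the RelFinder’s sit queries over the DBpedia graph. Here is a minimal sketch of the kind of query involved – it is not the RelFinder’s own query (which also searches for longer paths), and looks only for direct links in either direction between XSLT and the two related resources:

# hedged sketch: direct links between XSLT and two related DBpedia resources
select ?property ?neighbour where {
  { <http://dbpedia.org/resource/XSLT> ?property ?neighbour . }
  union
  { ?neighbour ?property <http://dbpedia.org/resource/XSLT> . }
  filter (?neighbour = <http://dbpedia.org/resource/XML> ||
          ?neighbour = <http://dbpedia.org/resource/World_Wide_Web_Consortium>)
}

Each binding of ?property is an infobox-derived relationship of the kind drawn as an edge in the image above.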

If we are looking to provide developers with a better understanding of important technical standards and their relationships, rather than writing reports which provide such information, wouldn’t it be more effective to engage in the creation and maintenance of the information provided in infoboxes in Wikipedia entries, as well as contributing to the content of the pages themselves?

If you look at the entries for the metadata standards MODS (Metadata Object Description Schema) or METS (Metadata Encoding and Transmission Standard) you’ll find that these entries do not (currently) have an infobox. Similarly the entry for DCMI is also lacking such structured reusable metadata – which is perhaps disappointing for an organisation which is about metadata.

Isn’t it time to engage more with Wikipedia? And if the development community across the UK HE sector were to do this in a pro-active fashion wouldn’t this be a good example of how the benefits can be shared more widely? The Big Society, perhaps?

I’ll conclude by saying that if you are still unclear as to what a visualisation of the relationships between such resources might look like, you can view a video in which Balviar Notay illustrates how such an application might be used for “a search tool that visualises the links between JISC projects to help explore the knowledge that the projects have produced“.

Posted in Linked Data, standards | 1 Comment »

Geeks, Linked Data and the World Cup

Posted by Brian Kelly on 12 Jun 2010

Linked Data and the World Cup

A couple of months ago Kingsley Idehen (Founder & CEO of OpenLink Software and an Open Linked Data enthusiast according to his Twitter profile) mentioned on Twitter that he expected to see lots of interesting Linked Data developments taking place around the World Cup. This prediction seems to be coming true judging by the tweet I received from @AdamLeadbetter last night:

@briankelly RT @rlpow: Our #semweb #worldcup crazies are on a roll! @neumarcx @hekanibru @uogbuji

Looking at @neumarcx’s tweets I find a link to a DBpedia entry related to the World Cup:

well so far http://dbpedia.org/resource/France_national_football_team is doing a few things better I’d say #WC2010

and:

Je suis désolé, mais je n’ai pas le choix. But my money is on http://dbpedia.org/resource/Uruguay_national_football_team

I think Kingsley is right – there’s now a great opportunity to see some Linked Data developments in an area which will be of interest to many people around the world. And let’s be honest, the bioinformatics Linked Data examples haven’t really had much public appeal! In addition, the public awareness of football and the World Cup provides an opportunity to raise awareness of some of the complexities in making information machine-understandable – we know, for example, what Americans mean when they talk about ‘soccer’, but software wouldn’t unless mappings between ‘football’ and ‘soccer’ are provided.
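DBpedia already captures some of these mappings, since Wikipedia redirects (‘Soccer’ points to the ‘Association football’ article) are exposed as triples. Here is a minimal sketch of a query to list them; the dbpedia-owl:wikiPageRedirects property is my assumption as to how DBpedia records redirects:

# hedged sketch: list the alternative names which redirect to the football entry
prefix dbpedia-owl: <http://dbpedia.org/ontology/>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>

select ?alias where {
  ?redirect dbpedia-owl:wikiPageRedirects <http://dbpedia.org/resource/Association_football> ;
            rdfs:label ?alias .
  filter (lang(?alias) = "en")
}

A query of this form should surface ‘Soccer’ amongst the aliases, giving software at least a starting point for the football/soccer mapping problem.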

Ownership of the Data

There will also be an opportunity to raise awareness of the issues associated with ownership of data. As I described in a post entitled What’s The Score? And Whose Score Is It, Anyway?, according to an entry in Wikipedia the fixture lists for the UK’s four professional football leagues (the Premier League, The Football League, the Scottish Premier League and the Scottish Football League) are owned by Football DataCo. And a year ago @ollieparsley, in his The FootyTweets “Cease and Desist” Story, described how he “received a Cease and Desist notice from a company that looks after the Premier League and Football Leagues copyright online“. He went on to add that he “checked that the company was legitimate and I am unhappy to say that they are legitimate“. Ollie subsequently also wrote about a MotorTweets Formula 1 Cease and Desist letter. This described how “Formula One Administration Limited (”FOA”) has the exclusive right to commercially exploit the FIA Formula One World Championship (”the Championship”) including, but not limited to, all moving images, other audio/video content, timing data and results“. So I hope that the Linked Data football-supporting geeks have good lawyers! Or perhaps we should regard this as an opportunity for civil disobedience, claiming that we, the public, have the right to do what we want with our sporting data – it’s part of our culture and shouldn’t be privatised. Peter Murray-Rust has made a similar argument in relation to scientific outputs, in a post where he argued that scientists (and librarians) should “Post ALL ACADEMIC OUTPUT publicly – IGNORE COPYRIGHT“.

What Can Linked Data Offer?

What Linked Data applications might appeal to the general public? I have tried DBpedia’s Relationship Finder, which depicts relationships between data provided in Wikipedia infoboxes. The image below shows the relationships between the entries for the England national football team and the German national football team. As can be seen, the 1966 World Cup Final is shown as a significant relationship between these two countries :-)

England and Germany football relationships

As depicted in the graph, the relationship is actually between the England and West Germany national football teams, although there is a direct relationship between the West Germany and Germany teams. How, I wonder, would this relationship have been depicted if we had beaten the USSR – a country which, like West Germany, no longer exists? Seeing such relationships makes one aware of the complexities in interpreting data.
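The intermediate nodes which the Relationship Finder draws – such as the 1966 World Cup Final – can also be found directly. Here is a minimal sketch, assuming only that the team resource names follow the pattern of the France and Uruguay URIs quoted above:

# hedged sketch: resources which link to both the England and West Germany teams
select ?connector ?toEngland ?toWestGermany where {
  ?connector ?toEngland <http://dbpedia.org/resource/England_national_football_team> ;
             ?toWestGermany <http://dbpedia.org/resource/West_Germany_national_football_team> .
}

Each ?connector binding is a resource (a match, a tournament, a player and so on) whose infobox references both teams.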

What About Twitter?

We saw with the #uksnow example how Twitter can be used to aggregate lightly-structured data. Might Twitter have a similar role to play during the World Cup? If World Cup tweets are to be analysed there will be a need to identify the relevant hashtags – and I have seen, from my Twitter followers, both the #wc2010 and #worldcup tags being used. But will there be agreement on a hashtag for the countries competing in #wc2010 (to use my preferred hashtag)? Last night I observed three-character country codes being used (#FRA and #URU). Assuming there is an agreed international standard for such country codes for national football teams it might be possible to carry out some interesting sentiment analysis – although as we learnt from the #nickcleggsfault story, automated analyses can misinterpret irony. We might also need to be aware that disgruntled Scottish fans may be inclined to tweet for #AnyTeamBarENG! As for me, I’ll be tweeting for #ENGerland, #ENGerland, #ENGerland!

If we want to analyse World Cup-related tweets we will need an archive of the tweets, ideally from a service which provides APIs. I have checked TwapperKeeper and found there are archives for both the #wc2010 and #worldcup hashtags – interestingly the latter is much more popular, with over 202,000 tweets compared with the 43,000+ tweets for the shorter variant. I also noticed that @jennifermjones, a researcher at Loughborough University whom I follow on Twitter, created these two archives – and she herself seems to prefer the #worldcup tag.

What Else Is Happening?

Are there any examples of innovative uses of Linked Data and Social Media in the context of football that you are aware of? Or, indeed, ideas you would like to suggest which football-supporting geeks might be interested in implementing? But please provide suggestions before the quarter-finals – English developers tend to lose interest in the World Cup around that time! And Wimbledon doesn’t have the same appeal.

Posted in Linked Data | Leave a Comment »

Sig.ma, Linked Data and New Media Literacy

Posted by Brian Kelly on 21 May 2010

Consuming Linked Data Tutorial

At the Consuming Linked Data tutorial I attended on Monday 26 April 2010 I heard details of a number of Linked Data applications which could be used to process the Linked Web of Data. Of particular interest to me was the sig.ma search engine. Below I discuss the implications of personal content (perhaps provided using Social Web tools) becoming surfaced in a Linked Data environment through semantic search tools such as sig.ma.

sig.ma

sig.ma is described as a “semantic information mashup” service. I used this Web-based service for vanity searching: a  search for “Brian Kelly UKOLN” provides me with an interesting view of how data about me is freely available in the Web of Linked Data. A screen shot is shown below.

Use of the sig.ma service to view resources for "Brian Kelly UKOLN"

The service demonstrates how information from disparate sources can be brought together in a mashup and, as such, is worth trying out to see what information the Web of Linked Data has about you. I found, for example, many blog posts which I was unaware of which referenced my work in some way, such as a summary of an event in Southampton I spoke at last year; a reference to a post of mine in a post on FriendFeed: where the conversation happens; and a link to one of my briefing documents in a list (in the Czech language, I think) of Library 2.0 resources. In total it seems there were 152 sources of Linked Data information about me.

This service is of interest to me not only for the information it contains but also for understanding incorrect information which may be held, the reasons for such errors and the risk that personal information you may not wish to be shared has already been gathered and is available in Linked Data space.

Linked Data and New Media Literacy

As can be seen in the above image, two data sources think that my surname is ‘UKOLN’. Further investigation reveals that this is due to the slides from a talk I gave at the Museums and the Web 2009 conference, which were uploaded by the conference organisers, having incorrect metadata.

As well as information gathered from Slideshare, sig.ma has also gathered information from Twitter, the Archimuse Web site (which organised the Museums and the Web conference), this blog and various resources I maintain on the UKOLN Web site. And on asking the service to retrieve data from additional services I discover that Linked Data about me is also available from data held on Wikipedia, YouTube, Ning, Scribd, Blip.tv, VOX, Blogspot and Tumblr, as well as a number of individuals’ blogs (e.g. posts on the Stephen Downes, Dulwichonview, Daveyp and Openwetware blogs) and no doubt (many?) others. It would appear that if you are a user of these popular Social Web services your information may be available as Linked Data.

I also noticed that sig.ma knew my date of birth. I have always tried to conceal this information from public display and was puzzled as to how it came to be known. I discovered that I had included my date of birth in a FOAF file which I created in about 2003 – before I decided to conceal this information. I have now removed the date of birth from my FOAF file – but how long will it take for sig.ma to refresh this data source, I wonder?
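Incidentally, checking what a FOAF file exposes is straightforward if you load it into a SPARQL-capable store. Here is a minimal sketch – the file URI is purely hypothetical, and the query uses the standard foaf:birthday property:

# hedged sketch: does this (hypothetical) FOAF file expose a birthday?
prefix foaf: <http://xmlns.com/foaf/0.1/>

ask
from <http://example.org/bkelly-foaf.rdf>
where {
  ?person foaf:name "Brian Kelly" ;
          foaf:birthday ?birthday .
}

An ASK query simply returns true or false, which is all that is needed for this kind of privacy check.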

The large amount of information about my professional activities which can be found using sig.ma is pleasing – and it is good to see how RSS feeds, RDFa and other structured data sources which are accessible from various Social Web services are being used. But what if the information is wrong, misleading, embarrassing or confidential? I have recently read that Syracuse University [is] to Provide Online Reputation Management to Graduates. We all need to have the skills to carry out such reputation management activities, I feel. And search engines which explore Linked Data sources should now be part of the portfolio of tools we need to understand. Have those involved in New Media Literacy appreciated this, I wonder?

Posted in Linked Data | 3 Comments »

Linked Data and Lessons From the Past

Posted by Brian Kelly on 28 Apr 2010

The Buzz at WWW 2010

I’m currently in Raleigh, North Carolina attending the WWW 2010 conference. And the buzz at the conference so far seems to focus on Linked Data (although I should add that I am writing this during the two days of pre-conference events and I have been out socialising with Linked Data developers, so perhaps these conclusions are quite subjective and premature!).

This excitement reminds me of previous WWW conferences – and makes me reflect on the extent to which the passion felt by many developers and researchers at these conferences actually results in significant changes in the Web landscape in the short term, or whether the enthusiasm simply results in a failure to engage the mainstream community and a failure to address the bigger picture. So here are my reflections on the excitement I felt after the WWW 2003 conference.

Reflections on WWW 2003

I remember returning from WWW 2003 feeling inspired. In part this was after seeing how a communications infrastructure (WiFi and IRC) could be used to support a conference – this was my very first ‘amplified conference’ (although Lorcan Dempsey hadn’t coined that term at the time). The interest in this approach was described in an article by Paul Shabajee entitled “‘Hot’ or Not? Welcome to real-time peer review” published in The Times Higher Education Supplement. So inspired was I by the potential of WiFi technologies to allow event participants to engage more actively in discussions that I wrote a paper entitled “Using Networked Technologies To Support Conferences” together with Paul and my colleague Emma Tonkin.

But despite my interest in this area, the topic that really inspired me at the conference in Budapest was the Semantic Web. The Semantic Web was not new to me – but instead of having to listen to talks about low-level protocol issues, for the first time people were talking about – and, more importantly, demonstrating – Semantic Web applications. The buzz at the conference, especially amongst a group of people I knew from around Bath and Bristol, focussed on FOAF – the Friend of a Friend vocabulary and associated applications developed initially by Dan Brickley and Libby Miller, who then worked at ILRT at the University of Bristol.

So inspired was I by this lightweight approach to what subsequently became commonly referred to as social networking software that shortly afterwards I created my own FOAF file. And a few months after the WWW 2003 conference Dave Beckett (a Semantic Web researcher based, at the time, at the University of Kent) and I gave a plenary talk on “Semantic Web Technologies for UK HE and FE Institutions” at UKOLN’s IWMW 2003 event – raising awareness of the potential of the Semantic Web to members of institutional Web teams in UK universities.

The following year Leigh Dodds (a Semantic Web developer who then worked at Ingenta in Bath) and I explored ways in which we could seek to engage a wider community in this early example of a Semantic Web application. In a paper on “Using FOAF To Support Community Building” we described lightweight FOAF authoring tools which could be used to create FOAF files, and viewers which could provide tangible evidence of the benefits. These tools were promoted to participants via a resource on Use of FOAF at the IWMW 2004 event, which summarised the potential of FOAF and described tools for creating and viewing FOAF files (e.g. FOAFnaut, FOAF Explorer and Plink) and possible concerns people may have with this technology.

What Happened?

What happened after the new big idea from WWW 2003 was followed up by talks to the Web management community from a respected Semantic Web developer, the provision of simple authoring and viewing tools and a context for use? The answer was ‘not much’. A few people created their own FOAF files, but most seemed to have no interest, and this lack of interest continued despite use of FOAF being encouraged the following year at IWMW 2005. It was quite clear that FOAF had failed to take off within this community. And in November 2005 I gave a talk on “Lessons Learnt From FOAF: A Bottom-Up Approach To Social Networks” which “describe[d] some of FOAF’s apparent failings to live up to its initial potential and discuss possible reasons for this“.

Lessons

In 2005 I was speculating on FOAF’s ‘apparent’ failure to fulfil the excitement I felt in 2003. The reasons I gave included people’s concerns regarding privacy, concerns regarding the term ‘friend’ and the perception that the marketplace had yet to legitimise the area.

But if we move on a few years we find that many people are now happy to share information on Facebook and to ‘befriend’ people, even those they may not have met.

For me this example illustrates that back in the early-to-mid ‘noughties’ there was too much focus on the underlying technologies (how the Semantic Web would be built) and a failure to understand whether users’ real needs and requirements were being addressed.

What Next?

Despite the efforts of some researchers who are currently attempting to put a damper on my enthusiasm for Linked Data (:-), the failure of the Semantic Web to deliver on its initial promises shouldn’t be regarded as a reason to be sceptical regarding the promise of Linked Data today.

Rather, we need to welcome Critical Friends who are willing to provide constructive criticism of questionable claims made for Linked Data and to help identify appropriate areas for deploying Linked Data approaches, the deployment models, the skills and other resources which are needed, and the associated risks.

As well as such critical appraisal, which is particularly important at a time when investment in the public sector is decreasing, we will also need to continue the advocacy in order to ensure that the benefits of Linked Data are not ignored. I will be publishing posts on the relevance of Linked Data for individuals and institutions shortly.

Posted in Linked Data | 2 Comments »

“We Have the Highest Proportion of Students!”

Posted by Brian Kelly on 7 Apr 2010

Back in September 2001 I gave a talk at the JANET User Support Workshop, which was held at Loughborough University. I remember a Pro Vice Chancellor giving the welcome talk, during which he mentioned that “Loughborough has the highest proportion of students of any place in the UK” (or words to that effect). I remember him saying that because I had worked at Loughborough University from 1984-90 and was interested in seeing how the increase in student numbers was changing the town centre – there were a number of superpubs which weren’t there when I lived in the town.

Last November I spent a few days at Aberystwyth University. While I was there, on my way to a CAMRA pub, I noticed large numbers of students (dressed as doctors and nurses) on a pub crawl around the town. This made me wonder if a small place like Aberystwyth might have overtaken Loughborough as the town or city in the UK with the largest proportion of students.

That was the background to my recent “Challenge To Linked Data Developers” in which I asked “Which town or city in the UK has the largest proportion of students?“. In order to simplify the challenge and avoid the need for SPARQL developers to have to track down official relevant data sources I asked that the challenge be addressed using data held in DBpedia, the RDF datastore of structured information provided in Wikipedia. An additional aim was to gain an understanding of the quality of the data (and the data structures) held in DBpedia, which is frequently mentioned as having a central role to play in the Linked Data world.

A week after issuing my challenge I published the “Response To My Linked Data Challenge“. However the answers obtained from querying DBpedia were clearly incorrect – Cambridge, for example, doesn’t have a population of 12!

On the DCC blog Chris Rusbridge has revisited my challenge in a post entitled “Linked Data and Reality“. Chris suggested that “If we care about our queries, we should care about our sources; we should use curated resources that we can trust. Resources from, say… the UK government?“. That may be true, but I wasn’t primarily after the correct answer when I formulated my challenge – I was more interested in whether DBpedia could provide a reasonable answer, how long it might take to write a SPARQL query and how complex such a query might be. This motivation was acknowledged by bitwacker in his comment that “I think Brian’s challenge should be seen as only a benchmark, a sampling of the effectiveness of linked data practices today.” That’s right – and I’m pleased to have noticed that the DBpedia community have recently issued an “Invitation to contribute to DBpedia by improving the infobox mappings“. In addition Kingsley Idehen alerted me to the Yago, OpenCyc, UMBEL and SUMO ontologies, all of which have bindings to DBpedia. (I should also add that Kingsley has written a blog post on “DBpedia receives shot #1 of CLASSiness vaccine” which illustrates how new ontologies can be integrated with DBpedia.)

Perhaps DBpedia could have a role to play in answering the type of query I posed – after all, if you want to compare the proportions of students in towns and cities across several countries, mightn’t DBpedia be an easier place to seek an initial answer than having to find and query statistics from each of the individual countries (especially as the UK Government seems to be taking a leading role in expressing a commitment to Linked Data)?

In addition to suggesting that the query should use official Government sources of data (which Chris Wallace has used to provide an answer to my query), Chris Rusbridge also raised the issue of the need to seek clarity in the queries we pose. Using the Guardian Platform Chris Wallace found that the place with the highest proportion of students is Milton Keynes. Chris Rusbridge suggested this in an initial discussion on a LinkedIn Linked Data discussion. And yes, the home of the Open University is likely to have a large number of registered students. But I don’t think the place will be full of students at the start of the academic year, since the Open University is a distance learning institution. The (implied) context of my query was a place in which a significant proportion of students would be likely to affect the local environment, with large numbers of students in town during freshers’ pub crawls and, perhaps, little happening during vacations. So we should rule out the Open University. But what about other universities with a large number of students on distance learning courses? According to a tweet from lordllama, “About 41% of 23,000 students at Leicester University are on distance learning courses“.

There is also the question of how we should treat institutions such as the University of Brighton in Hastings, which “offers University of Brighton degrees“. As Margaret Wallis pointed out in response to my initial blog post, this institution has “grown in six years from 40 students to 600+“. But should those students be included in the totals for the University of Brighton or for Hastings? The general question is how we should treat institutions which have multiple campuses split across different towns or, as may be the case in this example, institutions which award degrees on behalf of other institutions.

You may also notice that my question about places with a large proportion of students is now talking about universities and university students. But what about students at FE colleges? And school children?

Chris Rusbridge highlighted such complexities: “The point is, these things are hard. Understanding your data structures and their semantics, understanding the actual data and their provenance, understanding your questions, expressing them really clearly: these are hard things.” Chris concluded: “I’m beginning to worry that Linked Data may be slightly dangerous except for very well-designed systems and very smart people…” Chris probably had his tongue in his cheek with his ‘smart people‘ remark, but he may be right with his warning that Linked Data might be dangerous. If a simple query such as “Which town or city in the UK has the largest proportion of students?” is open to a number of different interpretations, what are the implications for more complex queries?

In my “Response To My Linked Data Challenge” I described how Tim Berners-Lee introduced the Semantic Web by describing how it aimed to provide an answer to a query such as “Is there a green car for sale for around $15000 in Queensland?“. Tim described how, unlike the search engines of the day, a Semantic Web query would be able to find a result described as “Affordable maroon saloon for sale in Brisbane”. But this query seeks to find additional results which would not be found by a traditional keyword search. The query “Which town or city in the UK has the largest proportion of students?“, however, seeks a single answer. Might there be types of queries for which Linked Data works well and others for which it may be difficult or expensive to model the data? Or, to rephrase the question: what, specifically, is Linked Data for?

Posted in Linked Data | 3 Comments »

ASBOs, Linked Data and Open Data

Posted by Brian Kelly on 31 Mar 2010

The ASBOrometer Mobile App

My colleague Adrian Stevenson commented on his eFragments blog recently that “The Linked Data movement was once again in rude health at last week’s ‘Terra Future’ seminar“. Adrian’s report on the conference highlighted the potential of Linked Data in geo-location applications – and the importance of this area can be gauged by the presence at the seminar of two very high profile surprise guests: Sir Tim Berners-Lee and Professor Nigel Shadbolt.

Adrian’s report mentioned the ASBOrometer application, which is “a mobile application that measures levels of anti-social behaviour at your current location (within England and Wales) and gives you access to key local ASB statistics“. As this is freely available for iPhone and Android mobile phones I installed it on my iPod Touch (yes, it works on that device too). I was interested in seeing a Linked Data application which may be of interest to an end user, as opposed to the various Linked Data applications I’ve looked at recently, which seem to display RDF triples in various ways.

Within a minute or two I had installed the application and discovered a 14.4% ‘PBS ASB perception rating’ for my home in Bath (which, it seems, indicates a low level of anti-social behaviour).

I was also able to view a map showing the ASBO ratings across England and Wales. I used this to view the ratings for my home town of Liverpool – the red icon shown in the accompanying image indicates that, you will probably not be surprised to learn, there is a high level of anti-social behaviour – 31.4%.

Incidentally the somewhat inappropriately-named Leaderboard button informs me that Liverpool is lagging behind Newham (47.9%), Tower Hamlets (45.9%) and Barking and Dagenham (39.1%).

This application processed data that had been provided by the data.gov.uk initiative.  We can start to gain an appreciation of the momentum behind this initiative from Gordon Brown’s recent speech on “Building Britain’s Digital Future” in which he spoke about “building on this next generation web and the radical opening up of information and data” and also explicitly mentioned Linked Data: “Underpinning the digital transformation that we are likely to see over the coming decade is the creation of the next generation of the web – what is called the semantic web, or the web of linked data“.

In addition to Gordon Brown’s announcement there is also an article in the Daily Mail on “Asbo App for iPhone tells you how anti-social your area is” which tells us that “Housebuyers looking for a nice area to settle down in can check how many of their potential neighbours have Asbos, thanks to a new smartphone application“. If the Prime Minister and the Daily Mail are both talking about Linked Data applications it is clear something important is happening!

Where’s The Linked Data?

The article in the Daily Mail (correctly, I feel) focussed on the uses to which the application could be put and didn’t address how the application processed the data. My interest, however, is more concerned with the role of Linked Data in supporting such applications – although I have an interest in the use cases too.

On first using the ASBOrometer I did wonder where the Linked Data came in. Wasn’t the application simply retrieving data provided by Government departments and visualising the data? What’s new? Reading the FAQ I find that the application processes the ASB CDRP Survey Full Dataset and the Number of Anti-Social Behaviour Orders (ASBOs) dataset, both provided by the Home Office. The former is available as a Microsoft Excel file and the latter as a CSV file (provided, it seems, in Microsoft Excel format).

So the Home Office seems to be providing open data (available under a “UK Crown Copyright with data.gov.uk rights” licence) but not Linked Data. I’m pleased that the Government is providing open data – and, as we have seen, this allows developers to produce valuable applications (on 20 February 2010 it was announced that, after achieving over 80,000 downloads in two days, the ASBOrometer had become the number 1 free app in the UK iTunes App Store). But where’s the Linked Data?

I’m not the first person to notice that the Government seems to be conflating Linked Data with open data. An article on “Watching the geeks: do Gordon Brown’s promises on government add up?” published in the Guardian’s Technology blog cites this analysis by Tom Morris of data published on data.gov.uk:

here are the aggregate results of the data.gov.uk format verification exercise: HTML – 252; XML – 5; Word – 4; RTF – 1; OpenOffice – 1; Something odd – 85; JSON – 9; Nothing there! – 190; CSV – 12; Multiple formats – 1211; PDF – 468; RDF – 10; Excel – 408. TOTAL: 2656

Sadly, this is over-optimistic. I’ve manually checked some of the data that has been categorised as JSON and RDF. Most of it is not actually correctly categorised – either people clicked, say, ‘RDF’ when they meant to click ‘PDF’, or they have seen an RSS or Atom feed and categorised it as RDF.

What this admittedly imperfect dataset is basically saying is that the vast majority of the ‘data’ on data.gov.uk is not actually machine-readable data but human-readable documents.

Discussion

There’s a danger, I feel, of Linked Data being conflated with Open Data. If, for example, a policy maker makes the decision to provide Linked Data along the lines of data.gov.uk, what does this mean? Does it mean providing a CSV file on a public Web site, or does it involve choosing appropriate ontologies, ensuring that persistent HTTP URIs are assigned and providing access to an RDF triple store?

There’s also a danger that Linked Data is being treated as a requirement to develop applications such as the ASBOrometer. Such applications can be developed without requiring Linked Data.

Such issues have been raised by Mike Ellis recently in a post entitled “Linked Data: my challenge“. The post was aimed primarily at the development community and there have been a number of responses from software developers. The comment I found most interesting, however, was made by Kingsley Idehen, who sought to reassure Mike (“don’t worry”) and went on to make what appears to be a significant announcement: “Making Linked Data from CSV’s is going to be a click button oriented Wizard affair (any second now, I will be unveiling this amongst other things)“.

So maybe the providers of data sources shouldn’t be concerned about the need to provide RDF (with all the associated complexities) – perhaps the next stage will be tools which make structured data (perhaps as basic as CSV files) available as Linked Data, and if this demonstrates the benefits of Linked Data a subsequent stage may be to provide the data as native RDF. On reflection this has parallels with the Web in its early days at CERN. One of the early data sources was the CERN telephone directory – and this was marked up on-the-fly by a script, which avoided the need to commit resources to marking up data for what was then a very speculative service – the Web.

So should the push be for open data, I wonder? Might it be beneficial to defer the debates related to the complexities of Linked Data and RDF to a later date?

Posted in Linked Data | Tagged: | 6 Comments »

Approaches To Debugging The DBpedia Query

Posted by Brian Kelly on 24 Feb 2010

My recent invitation to Linked Data developers to illustrate the potential benefits of Linked Data by providing an answer to a simple query using DBpedia as a data source generated a lot of subsequent discussion. A tweet by Frank van Harmelen (the Dutch computer scientist and Professor in Knowledge Representation & Reasoning, I assume) summarised his thoughts on the two posts and related behind-the-scenes activities: “insightful discussion on #linkeddate strengths, weaknesses, scope and limitations“.

But as described in the post, the answer to the question “Which town or city in the UK has the largest proportion of students?” was clearly wrong.  And if you view the output from the most recent version of the query, you’ll see that the answers are still clearly incorrect.

We might regard this ‘quick fail’ as being of more value than the ‘quick win’ which I had expected initially, as it provides an opportunity to reflect on the processes needed to debug a Linked Data query.

As a reminder here is the query:

#quick attempt at analyzing students as % of population in the United Kingdom by Town
#this query shows DBpedia extraction related quality issues which ultimately are a function of the
#wikipedia infoboxes.

#namespace URIs assumed: standard DBpedia namespaces (dbpedia: is predefined on the DBpedia endpoint)
prefix dbpedia-owl: <http://dbpedia.org/ontology/>
prefix dbpedia-owl-uni: <http://dbpedia.org/ontology/University/>
prefix dbpedia-owl-inst: <http://dbpedia.org/ontology/EducationalInstitution/>

select distinct ?town ?pgrad ?ugrad ?population (((?pgrad + ?ugrad) / 1000.0 / ?population ) ) as ?per where {
?s dbpedia-owl-inst:country dbpedia:United_Kingdom;
   dbpedia-owl-uni:postgrad ?pgrad;
   dbpedia-owl-uni:undergrad ?ugrad;
   dbpedia-owl-inst:city ?town.
optional {?town dbpedia-owl:populationTotal ?population. filter (?population >0) }
 }
group by ?town having (((?pgrad + ?ugrad) / 1000.0 / ?population ) ) > 0
#'order by desc 5' sorts on the fifth selected column (?per), a shorthand accepted by the DBpedia (Virtuoso) endpoint
order by desc 5

As can be seen, the query is short and, for a database developer with SQL expertise, the program logic should be apparent. But the point about Linked Data is the emphasis on the data and the way in which the data is described (using RDF). So I suspect there will be a need to debug the data. We will probably need answers to questions such as “Is the data correct in the original source (Wikipedia)?“; “Is the data correct in DBpedia?“; “Is the data marked-up in a consistent fashion?“; “Does the query process the data correctly?” and “Does the data reflect the assumptions in the query?“.

Finding an answer to these questions might be best done by looking at the data for the results which were clearly in error and comparing the data with results which appear to be more realistic.

We see that Cambridge has a population of 12 and Oxford a population of 38. These are clearly wrong. My initial suspicion was that several zeros were missing (perhaps the data was described in Wikipedia as population in tens of thousands). But looking at the other end of the table, the towns and cities with the largest populations include Chatham (Kent) with a population of 70,540, Stirling (41,243) and Guildford (66,773) – the latter population count agrees with the data held in Wikipedia.

In addition to the strange population figures, there are also questions about the towns and cities which are described as hosting a UK University. As far as I know neither Folkestone nor Hastings has a University. London, however, has many universities but is missing from the list.

My supposition is that the population data is marked up in a variety of ways – looking at the Wikipedia entry for Cambridge, for example, I see that the infobox on the right of the page (which contains the information used in DBpedia) has three population counts: the district and city population (122,800), the urban population (130,000) and the county population (752,900). But by querying the DBpedia query results I find three values for population: 12, 73 and 752,900.
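One way to see this for yourself is to ask DBpedia for every population-related property it holds for Cambridge. A minimal sketch (the regex filter is simply a convenient way of catching the various property names):

# list every population-related value DBpedia holds for Cambridge
select ?property ?population where {
  <http://dbpedia.org/resource/Cambridge> ?property ?population .
  filter (regex(str(?property), "population", "i"))
}

Comparing the properties returned with the one the main query matches quickly shows which of the several population counts is being picked up.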

The confusion regarding towns and cities which may or may not host UK universities might reflect real-world complexities – if a town hosts a campus but the main campus is located elsewhere, should the town be included? There’s no clear-cut answer, especially when, as in this case, the data, from Wikipedia, is managed in a very devolved fashion.

I’ve suggested some possible reasons for the incorrect results of the SPARQL query and I am sure there may be additional reasons (and I welcome such suggestions). How one might go about fixing the bugs is another question. Should the data be made more consistent? If so, how might one do this when the data is owned by a distributed community? Or isn’t the point of Linked Data that the data should be self-describing – in which case perhaps a much more complex SPARQL query is needed in order to process the complexities hidden behind my apparently simple question?

Posted in Linked Data | 10 Comments »

Response To My Linked Data Challenge

Posted by Brian Kelly on 19 Feb 2010

The Linked Data Challenge

A week ago I issued a challenge to Linked Data developers – using a Linked Data service, such as DBpedia, tell me which town or city in the UK has the largest proportion of students. I’m pleased to say that a developer, Alejandra Garcia Rojas, has responded to my challenge and provided an answer. But this post isn’t about the answer but rather the development processes, the limitations of the approach and the issues which the challenge has raised. The post concludes with a revised challenge for Linked Data (and other) developers.

The Motivation For The Challenge

Before revealing the answer I should explain why I posed this challenge. I can recall Tim Berners-Lee introducing the Semantic Web at a WWW conference  many years ago – looking at my trip reports it was the WWW 7 conference held in Brisbane in 1998. My report links to the slides Tim Berners-Lee used in his presentation in which he encouraged the Web development community to engage with the Semantic Web. I was particularly interested in his slide in which he outlined some of the user problems which the Semantic Web would address:

  • Can Joe access the party photos?
  • Who are all the people who can?
  • Is there a green car for sale for around $15000 in Queensland?
  • Did someone driving a blue car send us an invoice for over $10000?
  • What was the average temperature in 1997 in Brisbane?
  • Please fill in my tax form!

I was interested to see whether, 12 years on, such questions could be answered using what is now referred to as Linked Data. And as we have a large resource, DBpedia, which provides Linked Data for use by developers, I felt it would be useful to experiment with an existing resource (which is based on the structured content held in Wikipedia). I was particularly interested in the following three questions:

  • How easy would it be for an experienced Linked Data developer to write code which would provide an answer? Would it be 10 lines of code which could be written in 10 minutes, a million lines of code which would take a large team years to write or somewhere in between?
  • Is the Linked Data content held in DBpedia of sufficient consistency and quality to allow an answer to be provided without the need for intensive data cleansing?
  • What additional issues might the experiences gained in this challenge raise?

A SPARQL Query To Answer My Challenge

In addition to issuing my challenge on this blog and using Twitter to engage with a wider community, I also raised the challenge in the Linked Data Web group on LinkedIn (note you need to be a member of the group to view the discussion). It was in this group that the discussion started, with Kingsley Idehen (CEO at OpenLink Software) clarifying some of the issues I’d raised. Alejandra Garcia Rojas, a Semantic Web specialist, was the developer who immediately responded to my query and, within a few hours, provided an initial summary; a few days later she gave me the final version of her SPARQL query (as described in Wikipedia, SPARQL is a query language for Linked Data). Alejandra explained that it should be possible to use the following single SPARQL query to provide an answer from the data held in DBpedia:

prefix dbpedia: <http://dbpedia.org/resource/>
prefix dbpedia-owl: <http://dbpedia.org/ontology/>
prefix dbpedia-owl-uni: <http://dbpedia.org/ontology/University/>
prefix dbpedia-owl-inst: <http://dbpedia.org/ontology/EducationalInstitution/>

select ?town count(?uni) ?pgrad ?ugrad max(?population)
       (((?pgrad+?ugrad)/max(?population))*100) as ?percentage
where {
  ?uni dbpedia-owl-inst:country dbpedia:United_Kingdom ;
       dbpedia-owl-uni:postgrad ?pgrad ;
       dbpedia-owl-uni:undergrad ?ugrad ;
       dbpedia-owl-inst:city ?town .
  optional { ?town dbpedia-owl:populationTotal ?population . FILTER (?population > 0) }
}
group by ?town ?pgrad ?ugrad
having ( (((?pgrad+?ugrad)/max(?population))*100) > 0 )
order by desc(?percentage)

The Answer To My Challenge

What’s the answer, I hear you asking? The answer, to my slightly modified query in which I’ve asked for the number of universities and the total population for the towns with the highest proportion of students, is given in the table below:

City        Nos. of Universities   Student Nos.   Population   Student Proportion
Cambridge   2                      38,696         12           3224%
Leeds       5                      135,625        89           1523%
Preston     1                      34,863         30           1162%
Oxford      3                      40,248         38           1059%
Leicester   3                      54,262         62           875%

As can be seen, Cambridge, which has two universities, has the highest proportion of students, with a total student population of 38,696 and an overall population of 12 people. Clearly something is wrong :-) Alejandra has provided a live link to her SPARQL query, so you can examine the full responses for yourself. In addition another SPARQL query provides a list of cities and their universities.
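Her listing query itself is available via the link above; for readers who want a feel for its likely shape, here is a minimal sketch of my own (using the same prefixes as the query above) rather than a reproduction of her code:

# Sketch: list each UK town alongside its universities
select ?town ?uni
where {
  ?uni dbpedia-owl-inst:country dbpedia:United_Kingdom ;
       dbpedia-owl-inst:city ?town .
}
order by ?town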

The Quality Of The Data Is The Problem

I was very pleased to discover that it was possible to write a brief and simple SPARQL query (anyone with knowledge of SQL will be able to understand the code). The problem lies with the data. This exercise has been useful in revealing the flaws in the data and in highlighting the need to understand why such problems have occurred.

Following discussions with Alejandra we identified the following problems with the underlying data:

  • The population of the towns and cities is defined in a variety of ways. We discovered many different variables describing the population – totalPopulation, urbanPopulation, populationEstimate, etc. – and on occasion a variable held more than one value. Moreover, these variables do not appear in all cities’ descriptions, making it impossible to reliably select the most appropriate value (the sketch after this list shows one way to inspect them).
  • The query does not analyse the full list of UK universities: it only processes universities which have student numbers defined. If a university does not provide the number of students, it is discarded.
  • Colleges are sometimes defined as universities.
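As an illustration of the first problem, the following query lists every population-related property which DBpedia holds for Cambridge. This is a minimal diagnostic sketch of mine rather than part of the original exercise:

# List all population-related properties and values for one resource
select ?property ?value
where {
  <http://dbpedia.org/resource/Cambridge> ?property ?value .
  FILTER regex(str(?property), "population", "i")
}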

What Have We Learnt?

What have we learnt from this exercise? We have learnt that although the necessary information to answer my query may be held in DBpedia, it is not available in a format which is suitable for automated processing.

I have also learnt that a SPARQL query need not be intimidating, and that writing one need not be time-consuming if you have the necessary expertise.

The bad news, though, is that although DBpedia appears to be fairly central to the current publicity surrounding Linked Data, on the basis of this initial experiment it does not appear capable of providing end user services.

I do not know, though, what the underlying problems with the data are. It could be the complexity of the data modelling, the inherent limitations of the distributed data collection approach used by Wikipedia, limitations of the workflow which takes data from Wikipedia for use in DBpedia – or simply that the apparently simple query “which place in the UK has the highest per capita student population” has many implicit assumptions which can’t be handled by DBpedia’s representation of the data stored in Wikipedia.

If answering such apparently simple queries will require much more complex data modelling, there will be a need to address the business models needed to justify the additional expenditure on handling that complexity. And although there might be valid business reasons for doing this in areas such as biomedical data, it is questionable whether this is the case for answering essentially trivial questions such as the one I posed. In which case the similarly trivial question which Tim Berners-Lee used back in 1998 – “Is there a green car for sale for around $15000 in Queensland?” – was perhaps responsible for misleading people into thinking the Semantic Web was for ordinary end users. I am now starting to wonder whether a better strategy for those involved in Linked Data activities would be to purposely distance it from typical end users and target, instead, identified niche areas.

A more general concern which this exercise has alerted me to is the dangers of assuming that the answer to a Linked Data query will necessarily be correct. In this case it was clear that the results were wrong. But what if the results had only been slightly wrong? And what if you weren’t in a position to make a judgement on the validity of the answers?

On the LinkedIn discussion Chris Rusbridge summarised his particular concerns: “My question is really about the application of tools without careful thought on their implications, which seems to me a risk for Linked Data in particular“. Chris went on to ask “what are the risks of running queries against datasets where there are data of unknown provenance and qualification?

My simple query has resulted in me asking many questions which hadn’t occurred to me previously. I welcome comments from others with an interest in Linked Data.

An Updated Challenge For Linked Data (and Other) Developers

It would be a mistake to regard the failure to obtain an answer to my challenge as an indication of limitations of the Linked Data concept – the phrase ‘garbage in, garbage out’ is as valid in a Linked Data world as it was when it was coined in the days of IBM mainframe computers.

An updated challenge for Linked Data developers would be to answer the query “What are the top five places in the UK with the highest proportion of students?” The answer should list the town or city, student percentage, together with the numbers of universities, students and the overall population.

And rather than DBpedia, the official sources of such data would be a better starting point. The UK government has published SPARQL entry points for a variety of statistical datasets – so Linked Data developers may wish to use the interfaces for government statistics and educational statistics. A sketch of the general shape such a query might take is given below.
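I haven't attempted this myself, so the following is purely an indication of shape rather than a working solution: the vocabulary URIs are placeholders which would need to be replaced by the real terms documented for the data.gov.uk datasets.

prefix ex: <http://example.org/def/>   # placeholder vocabulary – substitute real data.gov.uk terms

select ?area ( ((?students/?population)*100) as ?percentage )
where {
  ?area ex:studentCount ?students ;      # placeholder property
        ex:totalPopulation ?population . # placeholder property
}
order by desc(?percentage)
limit 5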

Of course it might be possible to provide an answer to my query using approaches other than linked data. In a post entitled “Mulling Over =datagovukLookup() in Google Spreadsheets” Tony Hirst asked “do I win if I can work out a solution using stuff from the Guardian Datastore and Google Spreadsheets, or are you only accepting Proper Linked Data solutions“. Although I’m afraid there’s no prize, I would be interested in seeing if an answer to my query can be provided using other approaches.



Posted in Linked Data | 35 Comments »

A Challenge To Linked Data Developers

Posted by Brian Kelly on 12 Feb 2010

Back in November, following the interest in Linked Data discussed at the CETIS 2009 conference, I wondered whether it was Time To Experiment With DBpedia?

The following month I attended the Online Information 2009 conference. As I described in a post on the Highlights of Online Information 2009: Semantic Web and Social Web, it was clear to me that “#semanticweb was the highlight & relevant for early mainstream“. A blog post which provided the LIS Research Coalition “review” of Online 2009 was in agreement: “sessions on the semantic web gave the impression that those in library and information science related roles are now beginning to consider the exploitation of data to data links“.

However, a concern I raised with Ian Davis, CTO of Talis, following his keynote talk on “The Reality of Linked Data” was the danger of overhyping expectations; something I feel is very relevant in light of the perceived failure of the Semantic Web to live up to the potential claimed by evangelists in the early years of the last decade. Has, for example, the “new form of Web content that is meaningful to computers [which] will unleash a revolution of new possibilities” described in the Semantic Web article published in Scientific American (and also available from Ryerson.ca) in May 2001 arrived? I think not.

There is a danger, I fear, that the renewed enthusiasm felt by increasing numbers of developers will not be shared by managers and policy makers – leading to interesting pilots and prototypes which do not necessarily become deployed in a mainstream service environment.

A suggestion I made to a number of Linked Data experts at the Online Information 2009 conference was to demonstrate the value of Linked Data not by providing examples in niche subject areas (e.g. chemistry) but by taking an example which everyone can understand.

In my post Time To Experiment With DBpedia? I used the DBpedia Faceted Browser to search for information about UK universities – in the example I searched for UK universities which were founded in 1966. But this didn’t demonstrate how Linked Data can be used to join information sources which have different underlying structures.

My challenge to Linked Data developers is to make use of the data stored in DBpedia (which is harvested from Wikipedia) to answer the query “Which town or city in the UK has the highest proportion of students?“. This would involve processing the set of UK universities, finding all universities from the same town or city, recording the total number of students and then, from the town/city entries in DBpedia, finding the total population in order to identify the town or city with the largest proportion of students. A sketch of the first of these steps is given below.
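To make that first step concrete, here is a minimal sketch of how the student totals per town might be gathered. It is my illustration rather than a required part of the challenge: the property names are assumptions about DBpedia's modelling, and the aggregate functions require an endpoint which supports them.

prefix dbpedia: <http://dbpedia.org/resource/>
prefix dbpedia-owl: <http://dbpedia.org/ontology/>

select ?town (sum(?students) as ?totalStudents)
where {
  ?uni dbpedia-owl:country dbpedia:United_Kingdom ;
       dbpedia-owl:city ?town ;
       dbpedia-owl:numberOfStudents ?students .   # assumed property
}
group by ?town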

I’m not too concerned about some of the edge cases (e.g. the differences between the City of London and Greater London, or universities with campuses in several locations). Rather I want to know:

  • Can Linked Data solve this problem (from a theoretical perspective)?
  • Is DBpedia able to solve this problem (from a practical perspective)?
  • How difficult is it to solve the problem (is it a trivial 1 line SPARQL query or would it require several months of work?)

 Any takers?  And note the answer must be provided using DBpedia – asking your friends on Twitter is cheating!

Posted in Linked Data | 11 Comments »

Highlights of Online Information 2009: Semantic Web and Social Web

Posted by Brian Kelly on 4 Dec 2009

Online Information 2009

I summarised my thoughts of the Online Information 2009 conference in a tweet:

Back home after gr8t #online09 Thoughts: #semanticweb was the highlight & relevant for early mainstream; #socialweb now embedded.

This resonated with Andrew Spong who responded:

Best review u’ll c: RT @briankelly: #online09 Thoughts: #semanticweb was highlight & relevant for early mainstream; #socialweb now embedded.

On reflection, however, if I hadn’t been so tired when writing that tweet last night my summary would have been:

Thoughts: #semanticweb was the highlight & relevant for early mainstream; #socialweb now accepted.

Semantic Web: Time for the Early Mainstream Adopters to Engage

The buzz at the conference clearly focussed on the Semantic Web. The conference’s opening keynote was delivered by Dame Wendy Hall and Professor Nigel Shadbolt, both highly regarded researchers at the School of Electronics and Computer Science (ECS) at the University of Southampton, whose long-standing and influential involvement, dating back to the early days of the Web, continues to the present – as can be seen from the recent meeting between Professor Nigel Shadbolt, Sir Tim Berners-Lee and Gordon Brown, in which “Mr Berners-Lee and Mr Shadbolt presented an update to Cabinet on their work advising the Government on how to make data more accessible to the public“.

The opening plenary provided a high level context to the relevance of the Semantic Web to information professionals. Over the 3 days of the conference the main auditorium featured a series of further talks  focussed on a variety of aspects of the Semantic Web, including thoughts on how the potential of the Semantic Web may be realised, its use in Government, case studies of uses of Semantic Web applications in commerce and the public sector and discussions of standards and metadata.

I’ll not attempt to summarise any of the talks but if you do want to find out more details of the talks and people’s thoughts on the talks I suggest you visit the Online Information 2009 Conference Web site or search for the event’s hashtag: #online09 (note the tweets have also been archived on Twapperkeeper). I’d also welcome links to relevant blog posts to be added as a comment to this post.

Social Web: Now Accepted by the Mainstream

I gave a talk on “Building on Use of Personal Web 2.0 Technologies” at the conference and also chaired the session on “Evaluating, recommending and justifying 2.0 tools“. As I said when I introduced the session, the fact that the Social Web sessions were not held in the main auditorium is indicative that the Social Web is no longer the exciting new concept it was a few years ago. But it has also turned out not to be the ‘fad’ which the sceptics predicted; rather it is now widely (but not universally) accepted by many public sector and commercial organisations. The “Social Web: Transforming The Workplace” sessions which, like the “Semantic Web Coming of Age” sessions, ran throughout the conference provided additional advocacy work illustrating how Social Web tools, such as blogs and Twitter, are being incorporated into mainstream working practices and are being shown to provide tangible benefits. The maturity of the discussions about the Social Web could be seen in the willingness to acknowledge limitations (Twitter, for example, may avoid the information overload which email causes, but can bring new problems and concerns). In my talk I mentioned potential risks associated with use of the Social Web, this time focusing on the use of personal tools to support institutional activities – a subject I’ll revisit in another post.

Information Professionals Delivering and Demonstrating Value

The third conference theme was “Information Professionals Delivering and Demonstrating Value“. The title of Mary Ellen Bates’ talk provides a blunter summary of an additional undercurrent to the conference: “Living Large in Lean Times: Adding Value While Cutting Costs“. The question of “how do we engage in such innovation when public sector funding is likely to decline” underpinned the thinking of many delegates from public sector organisations, I suspect, whilst the views of those from the commercial sector were probably summarised by the tweet I spotted which said “How do we monetise the Semantic Web?“

Conclusions

I found this year’s conference really useful, with lots of valuable discussions and chats taking place. As well as gaining an awareness of how the three conference themes are being perceived by information professionals internationally, an additional personal highlight for me was seeing Dr Hazel Hall’s look of astonishment and delight when it was announced that she was the Information Professional of the Year. I met Hazel, the director of the Centre for Social Informatics, Edinburgh Napier University and the executive secretary of the LIS Research Coalition, on the train from London to Edinburgh a couple of weeks ago, after I tweeted that I was on the train and received a response saying “Me too, shall we meet”. We then had a great chat and the four-hour journey to Edinburgh passed very quickly. A great conference all round, I feel.



Posted in Events, Linked Data | Tagged: | 8 Comments »

Time To Experiment With DBpedia?

Posted by Brian Kelly on 19 Nov 2009

At the recent CETIS 2009 conference I attended a session on “Universities and Colleges in the Giant Global Graph” facilitated by Adam Cooper. There was a feeling that the initial discussions had perhaps focussed too much on detailed technical aspects of Linked Data, and had failed to address the interests of the senior managers present, who were more interested in what Linked Data could do, rather than whether, for example, RDF should be a mandatory requirement of a Linked Data service.

After the coffee break there was a discussion of ways in which Linked Data could be used in an educational context. One suggestion I made was that as DBpedia (an RDF representation of the content of Wikipedia) provides access to a large amount of Linked Data, we should be exploring ways of using DBpedia to provide examples of what Linked Data can do. After all, if the data is available, shouldn’t we be using it to support advocacy work rather than trying to seek funding to create Linked Data resources?

[Image: Wikipedia entry for Bath University]

I was told that DBpedia provides access to the structured infoboxes in Wikipedia entries, such as the factual entries for universities (as illustrated).

Could, I wonder, this information be used to demonstrate how such Web pages can be processed as entries in a database rather than just text to be displayed for reading?

So I started experimenting with the DBpedia Faceted Browser.

In the search box I typed “University” and found there were 9,490 entries. After selecting this search option I was then presented with a number of pre-programmed searches, such as Country (193 entries for the UK) and City (60 entries for London). I could also search for universities which were established in a particular year (or range).

Searching for universities founded in 1966 I found there were 107 results, including the University of Bath, as shown below.
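Out of interest, the same search can in principle be expressed directly in SPARQL. The following is a minimal sketch of mine, in which the class and property names are assumptions about DBpedia's modelling rather than verified terms:

prefix dbpedia-owl: <http://dbpedia.org/ontology/>

select ?uni
where {
  ?uni a dbpedia-owl:University ;
       dbpedia-owl:foundingYear ?year .   # assumed property name
  FILTER ( str(?year) = "1966" )
}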

Can we do more, I wonder, with the RDF data which is already available in DBpedia?

  • Can we use this example to demonstrate the importance of data as opposed to a HTML representation of data designed for viewing?
  • Can we develop queries which people may find useful?
  • Can we think of data about institutions which could be stored in Wikipedia to allow further queries to be answered?

I also wonder whether it would be possible to go beyond running queries based on the content of the University entries in Wikipedia and explore related pages.

An opportunity for experimentation, perhaps?

[Screenshot: DBpedia search for universities established in 1966]

Posted in Linked Data | 4 Comments »