UK Web Focus (Brian Kelly)

Innovation and best practices for the Web

Posts Tagged ‘RDFa’

RDFa and WordPress

Posted by Brian Kelly on 5 Apr 2011

RDFa: A Brief Recap

RDFa (Resource Description Framework – in – attributes) is a W3C Recommendation that adds a set of attribute level extensions to XHTML for embedding rich metadata within Web documents.

As described in the Wikidpia entry for RDFa five “principles of interoperable metadata” are met by RDFa:

  1. Publisher Independence: each site can use its own standards
  2. Data Reuse: data is not duplicated. Separate XML and HTML sections are not required for the same content.
  3. Self Containment: the HTML and the RDF are separated
  4. Schema Modularity: the attributes are reusable
  5. Evolvability: additional fields can be added and XML transforms can extract the semantics of the data from an XHTML file

Additionally RDFa may benefit Web accessibility as more information is available to assistive technology.

But how does go about evaluating the potential of RDFa? Last year I wrote a post on Experiments With RDFa which was based on manual inclusion of RDFa markup in a Web page. Although this highlighted a number of issues, including the validity of pages containing RDFa, this is not a scalable approach for significant deployment of RDFa. What is needed is a content management system which can be used to deploy RDFa on existing content in order to evaluate its potential and understand deployment issues.

The Potential for WordPress

WordPress as a Blog Platform and a CMS

WordPress provides a blog platform which can be used for large-scale management of blogs which are hosted at wordpress.com. In addition the software is available under an open source licence and can be deployed within an institution. There is increasing interest in use of WordPress within the higher education sector as can be seen from the recent launch of a WORDPRESS JISCMail list (which is aimed primarily at the UK HE sector) with some further examples of interest in use of WordPress in being available on the University Web Developers group.

A recent discussion on the WORDPRESS JISCMail lists addressed the potential of WordPress as a CMS rather than a blogging platform.  Such uses were also outlined recently in a post on the College Web Editor blog which suggested reasons why WordPress can be the right CMS for #highered websites.  In light of the growing interest in use of WordPress as a CMS it would seem that this platform could have a role to play in the deployment of new HTML developments such as RDFa.

The wp-RDFa WordPress Plugin

A strength of WordPress is its extensible architecture which allows plugins to be developed by third parties and deployed on locally installations of the software.  One such development is the wp-RDFa plugin which supports FOAF and  Dublin Core metadata. The plugin uses Dublin Core markup to tag posts with the title, creator and date elements. In addition wp-RDFa can be configured to make use of FOAF to “relate your personal information to your blog and to relate other users of your blog to you building up a semantic map of your relationships in the online world“.

Initial Experiments With wp-RDFa

Dublin Core Metadata

UKOLN’s Cultural Heritage blog has been closed recently, with no new posts planned for publication.  The blog will however continue to be hosted and can provide a test bed for experiments such as use of the wp-RDFa plugin.

In an initial experiment we found that the although the titles of each blog post were described using Dublin Core metadata, the title was replicated in the blog display. Since this was not acceptable we displayed the use of Dublin Core metadata and repeated the experiment on a private backup copy of the UK Web Focus blog. This time there were no changes in how the blog posts were displayed.

The underlying HTML code made use of the Dublin Core namespace:

<rdf:RDF xmlns:rdf=”http://www.w3.org/1999/02/22-rdf-syntax-ns#&#8221; xmlns:dc=”http://purl.org/dc/elements/1.1/”&gt;

with each individual blog post containing the title and publication date provided as RDFa:

<h3 class=”storytitle”>
<span property=“dc:date” content=”2010-04-27 08:17:53″ resource=”http://blogs.ukoln.ac.uk/xxxxx/2010/04/27/workshop-on-engagement-impact-value/&#8221; />
<span rel=”http://blogs.ukoln.ac.uk/xxxxx/2010/04/27/workshop-on-engagement-impact-value/&#8221; property=”dc:title” resource=”http://blogs.ukoln.ac.uk/xxxxx/2010/04/27/workshop-on-engagement-impact-value/”>Workshop on Engagement, Impact, Value</span></a></h3>

It therefore does appear that the plugin can be deployed on local WordPress installations in order to provide richer semantic markup for existing content. I suspect that the problem with the display in the original experiment may may due to an incompatibility with the theme which is being used (Andreas09). I have reported this problem to the developer of the wp-RDFa plugin.

FOAF (Friends-of-a-Friend)

I had not expected an RDFa plugin to provide support for FOAF, the Friends-of-a-Friend vocabulary.  However since my work with FOAF dates back to at least 2004 I had an interest in seeing how it might be used in the context of a blog.

I had expected that information about the blog authors and commenters would be displayed in some way using a RDFa viewer such as the FireFox Operator plugin. However nothing seemed to be displayed using this plugin. In addition use of the RDFa Viewer and the RDFA Developer plugin also failed to detect FOAF markup embedded as RDFa.  I subsequently found that the FOAF information was provided as an external file.  Use of the FOAF Explorer service provides a display of the FOAF information which has been created by the plugin.

What surprised me with the initial display of the FOAF content was the list of names which I did not recognise.  It seems that these are authors and contributors to a variety of other blogs hosted on UKOLN’s WordPress MU (multi-user) server. I wonder whether the plugin was written for a previous version of WordPress, for which there was one blog per installation? In any case a decision has been made to provide access to a FOAF resource which contains details of the blog authors only, as illustrated.

Emerging Issues

A post on Microformats and RDFa deployment across the Web recently surveyed take-up of RDFa based on an analysis of 12 billion web pages indexed by Yahoo! Search and shows that we are seeing a growth in the take-up of semantic markup in Web pages.  As CMS systems (such as Drupal 7 which supports RDfa ‘out of the box’ – link updated in light of comment)  begin to provide RDFa support we might expect to see a sharp growth in Web pages which provide content which can be processed by software as well as being read by humans.  For those institutions which host a local WordPress installation it appears that it is now possible to begin exploring use of RDFa. As described in a post by Mark Birkbeck on RDFa and SEO an important role for RDFa will be to provide improvements to searching.  But in addition the ability to use wp-RDFa to create FOAF files makes we wonder whether this approach might be useful in describing relationships between contributors to blogs and perhaps provide the hooks to facilitate data-mining of the blogosphere.

It would be a mistake, however, to focus on one single tool for creating RDFa markup.  On the WORDPRESS JISCMail list Pat Lockley  mentioned that he is also developing an RDFa plugin for WordPress and invited feedback on further developments.  Here are some of my thoughts:

  • There is a need for a clear understanding of how the semantic markup will be applied and the user cases it aims to address.
  • There will also be a need to understand how such semantic markup would be used in non-blogging uses of WordPress, where the notions of a blog post, blog author and blog commenters may not apply.
  • There will be a need to ensure that different plugins which create RDFa markup are interoperable i.e. if a plugin is replaced by an alternative applications which process the RDFa should give consistent results.
  • Consideration should be given to privacy implications of exposing personal data (in particular) in semantic markup.

Is anyone making use of RDFa in WordPress who has experiences to share?  And are there any further suggestions which can be provided for those who are involved in related development work?

Posted in standards | Tagged: , | 9 Comments »

Experiments With RDFa

Posted by Brian Kelly on 3 May 2010

The Context

In a recent post I outlined some thoughts on Microformats and RDFa: Adding Richer Structure To Your HTML Pages. I suggested that it might now be timely to evaluate the potential of RDFa, but added a note of caution, pointing out that microformats don’t appear to have lived up to their initial hype.

Such reservations were echoed by Owen Stephens who considered using RDFa (with the Bibo ontology) to enable sharing of ‘references’ between students (and staff) as part of his TELSTAR project and went on to describe the reasons behind this decisions. Owen’s decision centred around deployment concerns. In contrast Chris Gutteridge had ideological reservations, as he “hate[s] the mix of visual & data markup. Better to just have blocks of RDF (in N3 for preference) in an element next to the item being talked about, or just in the page“. Like me, Stephen Downes seems to be willing to investigate and asked for “links that would point specifically to an RDFa syntax used to describe events?“. Michael Hausenblas provided links to two useful resources: W3C’s Linked Data Tutorial – on Publishing and consuming linked data with RDFa and a paper on “Building Linked Data For Both Humans and Machines” (PDF format). Pete Johnson also gave some useful comments and provided a link to recently published work on how to use RDFa in HTML 5 resources.

My Experiments

Like Stephen Downes I thought it would be useful to begin by providing richer structure about events. My experiments therefore began by adding RDFa markup for my forthcoming events page.

As the benefits of providing such richer structure for processing by browser extensions appear to be currently unconvincing my focus was in providing such markup by a search engine. The motivation is therefore primarily to provide richer markup for events which will be processed by a widely-used service in order that end users will receive better search results.

My first port of call was a Google post which introduced rich snippets. Google launched their support for Rich Snippets less than a year ago, in May 2009. They are described as “a new presentation of snippets that applies Google’s algorithms to highlight structured data embedded in web pages“.

Documentation on the use of Rich Snippets is provided on Google’s Webmaster Tools Web site. This provides me with information on RDFa (together with microdata and microformats) markup for events. Additional pages provide similar information on markup about people and businesses and organisations.

Although I am aware that Google have been criticised for developed their own vocabulary for their Rich Snippets I was more interested in carrying out a simple experiment with use of RDFa than continuing the debate on the most appropriate vocabularies.

The forthcoming events page was updated to contain RDFa markup about myself (name, organisation and location of my organisation, including the geo-location of the University of Bath.

For my talks in 2010 I replaced the microformats I have used previously with RDFa markup along the providing information on the date of the talks and their location (again with geo-location information).

No changes where noticeable when viewing the page normally. However using FireFox plugins which display RDFa (and microformat) information I can see that software is able to identify the more richly structured elements in the HTML page. The screenshot shows how the markup was rendered by the Operator sidebar and the RDFa Highlight bookmarklet and, in the status bar at the bottom of the screen, links to an RDFa validator and the SIOC RDF Browser.

Rendering of RDFa markup using various FireFox tools.

If you compare this image with the display of how microformats are rendered by the Operator plugin it will be noted that the display of microformats shows the title of the event whereas the display of RDFa lists the HTML elements which contain RDFa markup. The greater flexibility provided by RDFa appears to come at the price of a loss of context which is provided by the more constrained uses provided by microformats.

Is It Valid?

Although the HTML RDFa Highlight bookmarklet demonstrated that RDFa markup was available and indicated the elements to which the markup had been applied, there was also a need to modify other aspects of the HTML page. The DTD was changed from a HTML 1.0 Strict to:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML+RDFa 1.0//EN" "http://www.w3.org/MarkUp/DTD/xhtml-rdfa-1.dtd">

If addition the namespace of the RDFa elements needed to be defined:

<html xmlns="http://www.w3.org/1999/xhtml"
  xmlns:cc="http://creativecommons.org/ns#"
  xmlns:v="http://rdf.data-vocabulary.org/#"
  xml:lang="en">

It was possible for me to do this as I have access to the HTML page including elements defined in the HTML . I am aware that some CMS applications may not allow such changes to be made and, in addition, organisations may have policies which prohibit such changes.

On subsequently validating the page I discovered, however, HTML validity errors. It seems that my use of name="foo" attribute has been replaced by id="foo".

The changes to the DTD and the elements and the inclusion of the RDFa markup weren’t the only changes I had to make, however. I discovered that the id="foo attribute requires "foo" to start with an alphabetic character. I therefore had to change id="2010" to id="year-2010". This, for me, was somewhat more worrying as rather than just including new or slightly modified markup which was backwards-compatible, I was now having to change the URL of an internal anchor. If the anchors had started with an alphabetic character this wouldn’t have been an issue (and I would have been unaware of the problem). However it seems that a migration from a document-centred XHTML 1.0 Strict conforming world to the more data-centric XHTML 1.1+RDFa world may result in links becoming broken. I was prepared to make this change on my pages of forthcoming and recent events and change links within the pages. However if others are linking to these internal anchors (which I think is unlikely) then the links with degrade slightly (they won’t result in the display of a 404 error message; instead the top of the page will be displayed, rather than the entries for the start of the particular year).

Google’s View of the RDFa Markup

Using Google’s Rich Snippets Testing Tool it is possible to “enter a web page URL to see how it may appear in search results“. The accompanying image shows the output of this tool for my events page.

Rendering of RDFa markup

This shows the structure of the page which Google knows about. As Google knows the latitude and longitude for the location of the talk it can use this for location based services and it can provide the summary of the event and a description.

Is It Correct?

Following my initial experiment my former colleague Pete Johnston (now of Eduserv) kindly gave me some feedback. He alerted me to W3C’s RDFa Distiller and Parser service – and has recently himself published posts on Document metadata using DC-HTML and using RDFa and RDFa 1.1 drafts available from W3C.

Using the Distiller and Parser service to report on my event page (which has now been updated) I found that I had applied a single v:Event XML element where I should have used three elements for the three events. I had also made a number of other mistakes when I made use of the examples fragments provided in the Google Rich Snippets example without having a sound understanding of the underlying model and how it should be applied. I hope the page is now not only valid but uses a correct data model for my data.

I should add that I am not alone in having created resources containing Linked data errors. A paper on “Weaving the Pedantic Web” (PDF format) presented at the Linked Data on the Web 2010 workshop described an analysis of almost 150,00 URIs which revealed a variety of errors related to accessing and dereferencing resources and processing and parsing the data found. The awareness of such problems has led to the establishment of the Pedantic Web Group which “understand[s] that the standards are complex and it’s hard to get things right” but nevertheless “want[s] you to fix your data“. There will be a similar need to avoid polluting RDFa space with incorrect data.

Is It Worthwhile?

The experiences with microformats would seem to indicate that benefits of use of RDFa will be gained if large scale search engines support its use, rather than providing such information with an expectation that there will be significant usage by client-side extensions.

However the Google Rich Snippets Tips and Tricks Knol page state that “Google does not guarantee that Rich Snippets will show up for search results from a particular site even if structured data is marked up and can be extracted successfully according to the testing tool“.

So, is it worth providing RDFa in your HTML pages? Perhaps if you have a CMS which creates RDFa or you can export existing event information in an automated way it would be worth adding the additional semantic markup. But you need to be aware of the dangers of doing this in order to enhance findability of resources by Google since Google may not process your markup. And, of course, there is no guarantee that Google will continue to support Rich Snippets. On the other hand other vendors, such as Yahoo!, do seem to have an interest in supporting RDFa – so potentially RDFa could provide a competitive advantage over other search engine providers.

But, as I discovered, it is easy to make mistakes when using RDFa. So there will be essential to have an automated process for the production of pages containing RDFa – and there will be a need to ensure that the data model is correct as well as the page being valid. This will require a new set of skills as such issues are not really relevant in standard HTML markup.

I wonder if I have convinced Owen Stephens and Chris Gutteridge who expressed their reservations about use of RDFa? And are there any examples of successful use of RDFa which people know about?

“RDFa from Theory to Practice” Workshop Session

Note that if you have an interest in applying the potential of RDFa in practice my colleagues Adrian Stevenson, Mark Dewey and Thom Bunting will be running a 90 minute workshop session on “RDFa from theory to practice” at this year’s IWMW 2010 event to be held at the University of Sheffield on 12-14 July.

Posted in HTML, W3C | Tagged: | 5 Comments »

Microformats and RDFa: Adding Richer Structure To Your HTML Pages

Posted by Brian Kelly on 25 Mar 2010

Revisiting Microformats

If you visit my presentations page you will see a HTML listing of the various talks I’ve given since I started working at UKOLN in 1996.  The image shown below gives a slightly different display from the one you will see, with use of a number of FireFox plugins providing additional ways of viewing and processing this information.

Firefox extensions

This page contains microformat information about the events.  It was at UKOLN’s IWMW 2006 event that we made use of microformats on the event Web site for the first time with microformats being used to mark up the HTML representation for the speakers and workshop facilitators together with the timings for the various sessions. At the event Phil Wilson ran a session on “Exposing yourself on the Web with Microformats!“. There was much interest in the potential of microformats back in 2006, which was then the hot new idea.  Since then I have continued to use microformats to provide richer structural information for my events and talks. I’ll now provide a summary of the ways in which the microformats can be used, based on the image shown above.

The Operator sidebar (labelled A in the image) shows the Operator FireFox plugin which “leverages microformats and other semantic data that are already available on many web pages to provide new ways to interact with web services“. The plugin detects various microformats embedded in a Web page and supports various actions – as illustrated, for events the date, time and location and summary of the event can be added to various services such as Google and Yahoo! Calendar.

The RDFa in Javascript bookmarklets (labelled B) are simple JavaScript tools which can be added to a variety of different browsers (they have been tested on IE 7,  Firefox, Safari, Mozilla and Safari). The License bookmarklets will create a pop-up alert showing the licence conditions for a page, where this has been provided in a structured form. UKOLN’s Cultural Heritage briefing documents are available under a Creative Commons licence. Looking at, for example, the Introduction to Microformats briefing document, you will see details of the licence conditions displayed for reading. However, in addition, a machine-readable summary of the licence conditions is also available which is processed by the Licence bookmarklet and displayed as a pop-up alert. This information is provided by using the following HTML markup:

<p><a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/2.0/">
<img src="http://creativecommons.org/images/public/somerights20.gif"
   alt="Creative Commons License" /></a>This work is licensed under a
<a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/2.0/">Creative Commons
License</a>.</p>

The power is in the rel=”license” attribute which assigns ‘meaning’ to the hypertext link.

The link to my Google Calendar for each of the events (labelled C) is provided by the Google hCalendar Greasemonkey script. Clicking on the Google Calendar icon (which is embedded in the Web page if hCalendar microformatting markup is detected – although I disable this feature if necessary) will allow the details to be added to my Google Calendar without me having to copy and paste the information.

The additional icons in the browser status bar (labelled D) appear to be intended for debugging of RDFa – and I haven’t yet found a use for them.

The floating RSS Panel (labelled E) is another GrreaseMonkey script. In this case the panel does not process microformats or RDFa but autodetectable links to RSS feeds. I’m mentioning it in this blog post in order to provide another example of how richer structure in HTML pages can provide benefits to an end user. In this case in provides a floating panel in which RSS content can be displayed.

RDFa – Beyond Microformats

The approaches I’ve described above date back to 2006, when microformats was the hot new idea.  But now there is more interests in technologies such as Linked Data and RDF. Those responsible for managing Web sites with an interest in emerging new ways of enhancing HTML pages are likely to have an interest in RDFa: a means of including RDF in HTML resources.

The RDFa Primer is sub-titled “Bridging the Human and Data Webs“. This sums up nicely what RDFa tries to achieve – it enables Web editors to provide HTML resources for viewing by humans whilst simultaneously providing access to structured data for processing by software.  Microformats provided an initial attempt at doing this, as I’ve shown above.  RDFa is positioning as providing similar functionality, but coexisting with developments in the Linked Data area.

The RDFa Primer provides some examples which illustrate a number of use cases.  My interest is in seeing ways in which RDFa might be used to support Web sites I am involved in building, including this year’s IWMW 2010 Web site.

The first example provided in the primer describes how RDFa can be used to describe how a Creative Commons licence can be applied to a Web page; an approach which I have described previously.

The primer goes on to describe how to provided structured and machine understandable contact information, this time using the FOAF (Friends of a Friend) vocabulary:

<div typeof="foaf:Person" xmlns:foaf="http://xmlns.com/foaf/0.1/">
   <p property="foaf:name">Alice Birpemswick</p>
   <p>Email: <a rel="foaf:mbox" href="mailto:alice@example.com">alice@example.com</a></p>
   <p>Phone: <a rel="foaf:phone" href="tel:+1-617-555-7332">+1 617.555.7332</a></p>
</div>

In previous year’s we have marked up contact information for the IWMW event’s program committee using hCard microformats. We might be in a position now to use RDFa. If we followed the example in the primer we might use RDFa to provide information about the friends of the organisers:

<div xmlns:foaf="http://xmlns.com/foaf/0.1/"> <ul> <li typeof="foaf:Person"> <a rel="foaf:homepage" href="http://example.com/bob/">Bob</a> </li> <li typeof="foaf:Person"> <a rel="foaf:homepage" href="http://example.com/eve/">Eve</a> </li> <li typeof="foaf:Person"> <a rel="foaf:homepage" href="http://example.com/menu/">Menu</a> </li> </ul></div>

However this would not be appropriate for an event. What would be useful would be to provide information on the host information for the speakers and workshop facilitators. In previous year’s such information has been provided in HTML, with no formal structure which would allow automated tools to process such institutional information.  If  RDFa was used to provide such information for the 13 years since the event was first launched this could allow an automated tool to process the event Web sites and provide various report on the affiliations of the speakers. We might be then have a mechanism for answering the query “Which institution has provided the highest number of (different) speakers or facilitators at IWMW events?“. I can remember that Phil Wilson, Andrew Male and Alison Kerwin (nee Wildish) from the University of bath have spoken at events, but who else? And what about the Universities which I am unfamiliar with?   This query could be solved if the data was stored in a backend database, but as the information is publicly available on the Web site, might not using slightly more structured content on the Web site be a better approach?

Really?

When we first started making use of microformats I envisaged that significant numbers of users would be using various tools on the browser to process such information.  However I don’t think this is the case (and I would like to hear from anybody who does make regular use of such tools).   I have to admit that although I have been providing microformats for my event information, I have not consumed microformats provided by others (and this includes the microformats provided on the events page on the JISC Web site).

This isn’t, however, necessarily an argument that microformats – or RDFa –  might not be useful.  It  may be that the prime use of such information is by server-side tools which harvest such information form a variety of sources. In May 2009, for example, Google announced that Google Search Now Supports Microformats and Adds “Rich Snippets” to Search Results. Yah0o’s SearchMonkey service also claims to support structured search queries.

But before investing time and energy into using RDFa across an event Web site the Web manager will need answers to the questions:

  • What benefits can this provide?  I’ve given one use case, but I’d be interested in hearing more.
  • What vocabularies do we need to use and how should the data be described? The RDFa Primer provides some example, but I am unsure as to how to use RDFa to state that, for example, Brian Kelly is based at the University of Bath, to enable structured searches of all speakers from the University of Bath.
  • What tools are available which can process the RDFa which we may chose to create?

Anyone have answers to these questions?

Posted in HTML, W3C | Tagged: , | 11 Comments »