UK Web Focus

Innovation and best practices for the Web

Archive for the ‘HTML’ Category

Request for Proposals For HTML5 Case Studies

Posted by Brian Kelly (UK Web Focus) on 5 July 2011

UKOLN has announced a Request for Proposals (RfP) For HTML5 Case Studies.  The proposals for HTML5 case studies and demonstrators should describe best practices and scenarios for making use of HTML5 and related Open Web Platform standards in areas of relevance to those working in the higher and further education sectors.

The proposals should address new features of the emerging HTML5 standard (e.g. canvas; geo-location; local storage; video; form fill; etc.) or related standards which form part of the W3C’s Open Web Platform, such as CSS, DOM, MathML, etc.

Application areas might include, but are not restricted to, benefits to institutional Web sites (e.g. SEO benefits or enriched functionality); teaching and learning applications (course lectures delivered via video, audio, etc.; lab notebooks); research applications (e.g. articles, series, journals; books; table of contents; bibliography; citation); multi-channel access; etc.

The proposals should describe how the work was implemented and the ways in which the new functionality was (or could be) implemented in a real-world context of legacy browsers; possible lack of development tools; etc.

Case studies must be made available under a Creative Commons licence and if accompanying code is provided this should be made available under an appropriate Open Source licence.

A sum of £5,000 is available for each accepted submission. The deadline for submissions is Monday 18 July 2011. Accepted proposals must agree to provide final case studies by 16 September 2011.

Further information is available on the UKOLN Web site.

Posted in HTML | Tagged: | 1 Comment »

Schema.org, Google +1 and Facebook Like and Send

Posted by Brian Kelly (UK Web Focus) on 3 June 2011

Schema.org

Yesterday I came across a stream of tweets about schema.org. The first post I saw was on the official Google blog: Introducing schema.org: Search engines come together for a richer web. This was followed by Yahoo’s post “Introducing schema.org: A Collaboration on Structured Data”. Not to be left out, Microsoft provided a similar post on the Bing blog: “Introducing Schema.org: Bing, Google and Yahoo Unite to Build the Web of Objects”.

But what is schema.org?  The home page summarises what this collaborative approach to structured data is about:

This site provides a collection of schemas, i.e., html tags, that webmasters can use to markup their pages in ways recognized by major search providers. Search engines including Bing, Google and Yahoo! rely on this markup to improve the display of search results, making it easier for people to find the right web pages.

Many sites are generated from structured data, which is often stored in databases. When this data is formatted into HTML, it becomes very difficult to recover the original structured data. Many applications, especially search engines, can benefit greatly from direct access to this structured data. On-page markup enables search engines to understand the information on web pages and provide richer search results in order to make it easier for users to find relevant information on the web. Markup can also enable new tools and applications that make use of the structure.

A shared markup vocabulary makes it easier for webmasters to decide on a markup schema and get the maximum benefit for their efforts. So, in the spirit of sitemaps.org, Bing, Google and Yahoo! have come together to provide a shared collection of schemas that webmasters can use.
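To make the idea of on-page markup a little more concrete, here is a minimal Python sketch of how a tool might recover structured data from a page fragment which uses the microdata attributes (itemscope, itemtype, itemprop) with a schema.org type. The sample fragment, the person and URL in it, and the names MicrodataSketch and extract_microdata are all hypothetical and for illustration only; this is not how any search engine actually processes pages, and it ignores nested items.

```python
from html.parser import HTMLParser

# Hypothetical page fragment marked up with schema.org microdata attributes.
SAMPLE = """
<div itemscope itemtype="http://schema.org/Person">
  <span itemprop="name">Jane Example</span>
  <span itemprop="jobTitle">Web Manager</span>
  <a itemprop="url" href="http://www.example.ac.uk/jane">Home page</a>
</div>
"""


class MicrodataSketch(HTMLParser):
    """Collects item types and itemprop values; nested items are not handled."""

    def __init__(self):
        super().__init__()
        self.item_types = []
        self.properties = {}
        self._current_prop = None  # property whose text content we are waiting for

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "itemscope" in attrs and "itemtype" in attrs:
            self.item_types.append(attrs["itemtype"])
        prop = attrs.get("itemprop")
        if prop is None:
            return
        if tag == "a" and "href" in attrs:
            # For links the property value is the URL, not the link text.
            self.properties[prop] = attrs["href"]
        else:
            self._current_prop = prop

    def handle_data(self, data):
        if self._current_prop is not None and data.strip():
            self.properties[self._current_prop] = data.strip()
            self._current_prop = None


def extract_microdata(html):
    """Return (list of item types, dict of property name -> value)."""
    parser = MicrodataSketch()
    parser.feed(html)
    parser.close()
    return parser.item_types, parser.properties
```

Calling extract_microdata(SAMPLE) gives back the Person type together with the name, jobTitle and url values, which is the kind of structured record that is otherwise hard to recover once data has been flattened into HTML.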

The endorsement for deployment of structured semantic HTML markup from the main search engine vendors would appear to suggest that this approach will succeed whereas initial approaches, such as microformats, did not live up to their promise.  The question will be how quickly CMS vendors respond or whether in-house development will provide an opportunity for rapid deployment of the schema.org vocabularies.  Of course there will also be a need for organisations to monitor the benefits in order to ensure that this provides additional benefits over other SEO techniques.

Google +1

We have seen other developments this week, in particular the promotion of the Google +1 recommendation system.  This was initially highlighted in a post entitled Social Search goes global, published on the Google Social Web blog in April, which announced that “Social Search is rolling out globally in 19 languages and should be available in the coming week”. More recently, further information was revealed on how Google +1 can be used to:

publicly show what you like, agree with, or recommend on the web. The +1 button can appear in a variety of places, both on Google and on sites across the web. For example, you might see a +1 button for a Google search result, Google ad, or next to an article you’re reading on your favorite news site. Your +1’s and your social connections also help improve the content you see in Google Search.

The Google Webmaster Central Blog goes on to add that:

We’re working on a +1 button that you can put on your pages too, making it easy for people to recommend your content on Google search without leaving your site.

Hmm, so as well as providing semantic markup from schema.org it should also be possible to use Google’s social networks to provide recommendations.

Facebook Like and Send

As described on the Read Write Blog just over a month ago, Facebook announced that, in addition to the ‘Like’ button which enables recommendations to be made available within the Facebook environment, a ‘Send’ button can be used so that “Facebook users will be able to share content with specific groups of friends, rather than everyone on their friends list, giving them the precision sharing tool they’ve needed all along”.

Discussion

google is the most powerful standardization organization. A dictatorial one for that matter

Discussions on Twitter on the implications of the schema.org announcements have already started with @hvdsompel suggesting that “schema.org is yet another illustration that google is the most powerful standardization organization. A dictatorial one for that matter“.

My view is that microformats provided an opportunity for a community-led initiative which dates back to 2005. As I described in a trip report about the WWW 2005 conference published in Ariadne in July 2005:

I should mention ‘microformats’ or, as it is also (confusingly) termed the ‘lowercase semantic web’. Microformats have been described as ‘a set of simple, open data formats built upon existing and widely adopted standards … designed for humans first and machines second’.

The following year IWMW 2006 featured a workshop session on “Exposing yourself on the Web with Microformats!” which explored ways in which richer semantics for content held on institutional Web sites could be exposed.  But in reality, apart from a few Firefox plugins such as Operator and Tails Export, there was little ongoing interest in microformats.

This is one reason why I find the schema.org announcement particularly exciting – and the coordinated posts from Yahoo and Microsoft in addition to Google will help to address concerns that this is an attempt by Google to enforce a company-specific solution on the Web.

But where does this leave RDFa?  As explained in a page about the schema.org data model: “Our use of Microdata maps easily into RDFa 1.1. In fact, all of Schema.org can be used with the RDFa 1.1 syntax as is”, although the page concludes with the warning that “Microdata does not have analogs for RDFa features such as CURIES, Turtles, Chaining, Typed vs Plain literals, etc.”.

“Why the Button War? Because Content is Social Currency”

The discussions and arguments about the underlying structure of semantic markup on HTML resources will focus on the different approaches (microformats, RDFa and the schema.org support for the microdata approach) and, as we have seen, the issues about ownership of the approaches.  It should be pointed out, however, that microdata is part of HTML5 and the HTML5 Microdata draft was published on the W3C Web site on Wednesday 1 June 2011. So whilst the editor of the draft is based at Google it should be acknowledged that many W3C standards have been developed by those working in commercial companies and that this can help to ensure that standards are deployed and gain acceptance in the marketplace.

But in addition to the discussions about approaches to exploiting microdata, consideration will also have to be given to ways in which content can be exposed to the social web in order for individuals to share their recommendations across their networks.  We have already seen how the popularity of the ‘Like’ button has led to the release of the ‘Send’ button, which is based on Facebook’s Open Graph Protocol, which has been described as “A Meaningful First Step to a True Semantic Web”.  But what of Google +1? And what of the user experience with a seemingly ever-growing array of buttons which can be used to promote Web resources across social web environments including not only Facebook and Google but also microblogging environments such as Twitter?

In a post entitled “Why the Button War? Because Content is Social Currency [10 Links]” Tac Anderson describes how buttons (whether a Facebook ‘Like’, a Twitter Retweet, the rating button at the bottom of this post or whatever) provide a ‘point of transaction’ where one says “Yes I associate myself with this piece of content”.  Tac argues that “As a publisher buttons are invaluable as a way to make your content shareable, raise awareness, drive engagement and ultimately increase visits and regular readers”.

Whilst the focus of the post seems to be the more general commercial and social uses of the Web, the need to make “content shareable, raise awareness, drive engagement and ultimately increase visits and regular readers” is also true for those of us working across the higher education sector, whether in teaching and learning, research or marketing areas.

I can’t help but feel that resource discovery is getting very interesting – and I’d welcome comments on how we feel that our sector should respond.

Posted in HTML | 3 Comments »

New HTML5 Drafts and Other W3C Developments

Posted by Brian Kelly (UK Web Focus) on 13 April 2011

New HTML5 Drafts

The W3C’s HTML Working Group has recently announced the publication of eight documents:

Last Call Working Drafts for RDFa Core 1.1 and XHTML+RDFa 1.1

Back in August 2010 in a post entitled New W3C Document Standards for XHTML and RDFa I described the latest release of RDFa Core 1.1 and XHTML+RDFa 1.1 draft documents. The RDFa Working Group has now published Last Call Working Drafts of these documents: RDFa Core 1.1 and XHTML+RDFa 1.1.

New Provenance Working Group

The W3C has also recently launched a new Provenance Working Group whose mission is “to support the widespread publication and use of provenance information of Web documents, data, and resources“. The Working Group will publish W3C Recommendations that define a language for exchanging provenance information among applications. This is an area of work which is likely to be of interest to those involved in digital library development work – and it is interesting to see that a workshop on Understanding Provenance and Linked Open Data was held recently at the University of Edinburgh.

Emotion Markup Language

When I first read of the Multimodal Interaction (MMI) Working Group’s announcement of the Last Call Working Draft of Emotion Markup Language (EmotionML) 1.0, I checked to see that it hadn’t been published on 1 April! It seems that “As the web is becoming ubiquitous, interactive, and multimodal, technology needs to deal increasingly with human factors, including emotions”.

The EmotionML Language allows various vocabularies to be used such as:

The six terms proposed by Paul Ekman (Ekman, 1972, p. 251-252) as basic emotions with universal facial expressions — emotions that are recognized and produced in all human cultures: anger; disgust; fear; happiness; sadness and surprise.

The 17 terms found in a study by Cowie et al (Cowie et al., 1999) who investigated emotions that frequently occur in everyday life: affectionate; afraid; amused; angry; bored; confident; content; disappointed; excited; happy; interested; loving; pleased; relaxed; sad; satisfied and worried.

Mehrabian’s proposal of a three-dimensional description of emotion in terms of Pleasure, Arousal, and Dominance.

Posted in HTML, standards, W3C | 1 Comment »

The HTML5 Standardisation Journey Won’t Be Easy

Posted by Brian Kelly (UK Web Focus) on 3 February 2011

I recently published a post on Further HTML5 Developments in which I described how the W3C were being supportive of approaches to the promotion of HTML5 and the Open Web Platform. However, in a post entitled HTML is the new HTML5, published on 19th January 2011 on the WHATWG blog, Ian Hickson, editor of the HTML5 specification (and graduate of the University of Bath who now works for Google), announced that “The HTML specification will henceforth just be known as ‘HTML’”. As described in the FAQ it is intended that HTML5 will be a “living standard”:

… standards that are continuously updated as they receive feedback, either from Web designers, browser vendors, tool vendors, or indeed any other interested party. It also means that new features get added to them over time, at a rate intended to keep the specifications a little ahead of the implementations but not so far ahead that the implementations give up.

What this means for the HTML5 marketing activities is unclear. But perhaps more worrying is what this will mean for the formal standardisation process which W3C has been involved in.  Since it seems that new HTML(5) features can be implemented by browser and tool vendors, this seems to herald a return to the days of the browser wars, during which Netscape and Microsoft introduced ‘innovative’ features such as the BLINK and MARQUEE tags.

On the W3C’s public-html list Joshue O Connor (a member of the W3C WAI Protocol and Formats Working Group) feels that:

What this move effectively means is that HTML (5) will be implemented in a piecemeal manner, with vendors (browser manufacturers/AT makers etc) cherry picking the parts that they want. … This current move by the WHATWG, will mean that discussions that have been going on about how best to implement accessibility features in HTML 5 could well become redundant, or unfinished or maybe never even implemented at all.

In response Anne van Kesteren of Opera points out that:

Browsers have always implemented standards piecemeal because implementing them completely is simply not doable. I do not think that accepting reality will actually change reality though. That would be kind of weird. We still want to implement the features.

and goes on to add:

Specifications have been in flux forever. The WHATWG HTML standard since 2004. This has not stopped browsers implementing features from it. E.g. Opera shipped Web Forms 2.0 before it was ready and has since made major changes to it. Gecko experimented with storage APIs before they were ready, etc. Specifications do not influence such decisions.

Just over a year ago a CETIS meeting on The Future of Interoperability and Standards in Education explored “the role of informal specification communities in rapidly developing, implementing and testing specifications in an open process before submission to more formal, possibly closed, standards bodies”. But while rapid development, implementation and testing were felt to be valuable, there was a recognition of the continued need for the more formal standardisation process.  Perhaps the importance of rapid development which was highlighted at the CETIS event has been demonstrated by the developments centred around HTML5, with the W3C providing snapshots once the implementation and testing of new HTML developments have taken place, but I feel uneasy at the developments. This unease has much to do with the apparent autonomy of browser vendors: I have mentioned comments from employees of Google and Opera who seem to be endorsing this move (how would we feel if it was Microsoft which was challenging the W3C’s standardisation process?). But perhaps we should accept that significant Web developments are no longer being driven by a standards organisation or from grass-roots developments but by the major global players in the market-place? Doesn’t sound good, does it – a twenty-first century return to browser vendors introducing updated versions of BLINK and MARQUEE elements as they’ll know what users want :-(

Posted in HTML, standards, W3C | Tagged: | 3 Comments »

Further HTML5 Developments

Posted by Brian Kelly (UK Web Focus) on 25 January 2011

Updated HTML5 Documents

Back in November 2010 in a post entitled Eight Updated HTML5 Drafts and the ‘Open Web Platform’ I described how the W3C had published draft versions of eight documents related to HTML5.  It seems that W3C staff and members of various HTML5 working groups have been busy over Christmas as the HTML Working Group has published further revised versions of eight documents:

HTML5 Marketing Activities

The significance of the development work on HTML5 specifications and the importance which W3C is giving to HTML5 can be seen from the announcement that “W3C Introduces an HTML5 Logo”, which describes this “striking visual identity for the open web platform”.

The page about the logo is full of marketing rhetoric:

Imagination, meet implementation. HTML5 is the cornerstone of the W3C’s open web platform; a framework designed to support innovation and foster the full potential the web has to offer. Heralding this revolutionary collection of tools and standards, the HTML5 identity system provides the visual vocabulary to clearly classify and communicate our collective efforts.

The W3C have also pointed out how the logo is being included on t-shirts, which you can buy for $22.50.   The marketing activity continues with encouragement for HTML5 developers to engage in viral marketing:

Tweet your HTML5 logo sightings with the hashtag #html5logo

In addition to Web site owners being able to use this logo on their Web sites and fans of HTML5 being able to wear a T-shirt (“wearware”?), as I learnt from Bruce Lawson’s post “On The HTML5 Logo”, users of Firefox and Opera browsers can install a Greasemonkey script or Opera extension which will display a small HTML5 logo in the top right hand corner of the window of HTML5 pages. I’ve tried this and it works.

Such marketing activities are unpopular in some circles, with much of the criticism centered around the FAQ’s original statement that the logo means “a broad set of open web technologies”, which some believe “muddies the waters” of the open web platform.  In light of such concerns the W3C have updated the HTML5 Logo FAQ.

I have to say that personally I applaud this initiative.  In the past the commercial sector has taken a lead in popularising Web developments as we saw in the success of the Web 2.0 meme – it’s good, I feel, that the W3C are taking a high profile in the marketing of HTML5 developments. I also feel that this is indicative of the importance of HTML5, which, judging from examples of HTML5’s potential which I have described in a number of recent posts, will be of more significance than the moves from HTML 3.2 to HTML 4 and HTML 4 to XHTML 1.

Spotting HTML5 Pages – Including the Google Home Page

Use of the Opera extension, which embeds a small version of the HTML5 icon in the top right hand corner of the browser display, is shown.

Whilst searching for a HTML5 Web site to use for this example I discovered that the Google search page now uses HTML5, with the following HTML5 declaration included at the top of the page:

<!doctype html>

I had previously thought that Google was very conservative in its use of HTML as, in light of its popularity, the page had to work on a huge range of browsers. Note, though, that on using W3C’s HTML validator, which includes experimental support for HTML5, I found that there were still HTML errors, many of which were due to unescaped ‘&’ characters.  Some time ago it was suggested that the reason Google wasn’t implementing the simple changes needed to ensure that their home page validated was to minimise bandwidth usage – which will be very important for a globally popular site such as Google’s which, despite losing the top slot to Facebook in the US last year, is still pretty popular :-). Hmm, if there are around 90 million Google users per day I wonder how much bandwidth is saved by using & rather than &amp; in its home page and search results?
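As a rough back-of-the-envelope answer to that question, the arithmetic can be sketched in a few lines of Python. Only the figure of 90 million users per day comes from the text above; the assumptions that each user views a single page and that each page contains 20 unescaped ampersands are purely hypothetical, and the sketch ignores HTTP compression, which would reduce any real saving.

```python
# Writing a bare "&" instead of the escaped "&amp;" saves 4 bytes each time.
BYTES_SAVED_PER_AMPERSAND = len("&amp;") - len("&")


def daily_bytes_saved(page_views_per_day, ampersands_per_page):
    """Bytes not sent per day when every ampersand is left unescaped."""
    return page_views_per_day * ampersands_per_page * BYTES_SAVED_PER_AMPERSAND


if __name__ == "__main__":
    views = 90_000_000   # from the post: around 90 million users per day
    ampersands = 20      # hypothetical number of ampersands per page
    saved = daily_bytes_saved(views, ampersands)
    print(f"{saved / 1e9:.1f} GB per day")
```

Under those assumed figures the saving works out at 7.2 GB a day before compression; different assumptions about page views or ampersands per page scale the result linearly.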

Posted in HTML, standards | Tagged: | Leave a Comment »

HTML5: Are Museum Web Sites Ahead of HE?

Posted by Brian Kelly (UK Web Focus) on 30 December 2010

Martin Hawksey, a prolific blogger on the RSC Scotland North and East blog, recently alerted me to an article published in the ReadWriteWeb blog which describes how Scotland Trailblazes the Use of HTML5 in Museums. The trailblazing Scottish institution wasn’t a University or a Web development or Web design company – rather it was the National Museums Scotland Web site.

The article describes how:

The National Museums of Scotland have become the first major museum organization in the world to fully implement HTML5.

and goes on to inform readers that

Museum digital media tech manager Simon Madine explained in a blog post that the implementation across the five allied sites was married to an overall redesign. That redesign saw the site gain color and shoulder-room and emphasize more visuals. But the implementation of HTML5 is more revolutionary. It allows a greater level of search engine accessibility, easier rendering across browsers and overall makes it easier to elegantly add and change site content.

According to Hugh Wallace, NMS head of digital media “The site should be eminently more findable too as it’s structured for the way Google reads pages“.  In fact, the only other museum that Wallace’s crew could find that has fully implemented the language is The American Sport Art Museum and Archives.

My question for those involved in providing institutional Web sites is “Are you making use of HTML5?”. If you are, I’d be interested in hearing how you are going about doing this and what benefits you have identified that this can provide. And if not, why have you chosen not to do so?  I’d also be interested to receive responses from those working in other sectors and other countries.

Posted in HTML | 3 Comments »

“HTML5: If You Bang Your Head Against The Keyboard You’ll Create a Valid Document!”

Posted by Brian Kelly (UK Web Focus) on 10 December 2010

“HTML5 / CSS3 / JS  – a world of new possibilities”

I recently attended the 18th Bathcamp event entitled “Faster, cheaper, better!“.  For me the highlight of the evening was a talk by Elliott Kember (@elliottkember)  on “HTML5 / CSS3 / JS  – a world of new possibilities“.

The Elliottkember.com Web site describes Elliott as:

freelance web developer based in Bath, England
who builds and maintains high-traffic, powerful web apps,
resorts to using 32pt Georgia – sometimes in italic and printer’s primaries,
has 4978 followers on Twitter, speaks at conferences,
and wants to develop your idea into an application.

Elliott gave a fascinating run through some of the new presentational aspects of HTML5 and CSS3, appropriately using a HTML5 document to give the presentation.  His slides are available at http://riothtml5slides.heroku.com/ and are well worth viewing. Note that to progress through the slides you should use the forward and back arrows – and note that Elliott was experimenting with some of the innovative aspects of HTML5 and CSS3 so the presentation might not work on all browsers.

In this post I’ll not comment on the HTML5 features which Elliott described. Rather than looking at the additional features I’ll consider the implications of the ways in which the HTML5 specification is being simplified.

HTML5’s Simplicity

Elliott introduced the changes in HTML5 by pointing out its simplicity. For example a HTML 4 document required the following Doctype definition:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
"http://www.w3.org/TR/html4/strict.dtd">

whereas HTML5 simply requires:

<!doctype html>

The following illustrates a valid HTML5 document:

<!DOCTYPE html>
<title>Small HTML 5</title>
<p>Hello world</p>

As can be seen there is no requirement to include the <head> and <body> elements which are needed in order for a HTML 4 document to be valid (although HTML 4 documents which do not include these mandatory elements will be rendered correctly by Web browsers).

What about IE?

Over the years developments to HTML standards have always given rise to the question “What about legacy browsers?”. Often the answer has been “The benefits of the new standard will be self-evident and provide sufficient motivation for organisations to deploy more modern browsers”.  Whether the benefits of the developments from, say, HTML 3.2 to HTML 4 and HTML 4 to XHTML 1 have provided sufficient motivation for organisations to invest time and effort in upgrading their browsers is, however, questionable – I know I have been to institutions which are still providing very dated versions of browsers on their public PCs.   And whether the HTML technology previews which tend to be demonstrated when a new version of HTML is released will be typical of the mainstream uses may also be questioned.  So there is still a question about the deployment of services based on HTML5 in an environment of flawed browsers, which includes Internet Explorer; it should also be noted that other browsers may also have limited support for new HTML5 (and CSS 3) features.

Elliott suggests that a solution to the “What about IE?” question may be provided by a HTML5 ‘shim’. A shim (which is also sometimes referred to as a ‘shiv’) is described in Wikipedia as “a small library which transparently intercepts an API, changes the parameters passed, handles the operation itself, or redirects the operation elsewhere“.

Remy Sharp has developed what he calls the HTML5 shiv, which consists of the following three lines:

<!--[if lt IE 9]>
<script src="http://html5shiv.googlecode.com/svn/trunk/html5.js"></script>
<![endif]-->

This code provides a mechanism for IE to recognise new elements such as the <slide> element which Elliott uses in his presentation.

Use it Now?

Should you start using HTML5 now?  Back in July in his plenary talk on “HTML5 (and friends): The future of web technologies – today” given at the IWMW 2010 event Patrick Lauke suggested that for new Web development work it would be appropriate to consider using HTML5.

Elliott was in agreement, with his slides  making the point that:

All decent browsers support enough of this stuff to make it worth using.

What this means is that you can start to make use of the simple HTML5 declaration, but rather than use every HTML5 feature that is documented in the specification you should check the level of support for various features using, for example, the Periodic Table of HTML5 Elements, the HTML5 Test web site and Wikipedia’s Comparison of layout engines (HTML5), as well as running standard usability checks on an appropriate range of browsers and platforms.

What About Validity of HTML5?

Following Elliott’s talk there was a question about the validity of HTML5 documents.  Elliott responded with a very graphic depiction of the much more liberal (if one dare use that word!) approach to validity: “If you bang your head against the keyboard you’ll probably create a valid HTML5 document!”.

Such an approach is based on observing how few Web resources actually conform with existing HTML specifications.  In many cases browser rendering is being used as an acceptable test for conformity – if a Web page is displayed and is usable in popular Web browsers then it is good enough seems to be the situation today.  “After all” asked Elliott “how many people validate their Web pages today?” The small numbers of hands which were raised (including myself and Cameron Neylon) perhaps supported this view and when the follow-up question “Who bothers about using closing tags on <br> elements in XHTML documents these days?” was asked I think mine was the only hand which was raised.

The evidence clearly demonstrates that strict HTML validity, which was formally required in XHTML, has been rejected in the Web environment. In future, it would seem, there won’t be a need to bother about escaping &s and closing empty tags, although if Web authors wish to continue with such practices they can do so.
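For Web authors who do wish to continue escaping ampersands, the practice is simple to automate. The following is a minimal Python sketch, assuming a regular expression is good enough to tell a bare ampersand from one which already starts a character reference; the function name escape_bare_ampersands is hypothetical, and a production tool would more likely rely on a real HTML parser, since a query string parameter followed by a semicolon would fool this pattern.

```python
import re

# An ampersand which is NOT followed by something that looks like a character
# reference (e.g. "amp;", "#38;" or "#x26;") is treated as a bare ampersand.
_BARE_AMPERSAND = re.compile(r"&(?!#?\w+;)")


def escape_bare_ampersands(markup):
    """Replace bare '&' characters with '&amp;', leaving existing references alone."""
    return _BARE_AMPERSAND.sub("&amp;", markup)
```

Applied to a link such as <a href="/search?q=html5&hl=en">R&D</a> this escapes both ampersands, while text which already contains &amp; or &#38; is left unchanged.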

What About Complex Documents?

Such simplicity seemed to be welcomed by many who attended the Bathcamp meeting.  But Cameron Neylon, an open science researcher based at the Science and Technology Facilities Council, and I still had some concerns.  What will the implications be if a HTML resource is being used not just for display and user interaction, but as a container for structured information?  How will automated tools process embedded information provided as RDFa or microdata if the look-and-feel and usability of a resource is the main mechanism for validation of the internal consistency of a resource?

And what if an HTML5 document is used as a container for other structured elements, such as mathematical formulae provided using MathML; chemical formulae provided using CML; etc.?

There are dangers that endorsing current lax approaches to HTML validity can hinder the development of more sophisticated uses of HTML, especially in the research community. We are currently seeing researchers arguing that the main document format for use in scientific and research papers should move away from PDF to a more open and reusable format. HTML5 has been suggested as a possible solution. But will this require more rigorous use of the HTML5 specification?  And if the market place chooses to deploy tools which fail to implement such approaches, will this act as a barrier to deployment of HTML5 as a rich and interoperable format for the community?

Posted in HTML, standards | 4 Comments »

HTML and RDFa Analysis of Welsh University Home Pages

Posted by Brian Kelly (UK Web Focus) on 17 November 2010

Surveying Communities

A year ago I published a survey of RSS Feeds For Welsh University Web Sites which reported on auto-discoverable RSS feeds available on the home page of 12 Welsh Universities.  This survey was carried out over a small community in order to identify patterns and best practices for the provision of RSS feeds which could inform discussions across the wider community.

Trends in Use of HTML and RDFa

As described in previous analysis of usage of RSS feeds on Scottish University home pages such surveys can help to understand the extent to which emerging new standards and best practices are being deployed within the sector and, if usage is low, in understanding the reasons and exploring ways in which barriers can be addressed.

With the growing interest in HTML5 and RDFa it will be useful to explore whether such formats are being used on institutional home pages.

An initial small-scale survey across Welsh University home pages has been carried out in order to provide some initial findings which can be used to inform discussions and further work in this area.

The Findings

The findings, based on a survey carried out on 21 October 2010, are given in the following table. Note that the HTML analysis was carried out using the W3C HTML validator. The RDFa analysis was carried out using Google’s Rich Snippets testing tool, since it is felt that the search benefits which RDFa can provide will be exploited initially to enhance the visibility of structured information to Google.

No. | Institution | HTML Analysis | RDFa Analysis
1 | Aberystwyth University | XHTML 1.0 Transitional | None found
2 | Bangor University | XHTML 1.0 Transitional (with errors) | None found
3 | Cardiff University | XHTML 1.0 Strict (with errors) | None found
4 | Glamorgan University | HTML5 (with errors) | None found
5 | Glyndŵr University | XHTML 1.0 Transitional (with errors) | None found
6 | Royal Welsh College of Music & Drama | XHTML 1.0 Strict (with errors) | None found
7 | Swansea University | XHTML 1.0 Transitional | None found
8 | Swansea Metropolitan University | XHTML 1.0 Transitional (with errors) | None found
9 | Trinity University College | XHTML 1.0 Strict (with errors) | None found
10 | University of Wales Institute, Cardiff | XHTML 1.0 Strict (with errors) | None found
11 | University of Wales, Newport | HTML 4.01 Transitional (with errors) |
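The HTML Analysis entries above record the document type which each home page declares. As a rough illustration of how such a label can be derived from a page's source, here is a minimal Python sketch. It is not the W3C validator's own logic: the function name and the mapping of public identifiers are assumptions made for this example, and it says nothing about whether a page is valid.

```python
import re

# Public identifiers for the document types seen in the survey.
# This mapping is an illustrative assumption, not the validator's own table.
PUBLIC_IDS = {
    "-//W3C//DTD XHTML 1.0 Strict//EN": "XHTML 1.0 Strict",
    "-//W3C//DTD XHTML 1.0 Transitional//EN": "XHTML 1.0 Transitional",
    "-//W3C//DTD HTML 4.01 Transitional//EN": "HTML 4.01 Transitional",
    "-//W3C//DTD XHTML+RDFa 1.0//EN": "XHTML+RDFa 1.0",
}

# Matches <!DOCTYPE html ...> and captures the public identifier if one is given.
DOCTYPE_RE = re.compile(
    r'<!DOCTYPE\s+html(?:\s+PUBLIC\s+"([^"]+)")?[^>]*>', re.IGNORECASE
)


def classify_doctype(markup):
    """Return a label for the doctype declared near the start of `markup`."""
    match = DOCTYPE_RE.search(markup[:1024])
    if match is None:
        return "No doctype found"
    public_id = match.group(1)
    if public_id is None:
        # A doctype without a public identifier is treated here as HTML5.
        return "HTML5"
    return PUBLIC_IDS.get(public_id, "Unrecognised: " + public_id)
```

A survey script could fetch each home page and pass the first part of the response to classify_doctype(), leaving the question of validity to the validator itself.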

Discussion

Only one of the eleven Welsh institutions is currently making use of HTML5 on the institutional home page and none of them are using RDFa which can be detected by Google’s Rich Snippets testing tool.

The lack of use of RDFa, together with previous analyses of use of auto-detectable RSS feeds, would appear to indicate that University home pages are currently failing to provide machine-processable data which could be used to raise the visibility of institutional Web sites on search engines such as Google.

It is unclear whether this is due to a lack of awareness of the potential benefits which RDFa could provide, an awareness that potential benefits may not be realised due to search engines, such as Google, not currently processing RDFa from arbitrary Web sites, the difficulties in embedding RDFa due to limitations of existing CMSs, policy decisions relating to changes of such high profile pages, the provision of structured information in other ways or other reasons.

It would be useful to receive feedback from those involved in managing their institution’s home page – and also from anyone who is using RDFa (or related approaches) and does feel that they are gaining benefits.

Posted in Evidence, HTML, jiscobs, standards | 3 Comments »

Experiences Migrating From XHTML 1 to HTML5

Posted by Brian Kelly (UK Web Focus) on 10 November 2010

IWMW 2010 Web Site as a Testbed

In the past we have tried to make use of the IWMW Web site as a test bed for various emerging new HTML technologies. On the IWMW 2010 Web site this year we evaluated the OpenLike service which “provides a user interface to easily give your users a simple way to choose which services they provide their like/dislike data” as well as evaluating use of RDFa.

We also have an interest in approaches to migration from use of one set of HTML technologies to another. The IWMW 2010 Web site has  therefore provided an opportunity to evaluate deployment of HTML5 and to identify possible problem areas with backwards compatibility.

Migration of Main Set of Pages

We migrated top-level pages of the Web site from the XHTML1 Strict Doctype to HTML5 and validation of the home page, programme, list of speakers, plenary talks and workshop sessions shows that it was possible to maintain the HTML validity of these pages.

A small number of changes had to be made in order to ensure that pages which were valid using an XHTML Doctype were valid using HTML5. In particular we had to change the <form> element for the site search and replace all occurrences of <acronym> with <abbr>. We also changed occurrences of <a name="foo"> to <a id="foo"> since the name attribute is now obsolete.
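Two of the changes described above (<acronym> and the name attribute on <a>) are easy to look for before running pages through the validator. The following is a minimal Python sketch using the standard library's html.parser; the class and function names are made up for this example, and it only covers these two constructs rather than everything HTML5 treats as obsolete.

```python
from html.parser import HTMLParser


class ObsoleteMarkupFinder(HTMLParser):
    """Collect (line, message) pairs for two constructs obsolete in HTML5."""

    def __init__(self):
        super().__init__()
        self.findings = []

    def handle_starttag(self, tag, attrs):
        # getpos() reports the position of the tag currently being handled.
        line, _column = self.getpos()
        if tag == "acronym":
            self.findings.append((line, "replace <acronym> with <abbr>"))
        if tag == "a" and "name" in dict(attrs):
            self.findings.append((line, "replace name attribute on <a> with id"))


def find_obsolete_markup(markup):
    """Return the findings for a string of HTML markup."""
    finder = ObsoleteMarkupFinder()
    finder.feed(markup)
    finder.close()
    return finder.findings
```

Running find_obsolete_markup() over each page gives a list of line numbers to review by hand; it does not attempt to rewrite the markup.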

The W3C’s HTML validator also spotted some problems with links which hadn’t been spotted previously when we ran a link-checking tool. In particular we spotted a couple of occurrences of the form <a href="http://www.foo.bar "> with a space being included rather than a trailing slash. This produced the error message:

Line 175, Column 51: Bad value http://www.foo.bar for attribute href on element a: DOUBLE_WHITESPACE in PATH.
Syntax of IRI reference:
Any URL. For example: /hello, #canvas, or http://example.org/. Characters should be represented in NFC and spaces should be escaped as %20.

This seems to be an example of an instance in which HTML5 is more restrictive than XHTML 1 or HTML 4.
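Since a link checker may not notice this kind of problem, it can be worth scanning for it directly. Here is a minimal Python sketch, assuming that any whitespace inside an href value is worth flagging; it does not reproduce the validator's IRI checks, and the names used are hypothetical.

```python
from html.parser import HTMLParser


class HrefWhitespaceFinder(HTMLParser):
    """Collect (line, href) pairs for links whose href contains whitespace."""

    def __init__(self):
        super().__init__()
        self.suspect_links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            # value is None for an attribute given without a value.
            if name == "href" and value and any(ch.isspace() for ch in value):
                self.suspect_links.append((self.getpos()[0], value))


def find_hrefs_with_whitespace(markup):
    """Return the suspect links found in a string of HTML markup."""
    finder = HrefWhitespaceFinder()
    finder.feed(markup)
    finder.close()
    return finder.suspect_links
```

For the example above, a link written as <a href="http://www.foo.bar "> would be reported together with its line number.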

Although many pages could be easily converted to HTML5, on a number of pages there were HTML validity problems which had been encountered with the XHTML 1 Transitional Doctype and which persisted using HTML5.  These were pages which included embedded HTML fragments provided by third party Web services such as Vimeo and Slideshare. The Key Resources page illustrates the problem, for which the following error is given:

An object element must have a data attribute or a type attribute.

related to the embedding of a Slideshare widget.

Pages With Embedded RDFa

The Web pages for each of the individual plenary talks and workshop sessions contained embedded RDFa metadata about the speakers/workshop facilitators and abstracts of the sessions themselves.  As described in a post on  Experiments With RDFa and shown in output from Google’s Rich Snippets Testing tool RDFa can be used to provide structured information such as, in this case, people, organisational and event information for an IWMW 2010 plenary talk.

However since many of the pages about plenary talks and workshop sessions contain embedded third party widgets including, for the plenary talks, widgets for videos of the talks and for the accompanying slides, these pages mostly fail to validate since the widget code provided by the services often fails to validate.

A page on “Parallel Session A5: Usability and User Experience on a Shoestring” does, however, validate using the XHTML1+RDFa Doctype, since this page does not include any embedded objects from such third party services. However attempting to validate this page using the HTML5 Doctype produces 38 error messages.

Discussion

The experiences in looking to migrate a Web site from use of XHTML 1 to HTML5 show that in many cases such a move can be achieved relatively easily.  However pages which contain RDFa metadata may cause validation problems which might require changes in the underlying data storage.

The W3C released a working draft of a document on “HTML+RDFa 1.1: Support for RDFa in HTML4 and HTML5” in June 2010. However it is not yet clear if the W3C’s HTML validator has been updated to support the proposals contained in the draft document.  It is also unclear how embedding RDFa in HTML5 resources relates to the “HTML Microdata” working draft proposal which was also released in June 2010 (with an editor’s draft version dated 20 October 2010 also available on the W3C Web site).

I’d welcome comments from those who are working in this area.  In particular, will the user interface benefits provided by HTML5 mean that HTML5 should be regarded as a key deployment environment for new services, or is there a need to wait for consensus to emerge on ways in which metadata can be best embedded in such resources in order to avoid maintenance problems downstream?

Posted in HTML, standards | 1 Comment »

Apple Ditching Preinstalled Flash On Future Macs

Posted by Brian Kelly (UK Web Focus) on 27 October 2010

A couple of days ago there was an announcement that “Apple [is] Ditching Preinstalled Flash On Future Macs”. On the surface this decision has been taken to minimise security problems associated with Flash software – as described on the CultOfMac blog: “By making users download Flash themselves, Apple is disavowing the responsibility of keeping OS X’s most infamously buggy and resource heavy third-party plugin up to date on users’ machines”.

The Guardian reported the news in rather more aggressive terms: “Apple has escalated its war with Adobe’s Flash Player by stopping including the browser plugin on the Macintosh computers that it sells” and points out how this will inconvenience many users as “The surprising and unannounced move means that buyers will have to figure out how to download the player and plugin on any of the computers that they buy – a process which Apple has not simplified by including any “click to install” links“.

Since the Guardian article pointed out that “Jobs has criticised [Flash] as ‘proprietary’” and “praised HTML5 and the video codecs available on it” this story might be regarded as a success story for open standards.  But there is a need to be aware that Flash’s proprietary nature has been recognised as a concern to those seeking to make use of open standards in development work for some time.  The NOF-Digitise Technical Advisory Service provided an FAQ which pointed out in about 2002 that “Flash is a proprietary solution, which is owned by Macromedia.  As with any proprietary solutions there are dangers in adopting it as a solution: there is no guarantee that readers will remain free in the long term, readers (and authoring tools) may only be available on popular platforms, the future of the format would be uncertain if the company went out of business, was taken over, etc.“.

In retrospect the FAQ could also have said that “As with any open standard there are dangers in adopting it as a solution: there is no guarantee that readers will be provided on popular platforms, readers (and authoring tools) may fail to be available on popular platforms, the future of the format would be uncertain if the open standard fails to be widely adopted, etc.”

It is only now, about eight years after that advice was provided, that we are seeing Flash starting to be deprecated by major players and open standards alternatives being provided by such vendors. And although the vendors will inevitably cite the benefits of open standards in their press releases, since such benefits have always been apparent, in reality decisions to support open standards are likely to have been made by vendors for commercial reasons – in this case competition between Apple and Adobe.

But what can be learnt from such a history lesson?  Perhaps that the availability of an open standard is no guarantee that it will supersede proprietary alternatives and that commercial vendors can have a significant role to play in ensuring the take-up of open standards.  In which case it does seem that HTML5 will be an important standard and Flash is under threat.

But whilst that view seems to be increasingly accepted it is worth noting concerns that have been raised within W3C, the World Wide Web Consortium, with Philippe Le Hegaret pointing out that “The problem we’re facing right now is there is already a lot of excitement for HTML5, but it’s a little too early to deploy it because we’re running into interoperability issues”.

Hmm, it seems as if the HTML5 maturity debate will continue to run.

Posted in HTML, standards | Tagged: | 1 Comment »

Is Stack Overflow Useful for Web Developers?

Posted by Brian Kelly (UK Web Focus) on 7 October 2010

A couple of months ago I reported on The Decline in JISCMail Use Across the Web Management Community. Virginia Knight responded by commenting that “many email lists to have a natural life-cycle ending with dormancy” and it does seem that the web-support list no longer has a significant role to play in providing advice and support on technical Web issues, such as HTML, CSS and JavaScript queries, with posts now seeming to publicise job vacancies and events (in September 2010, for example, there were only 4 posts: one an advert for CILIP courses and the other three being a question, a request for a clarification followed by a clarification – but no answer provided!).

But where should Web developers go if they have such queries which need answering?  Might Stack Overflow provide an alternative?

Stack Overflow is a programming Q & A Web site which is collaboratively built and maintained by fellow programmers. The Stack Overflow About page goes on to add that “The only unusual thing we do is synthesize aspects of Wikis, Blogs, Forums, and Digg/Reddit in a way that we think is original”. The FAQ goes on to add that you should “avoid asking questions that are subjective, argumentative, or require extended discussion. This is not a discussion board, this is a place for questions that can be answered!”.

As can be seen from the image the programming scope includes various areas of interest to Web developers including HTML, CSS, JavaScript, JQuery, etc.

Stack Overflow goes beyond the simple responses that can be provided on a mailing list, allowing the person who asked a question to identify the answer which has been the most helpful.  Participants in the Stack Overflow community can also rate the responses so that people who respond with useful answers will gain reputation points – as Tony Hirst (psychemedia on Stack Overflow), from the Open University, has shown recently with the first points he has been awarded after providing an answer to a question about Yahoo Pipes.  Once a certain level of reputation has been gained additional responsibilities are available including the ability to moderate contributions.

I can’t help but feel that although the web-support JISCMail list was useful in the early days of the Web, Web developers should now be making use of richer environments to help them in their development work.  Isn’t it time we acknowledged that the web-support list is now primarily an announcement list for jobs and events and that a service like Stack Overflow can fulfil the role of finding answers to Web development queries? And since we have a lot of expertise across the sector, with people clearly willing to help and advise others, we could soon see UK Web developers with high reputation ratings on the service.

What do others think?

Posted in HTML | 7 Comments »

URI Interface to W3C’s Unicorn Validator

Posted by Brian Kelly (UK Web Focus) on 23 September 2010

The W3C recently announced that they had launched Unicorn, which they described as “a one-stop tool to help people improve the quality of their Web pages. Unicorn combines a number of popular tools in a single, easy interface, including the Markup validator, CSS validator, mobileOk checker, and Feed validator“.

An example of how this validation service works, based on validation of the UKOLN home page, is illustrated.

Output from Unicorn

The  default options provide validation of the HTML and CSS of the selected page together with any auto-discoverable RSS feeds.

The interface to the validator is a Web form hosted on the W3C Web site.

But encouraging use of such validation services would be much easier if the interface was more closely integrated with an author’s browsing environment, so that they didn’t have to visit another page and copy and paste a URL.

The UKOLN Web site has been configured to provide this ease-of-use. Appending ,unicorn to the UKOLN home page will invoke the Unicorn validator – and this option can be used on any page on the UKOLN Web site.

This service is implemented by adding the following line to the Apache Web server’s configuration file:

RewriteRule /(.*),unicorn http://validator.w3.org/unicorn/check?ucn_uri=http://%{HTTP_HOST}/$1&ucn_task=conformance# [R=301]
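To make the effect of this rule concrete, here is a minimal Python sketch of the mapping it performs, from a requested URL ending in ,unicorn to the Unicorn URL to which the browser is redirected. This is only an approximation written for illustration: the function name is hypothetical, it handles the suffix only at the end of the path (the Apache pattern is not anchored), and, like the rule itself, it does not percent-encode the page address.

```python
from urllib.parse import urlsplit

UNICORN_CHECK = "http://validator.w3.org/unicorn/check"
SUFFIX = ",unicorn"


def unicorn_redirect(requested_url):
    """Return the Unicorn URL for a ',unicorn' request, or None if not applicable."""
    parts = urlsplit(requested_url)
    if not parts.path.endswith(SUFFIX):
        return None
    # Equivalent of $1 in the rule: the path without its leading slash and suffix.
    page_path = parts.path[: -len(SUFFIX)].lstrip("/")
    return "{}?ucn_uri=http://{}/{}&ucn_task=conformance#".format(
        UNICORN_CHECK, parts.netloc, page_path
    )
```

For example, unicorn_redirect("http://www.ukoln.ac.uk/,unicorn") gives the Unicorn address for checking the UKOLN home page, while a URL without the suffix returns None.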

I’m not sure how easy it may be to implement such extensions to Web servers these days; there may be policy barriers to such changes or perhaps technical barriers imposed by Content Management Systems.  But I wonder if this simple approach might be of interest to others?

Posted in HTML, standards, W3C | 1 Comment »

New W3C Document Standards for XHTML and RDFa

Posted by Brian Kelly (UK Web Focus) on 27 August 2010

New W3C Draft Documents

The W3C have recently announced that new “Drafts of RDFa Core 1.1 and XHTML+RDFa 1.1 [have been] Published“. The announcement states that:

The RDFa Working Group has just published two Working Drafts: RDFa Core 1.1 and XHTML+RDFa 1.1. RDFa Core 1.1 is a specification for attributes to express structured data in any markup language. The embedded data already available in the markup language (e.g., XHTML) is reused by the RDFa markup, so that publishers don’t need to repeat significant data in the document content. XHTML+RDFa 1.1 is an XHTML family markup language. That extends the XHTML 1.1 markup language with the attributes defined in RDFa Core 1.1.

Meanwhile on 24th June 2010 the latest version of the “HTML5: A vocabulary and associated APIs for HTML and XHTML” working draft was published.

Patrick Lauke’s talk on “HTML5 (and friends): The future of web technologies – today” generated a lot of interest at the IWMW 2010 event – but as I pointed out in the workshop conclusions session, there seems to be some uncertainty as to whether the focus for those involved in the provision of institutional Web services should be on the user interface developments provided in HTML5 or in use of HTML as a container for reusable (linked) data which RDFa aims to provide.

Of course for many the requirement will be to enhance the user interface (for human visitors) and provide access to machine readable data (for machines). The latter can be achieved in various ways but if you choose to go down the RDFa route a question then is: “Can you embed RDFa in HTML5 documents and, if so, how do you do this?”.

The answer to this question is not (yet) clear.  The W3C have published a “HTML5+RDFa: A mechanism for embedding RDF in HTML” working draft document – but this was released in July 2009 and hasn’t been updated since [Note that while this document on the dev.w3.org Web site has not been updated or links to new versions provided, as described in a comment to this post a more recent document on HTML+RDFa 1.1: Support for RDFa in HTML4 and HTML5, dated 24 June 2010 is available - this comment added on 2 September 2010].

This document also states that:

Implementors should be aware that this specification is not stable. Implementors who are not taking part in the discussions are likely to find the specification changing out from under them in incompatible ways. Vendors interested in implementing this specification before it eventually reaches the Candidate Recommendation stage should join the aforementioned mailing lists and take part in the discussions.

But such caveats are also true of the RDFa Core 1.1 and XHTML+RDFa 1.1 draft documents, both of which state that:

This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress

In addition the HTML5 working draft states that:

Implementors should be aware that this specification is not stable. Implementors who are not taking part in the discussions are likely to find the specification changing out from under them in incompatible ways. Vendors interested in implementing this specification before it eventually reaches the Candidate Recommendation stage should join the aforementioned mailing lists and take part in the discussions.

Meanwhile the “HTML Microdata” working draft was also published on 10th August 2010, and this again states that:

Implementors should be aware that this specification is not stable. Implementors who are not taking part in the discussions are likely to find the specification changing out from under them in incompatible ways. Vendors interested in implementing this specification before it eventually reaches the Candidate Recommendation stage should join the aforementioned mailing lists and take part in the discussions.

Microdata is being proposed as an extension of microformats which addresses deficiencies in microformats without the added complexities of  RDFa.

What Does the Future Hold?

Should you start to migrate HTML documents from an existing HTML 4 or XHTML 1 environment to HTML5?  The advice given by Patrick Lauke in his talk, as reported by @iwmwlive, was “If you want to take advantage of the new features, go ahead with HTML5, but don’t rush off to recode if you don’t need it”.  But while much of the buzz surrounding the new features provided by HTML5 concerns user interface developments (such as native support for video and enhanced forms validation) the future regarding use of HTML as a container for data seems to be somewhat uncertain.

The best advice may be not to rush off to embed data in your HTML resource if you don’t need to.  But as such advice can be a barrier to innovation it needs to be qualified by the suggestion that if you do wish to embed data using RDFa, microdata or microformats, you should ensure that you do so using a management system which will enable you to change the format you use if you discover that you have selected an approach which fails to take off.  This advice, of course, reflects the warnings given in the draft documents – but not everyone reads such advice!

Posted in HTML, standards, W3C | 4 Comments »

“Scrapping Flash and Betting the Company on HTML5”

Posted by Brian Kelly (UK Web Focus) on 24 May 2010

Scrapping Flash

We are “Scrapping Flash and betting the company on HTML5” says the CTO of Scribd (the document sharing service) according to an article published recently in TechCrunch. But this doesn’t seem to be as much of a risk as the headline implies as, according to the article “Adobe’s much-beleaguered Flash is about to take another hit and online documents are finally going to join the Web on a more equal footing”. As the article goes on to say “Scribd is joining a chorus of companies from Apple to Microsoft in siding with HTML5 over Flash. Tomorrow only 200,000 of the most popular documents will be available in HTML5, but eventually all of them will be switched over”. The article goes on to point out that “When it’s done, Scribd alone will convert billions of document pages into Web pages”.

Open Standards and the NOF-digi Programme

Good, you may think, it’s about time we made greater use of open standards. And this sentiment underpinned various standards documents I have contributed to since about 1996 for the JISC and the cultural heritage sector.  As an example consider the NOF-digitise Technical Advisory Service which was provided by UKOLN and the AHDS from 2001-2004.  These two services were commissioned to document the open standards to be used by this national digitisation programme. So we described open standards, such as SMIL and SVG, and, despite warning of the dangers in mandating premature adoption of open standards, the first version of the standards document did not address the potential difficulties in developing services based on these immature W3C standards.

Unsurprisingly, once the projects had received their funding and began to commission development work we received questions such as “Does anyone have any thoughts on the use of file formats such as Flash or SVG in projects? There is no mention of their use in the technical specifications so I wondered whether their suitability or otherwise had been considered”. I can remember the meeting we had with the NOF-digitise programme managers after receiving such queries and the difficulty policy makers had in appreciating that simply mandating use of open standards might be inappropriate.

Our response was to explain the reasons why open standards were, in principle, to be preferred over use of proprietary formats:

The general advice is that where the job can be done effectively using non-proprietary solutions, and avoiding plug-ins, this should be done. If there is a compelling case for making use of proprietary formats or formats that require the user to have a plug-in then that case can be made in the business plan, provided this case does not contradict any of the MUST requirements of the nof technical guidelines document.

Flash is a proprietary solution, which is owned by Macromedia.  As with any proprietary solutions there are dangers in adopting it as a solution: there is no guarantee that readers will remain free in the long term, readers (and authoring tools) may only be available on popular platforms, the future of the format would be uncertain if the company went out of business, was taken over, etc.

However we did acknowledge the difficulties of forcing projects to use open standards and concluded:

So, to summarise, if you *require* the functionality provided by Flash, you will need to be aware of the longer term dangers of adopting it.  You should ensure that you have a migration strategy so that you can move to more open standards, once they become more widely deployed.

We subsequently recommended updates to the projects’ reporting mechanism so that projects had to respond to the following questions before use of proprietary formats would be accepted:

(a) Area in which compliance will not be achieved

(b) Explain why compliance will not be achieved (including research on appropriate open standards)

(c) Describe the advantages and disadvantages of your proposed solution

(d) Describe your migration strategies in case of problems

Our FAQ provided an example of how these questions might be answered in the case of use of Flash. What we expected (and perhaps hoped for) back then was that there would be a steady growth in the development of tools which supported open standards and the benefits of the standards would lead to a move away from Flash.  This, however, hasn’t happened. Instead it seems to have been the lack of support for Flash on the iPhone and the iPad which has led to recent high-profile squabbles, in particular Steve Jobs’ open letter giving his Thoughts on Flash. His letter points out that:

Flash was created during the PC era – for PCs and mice. Flash is a successful business for Adobe, and we can understand why they want to push it beyond PCs. But the mobile era is about low power devices, touch interfaces and open web standards – all areas where Flash falls short.

and concludes by saying:

New open standards created in the mobile era, such as HTML5, will win on mobile devices (and PCs too). Perhaps Adobe should focus more on creating great HTML5 tools for the future, and less on criticizing Apple for leaving the past behind.

It seems, according to Jobs, that it is the requirements of the mobile platform which are leading to the move towards open standards on both mobile and PC platforms.

Eight Years Later

About eight years later it now seems appropriate to move away from Flash and, instead, use HTML5. This long period between initial announcements of new open standards and their appropriateness for mainstream use will differ for different standards – in the case of RDF, for example, the initial family of standards were published in 2004 but it has only been in the past year or so that interest in the deployment of Linked Data services has gained wider popularity. But the dangers of forcing use of open standards are, I hope, becoming better understood.

And this is where I disagree with a recent article by Glyn Moody who, in a recent tweet, suggested that “European Commission Betrays Open Standards – http://bit.ly/bl6HJt pusillanimity”. In an article published in ComputerWorld UK Glyn argued that the “European Commission Betrays Open Standards”. I have skimmed through the latest leak [PDF format] of an imminent Digital Agenda for Europe. What I noticed is that the document calls for “Promoting better use of standards” which argues that “Public authorities should make better use of the full range of relevant standards when procuring hardware, software and IT systems”.  It is the failure of the document in “promoting open standards and all the benefits that these bring” which upsets Glyn, who adds that “accept[ing] ‘pervasive technologies’ that *aren’t* based on standards” is “a clear reference to Microsoft”.

But maybe the European Commission have understood the complexities of the deployment of open standards and the risks that mandating their use across public sector organisations might entail.  And let’s not forget that, in the UK, we have a history of mandating open standards which have failed to take off – remember OSI networking protocols?

Pointing out that open standards don’t always live up to their promise and that it may take several years before they are ready for mainstream use is applying an evidence-based approach to policy. Surely something we need more of, n’est-ce pas?

Posted in HTML, standards | 1 Comment »

Experiments With RDFa

Posted by Brian Kelly (UK Web Focus) on 3 May 2010

The Context

In a recent post I outlined some thoughts on Microformats and RDFa: Adding Richer Structure To Your HTML Pages. I suggested that it might now be timely to evaluate the potential of RDFa, but added a note of caution, pointing out that microformats don’t appear to have lived up to their initial hype.

Such reservations were echoed by Owen Stephens who considered using RDFa (with the Bibo ontology) to enable sharing of ‘references’ between students (and staff) as part of his TELSTAR project and went on to describe the reasons behind his decision. Owen’s decision centred around deployment concerns. In contrast Chris Gutteridge had ideological reservations, as he “hate[s] the mix of visual & data markup. Better to just have blocks of RDF (in N3 for preference) in an element next to the item being talked about, or just in the page”. Like me, Stephen Downes seems to be willing to investigate and asked for “links that would point specifically to an RDFa syntax used to describe events?”. Michael Hausenblas provided links to two useful resources: W3C’s Linked Data Tutorial – on Publishing and consuming linked data with RDFa and a paper on “Building Linked Data For Both Humans and Machines” (PDF format). Pete Johnson also gave some useful comments and provided a link to recently published work on how to use RDFa in HTML 5 resources.

My Experiments

Like Stephen Downes I thought it would be useful to begin by providing richer structure about events. My experiments therefore began by adding RDFa markup for my forthcoming events page.

As the benefits of providing such richer structure for processing by browser extensions appear to be currently unconvincing my focus was on providing such markup for processing by a search engine. The motivation is therefore primarily to provide richer markup for events which will be processed by a widely-used service in order that end users will receive better search results.

My first port of call was a Google post which introduced rich snippets. Google launched their support for Rich Snippets less than a year ago, in May 2009. They are described as “a new presentation of snippets that applies Google’s algorithms to highlight structured data embedded in web pages“.

Documentation on the use of Rich Snippets is provided on Google’s Webmaster Tools Web site. This provides me with information on RDFa (together with microdata and microformats) markup for events. Additional pages provide similar information on markup about people and businesses and organisations.

Although I am aware that Google have been criticised for developing their own vocabulary for their Rich Snippets I was more interested in carrying out a simple experiment with use of RDFa than continuing the debate on the most appropriate vocabularies.

The forthcoming events page was updated to contain RDFa markup about myself (name, organisation and location of my organisation, including the geo-location of the University of Bath).

For my talks in 2010 I replaced the microformats I have used previously with RDFa markup providing information on the date of the talks and their location (again with geo-location information).

No changes were noticeable when viewing the page normally. However using FireFox plugins which display RDFa (and microformat) information I can see that software is able to identify the more richly structured elements in the HTML page. The screenshot shows how the markup was rendered by the Operator sidebar and the RDFa Highlight bookmarklet and, in the status bar at the bottom of the screen, links to an RDFa validator and the SIOC RDF Browser.

Rendering of RDFa markup using various FireFox tools.

If you compare this image with the display of how microformats are rendered by the Operator plugin it will be noted that the display of microformats shows the title of the event whereas the display of RDFa lists the HTML elements which contain RDFa markup. The greater flexibility provided by RDFa appears to come at the price of a loss of context which is provided by the more constrained uses provided by microformats.

Is It Valid?

Although the HTML RDFa Highlight bookmarklet demonstrated that RDFa markup was available and indicated the elements to which the markup had been applied, there was also a need to modify other aspects of the HTML page. The DTD was changed from XHTML 1.0 Strict to:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML+RDFa 1.0//EN" "http://www.w3.org/MarkUp/DTD/xhtml-rdfa-1.dtd">

In addition the namespace of the RDFa elements needed to be defined:

<html xmlns="http://www.w3.org/1999/xhtml"
  xmlns:cc="http://creativecommons.org/ns#"
  xmlns:v="http://rdf.data-vocabulary.org/#"
  xml:lang="en">

It was possible for me to do this as I have access to the HTML page including elements defined in the HTML . I am aware that some CMS applications may not allow such changes to be made and, in addition, organisations may have policies which prohibit such changes.

On subsequently validating the page I discovered, however, HTML validity errors. It seems that the name="foo" attribute I had been using has been replaced by id="foo".

The changes to the DTD and the elements and the inclusion of the RDFa markup weren’t the only changes I had to make, however. I discovered that the id="foo" attribute requires "foo" to start with an alphabetic character. I therefore had to change id="2010" to id="year-2010". This, for me, was somewhat more worrying as rather than just including new or slightly modified markup which was backwards-compatible, I was now having to change the URL of an internal anchor. If the anchors had started with an alphabetic character this wouldn’t have been an issue (and I would have been unaware of the problem). However it seems that a migration from a document-centred XHTML 1.0 Strict conforming world to the more data-centric XHTML 1.1+RDFa world may result in links becoming broken. I was prepared to make this change on my pages of forthcoming and recent events and change links within the pages. However if others are linking to these internal anchors (which I think is unlikely) then the links will degrade slightly (they won’t result in the display of a 404 error message; instead the top of the page will be displayed, rather than the entries for the start of the particular year).
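The anchor renaming described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the process I followed: the function names safe_id and rewrite_fragment_links, the "year-" prefix as a default and the sample page fragment are all assumptions made for the example, and the check simply applies the rule as stated above (an id should begin with an alphabetic character).

```python
import re

# Rule as described in the post: an id should begin with an alphabetic character.
STARTS_WITH_LETTER = re.compile(r"^[A-Za-z]")


def safe_id(value, prefix="year-"):
    """Return value unchanged if it starts with a letter, else prepend prefix.

    The "year-" default is a hypothetical choice mirroring id="year-2010".
    """
    if STARTS_WITH_LETTER.match(value):
        return value
    return prefix + value


def rewrite_fragment_links(html, renamed):
    """Rewrite href="#old" to href="#new" for every renamed anchor."""
    for old, new in renamed.items():
        html = html.replace(f'href="#{old}"', f'href="#{new}"')
    return html


# Hypothetical ids and page fragment, for illustration only.
ids = ["2010", "talks", "2009"]
renamed = {i: safe_id(i) for i in ids if safe_id(i) != i}
page = '<a href="#2010">Talks in 2010</a> <a href="#talks">All talks</a>'

print(renamed)
print(rewrite_fragment_links(page, renamed))
```

A script like this can only repair links within pages I control. Links from other sites to the old anchors would still fall back to the top of the page.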

Google’s View of the RDFa Markup

Using Google’s Rich Snippets Testing Tool it is possible to “enter a web page URL to see how it may appear in search results“. The accompanying image shows the output of this tool for my events page.

Rendering of RDFa markup

This shows the structure of the page which Google knows about. As Google knows the latitude and longitude for the location of the talk it can use this for location based services and it can provide the summary of the event and a description.

Is It Correct?

Following my initial experiment my former colleague Pete Johnston (now of Eduserv) kindly gave me some feedback. He alerted me to W3C’s RDFa Distiller and Parser service – and has recently himself published posts on Document metadata using DC-HTML and using RDFa and RDFa 1.1 drafts available from W3C.

Using the Distiller and Parser service to report on my event page (which has now been updated) I found that I had applied a single v:Event XML element where I should have used three elements for the three events. I had also made a number of other mistakes when I made use of the example fragments provided in the Google Rich Snippets example without having a sound understanding of the underlying model and how it should be applied. I hope the page is now not only valid but uses a correct data model for my data.
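A very rough sanity check for the kind of mistake described above (one v:Event wrapper where three were needed) might look like the following Python sketch. It is not a substitute for the W3C Distiller, as it does not build an RDF model at all; it only counts elements by their typeof attribute using the standard library HTML parser. The class name TypeofCounter and the sample markup (including the use of a v:summary property) are assumptions made for illustration.

```python
from html.parser import HTMLParser


class TypeofCounter(HTMLParser):
    """Count elements by the value of their RDFa typeof attribute."""

    def __init__(self):
        super().__init__()
        self.counts = {}

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; names are lower-cased.
        for name, value in attrs:
            if name == "typeof" and value:
                self.counts[value] = self.counts.get(value, 0) + 1


# Hypothetical markup for three talks, each wrapped in its own v:Event.
sample = """
<div typeof="v:Event"><span property="v:summary">Talk one</span></div>
<div typeof="v:Event"><span property="v:summary">Talk two</span></div>
<div typeof="v:Event"><span property="v:summary">Talk three</span></div>
"""

counter = TypeofCounter()
counter.feed(sample)
print(counter.counts.get("v:Event", 0))
```

If the count reported is lower than the number of events listed on the page, that is a hint that several events have been placed inside a single typed element.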

I should add that I am not alone in having created resources containing Linked Data errors. A paper on “Weaving the Pedantic Web” (PDF format) presented at the Linked Data on the Web 2010 workshop described an analysis of almost 150,000 URIs which revealed a variety of errors related to accessing and dereferencing resources and processing and parsing the data found. The awareness of such problems has led to the establishment of the Pedantic Web Group which “understand[s] that the standards are complex and it’s hard to get things right” but nevertheless “want[s] you to fix your data“. There will be a similar need to avoid polluting RDFa space with incorrect data.

Is It Worthwhile?

The experiences with microformats would seem to indicate that benefits of use of RDFa will be gained if large scale search engines support its use, rather than providing such information with an expectation that there will be significant usage by client-side extensions.

However the Google Rich Snippets Tips and Tricks Knol page states that “Google does not guarantee that Rich Snippets will show up for search results from a particular site even if structured data is marked up and can be extracted successfully according to the testing tool“.

So, is it worth providing RDFa in your HTML pages? Perhaps if you have a CMS which creates RDFa or you can export existing event information in an automated way it would be worth adding the additional semantic markup. But you need to be aware of the dangers of doing this in order to enhance findability of resources by Google since Google may not process your markup. And, of course, there is no guarantee that Google will continue to support Rich Snippets. On the other hand other vendors, such as Yahoo!, do seem to have an interest in supporting RDFa – so potentially RDFa could provide a competitive advantage over other search engine providers.

But, as I discovered, it is easy to make mistakes when using RDFa. So it will be essential to have an automated process for the production of pages containing RDFa – and there will be a need to ensure that the data model is correct as well as the page being valid. This will require a new set of skills as such issues are not really relevant in standard HTML markup.

I wonder if I have convinced Owen Stephens and Chris Gutteridge who expressed their reservations about use of RDFa? And are there any examples of successful use of RDFa which people know about?

“RDFa from Theory to Practice” Workshop Session

Note that if you have an interest in applying the potential of RDFa in practice my colleagues Adrian Stevenson, Mark Dewey and Thom Bunting will be running a 90 minute workshop session on “RDFa from theory to practice” at this year’s IWMW 2010 event to be held at the University of Sheffield on 12-14 July.

Posted in HTML, W3C | 5 Comments »

Microformats and RDFa: Adding Richer Structure To Your HTML Pages

Posted by Brian Kelly (UK Web Focus) on 25 March 2010

Revisiting Microformats

If you visit my presentations page you will see a HTML listing of the various talks I’ve given since I started working at UKOLN in 1996.  The image shown below gives a slightly different display from the one you will see, with use of a number of FireFox plugins providing additional ways of viewing and processing this information.

Firefox extensions

This page contains microformat information about the events.  It was at UKOLN’s IWMW 2006 event that we made use of microformats on the event Web site for the first time with microformats being used to mark up the HTML representation for the speakers and workshop facilitators together with the timings for the various sessions. At the event Phil Wilson ran a session on “Exposing yourself on the Web with Microformats!“. There was much interest in the potential of microformats back in 2006, which was then the hot new idea.  Since then I have continued to use microformats to provide richer structural information for my events and talks. I’ll now provide a summary of the ways in which the microformats can be used, based on the image shown above.

The Operator sidebar (labelled A in the image) shows the Operator FireFox plugin which “leverages microformats and other semantic data that are already available on many web pages to provide new ways to interact with web services“. The plugin detects various microformats embedded in a Web page and supports various actions – as illustrated, for events the date, time and location and summary of the event can be added to various services such as Google and Yahoo! Calendar.

The RDFa in Javascript bookmarklets (labelled B) are simple JavaScript tools which can be added to a variety of different browsers (they have been tested on IE 7, Firefox, Safari and Mozilla). The License bookmarklets will create a pop-up alert showing the licence conditions for a page, where this has been provided in a structured form. UKOLN’s Cultural Heritage briefing documents are available under a Creative Commons licence. Looking at, for example, the Introduction to Microformats briefing document, you will see details of the licence conditions displayed for reading. However, in addition, a machine-readable summary of the licence conditions is also available which is processed by the Licence bookmarklet and displayed as a pop-up alert. This information is provided by using the following HTML markup:

<p><a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/2.0/">
<img src="http://creativecommons.org/images/public/somerights20.gif"
   alt="Creative Commons License" /></a>This work is licensed under a
<a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/2.0/">Creative Commons
License</a>.</p>

The power is in the rel="license" attribute which assigns ‘meaning’ to the hypertext link.
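To illustrate how software can pick up this ‘meaning’, here is a minimal Python sketch which finds licence links in a page by looking for the license value in the rel attribute. It uses only the standard library HTML parser and the markup shown above; the class name LicenseLinkFinder is an assumption made for the example, and this is not how the bookmarklet itself is implemented.

```python
from html.parser import HTMLParser


class LicenseLinkFinder(HTMLParser):
    """Collect the href of every <a> element whose rel includes "license"."""

    def __init__(self):
        super().__init__()
        self.licenses = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # rel may hold several space-separated values.
        rel_values = (attrs.get("rel") or "").split()
        href = attrs.get("href")
        if tag == "a" and "license" in rel_values and href:
            if href not in self.licenses:  # report each licence once
                self.licenses.append(href)


markup = """
<p><a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/2.0/">
<img src="http://creativecommons.org/images/public/somerights20.gif"
   alt="Creative Commons License" /></a>This work is licensed under a
<a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/2.0/">Creative Commons
License</a>.</p>
"""

finder = LicenseLinkFinder()
finder.feed(markup)
print(finder.licenses)
```

Although the markup contains two links to the licence, the sketch reports the licence URL once, which is all a tool needs in order to describe the licence conditions for the page.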

The link to my Google Calendar for each of the events (labelled C) is provided by the Google hCalendar Greasemonkey script. Clicking on the Google Calendar icon (which is embedded in the Web page if hCalendar microformatting markup is detected – although I disable this feature if necessary) will allow the details to be added to my Google Calendar without me having to copy and paste the information.

The additional icons in the browser status bar (labelled D) appear to be intended for debugging of RDFa – and I haven’t yet found a use for them.

The floating RSS Panel (labelled E) is another Greasemonkey script. In this case the panel does not process microformats or RDFa but autodetectable links to RSS feeds. I’m mentioning it in this blog post in order to provide another example of how richer structure in HTML pages can provide benefits to an end user. In this case it provides a floating panel in which RSS content can be displayed.

RDFa – Beyond Microformats

The approaches I’ve described above date back to 2006, when microformats was the hot new idea. But now there is more interest in technologies such as Linked Data and RDF. Those responsible for managing Web sites with an interest in emerging new ways of enhancing HTML pages are likely to have an interest in RDFa: a means of including RDF in HTML resources.

The RDFa Primer is sub-titled “Bridging the Human and Data Webs“. This sums up nicely what RDFa tries to achieve – it enables Web editors to provide HTML resources for viewing by humans whilst simultaneously providing access to structured data for processing by software. Microformats provided an initial attempt at doing this, as I’ve shown above. RDFa is positioned as providing similar functionality, but coexisting with developments in the Linked Data area.

The RDFa Primer provides some examples which illustrate a number of use cases.  My interest is in seeing ways in which RDFa might be used to support Web sites I am involved in building, including this year’s IWMW 2010 Web site.

The first example provided in the primer describes how RDFa can be used to describe how a Creative Commons licence can be applied to a Web page; an approach which I have described previously.

The primer goes on to describe how to provide structured and machine understandable contact information, this time using the FOAF (Friends of a Friend) vocabulary:

<div typeof="foaf:Person" xmlns:foaf="http://xmlns.com/foaf/0.1/">
   <p property="foaf:name">Alice Birpemswick</p>
   <p>Email: <a rel="foaf:mbox" href="mailto:alice@example.com">alice@example.com</a></p>
   <p>Phone: <a rel="foaf:phone" href="tel:+1-617-555-7332">+1 617.555.7332</a></p>
</div>

In previous years we have marked up contact information for the IWMW event’s program committee using hCard microformats. We might be in a position now to use RDFa. If we followed the example in the primer we might use RDFa to provide information about the friends of the organisers:

<div xmlns:foaf="http://xmlns.com/foaf/0.1/">
   <ul>
      <li typeof="foaf:Person">
         <a rel="foaf:homepage" href="http://example.com/bob/">Bob</a>
      </li>
      <li typeof="foaf:Person">
         <a rel="foaf:homepage" href="http://example.com/eve/">Eve</a>
      </li>
      <li typeof="foaf:Person">
         <a rel="foaf:homepage" href="http://example.com/menu/">Menu</a>
      </li>
   </ul>
</div>

However this would not be appropriate for an event. What would be useful would be to provide information on the host institutions of the speakers and workshop facilitators. In previous years such information has been provided in HTML, with no formal structure which would allow automated tools to process such institutional information. If RDFa was used to provide such information for the 13 years since the event was first launched this could allow an automated tool to process the event Web sites and provide various reports on the affiliations of the speakers. We might then have a mechanism for answering the query “Which institution has provided the highest number of (different) speakers or facilitators at IWMW events?“. I can remember that Phil Wilson, Andrew Male and Alison Kerwin (nee Wildish) from the University of Bath have spoken at events, but who else? And what about the Universities which I am unfamiliar with? This query could be solved if the data was stored in a backend database, but as the information is publicly available on the Web site, might not using slightly more structured content on the Web site be a better approach?
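To make the query above concrete, here is a minimal Python sketch of the counting step, assuming that a harvesting tool has already extracted (speaker, institution) pairs from the event Web sites. The extraction itself is not shown, the function name distinct_speakers_by_institution is hypothetical, and the records are illustrative only: the three Bath speakers are those I mentioned above and the remaining entry is a placeholder, not real data.

```python
from collections import defaultdict


def distinct_speakers_by_institution(records):
    """Map each institution to its number of different speakers."""
    by_institution = defaultdict(set)
    for speaker, institution in records:
        # A set ignores repeat appearances by the same person.
        by_institution[institution].add(speaker)
    return {inst: len(names) for inst, names in by_institution.items()}


# Illustrative records only; a real tool would harvest these from the pages.
records = [
    ("Phil Wilson", "University of Bath"),
    ("Andrew Male", "University of Bath"),
    ("Alison Kerwin", "University of Bath"),
    ("Phil Wilson", "University of Bath"),   # repeat appearance, counted once
    ("Speaker A", "Example University"),     # placeholder entry
]

counts = distinct_speakers_by_institution(records)
top = max(counts, key=counts.get)
print(top, counts[top])
```

The counting is trivial once the data is available in a structured form. The harder questions are how the affiliation should be expressed in the markup and how reliably it can be harvested.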

Really?

When we first started making use of microformats I envisaged that significant numbers of users would be using various tools on the browser to process such information.  However I don’t think this is the case (and I would like to hear from anybody who does make regular use of such tools).   I have to admit that although I have been providing microformats for my event information, I have not consumed microformats provided by others (and this includes the microformats provided on the events page on the JISC Web site).

This isn’t, however, necessarily an argument that microformats – or RDFa – might not be useful. It may be that the prime use of such information is by server-side tools which harvest such information from a variety of sources. In May 2009, for example, Google announced that Google Search Now Supports Microformats and Adds “Rich Snippets” to Search Results. Yahoo!’s SearchMonkey service also claims to support structured search queries.

But before investing time and energy into using RDFa across an event Web site the Web manager will need answers to the questions:

  • What benefits can this provide?  I’ve given one use case, but I’d be interested in hearing more.
  • What vocabularies do we need to use and how should the data be described? The RDFa Primer provides some examples, but I am unsure as to how to use RDFa to state that, for example, Brian Kelly is based at the University of Bath, to enable structured searches of all speakers from the University of Bath.
  • What tools are available which can process the RDFa which we may choose to create?

Anyone have answers to these questions?

Posted in HTML, W3C | 11 Comments »

Will The UK Government Shut Down The Queen’s Web Site?

Posted by Brian Kelly (UK Web Focus) on 13 December 2007

In a post on All UK Government Web Sites Must Be WCAG AA Compliant I recently warned that the UK Government’s blunt instrument of mandating that all UK government Web sites must comply with WCAG AA accessibility guidelines could be counter-productive. The current WCAG 1.0 guidelines are widely felt to be out-of-date, and government departments which seek to comply with them may well end up discarding Web design patterns which are now widely felt to enhance the effectiveness of Web sites but which infringe guidelines released back in 1998.

I recently viewed the Official Web Site of the British Monarchy (don’t ask) and spotted a visible <FONT> tag preceding a news item about the Queen’s speeches in Uganda.

Her Majesty's Web Site

Surely the Queen’s Web site isn’t using <FONT> tags, I thought? The Queen can’t possibly have employed a self-taught Web coder who hasn’t updated their skills in over five years? But looking at the source code and validating the page my worst fears came true: 36 HTML errors, no DOCTYPE, spacer GIFs, unclosed <FONT> tags (as I had spotted), <IMG> tags with no ALT attributes, a mixture of XHTML and HTML elements, …

Now this page clearly fails to comply with the UK Government proposed accessibility requirements. What, then, will happen if these proposals are accepted and the Queen fails to correct the errors by next year’s deadline? Will the Government attempt to shut down Her Majesty’s Web site? Will the Government take the Queen to court? But won’t “Regina vs Regina ” lead to a constitutional crisis? Will this lead to the demise of the monarchy and the establishment of a republic? Or will such a vindictive move by pedantic civil servants lead to a backlash, with the possibility of the Tower for the more extreme of the ‘accessibility standardistas‘?

More seriously the British Monarchy Web site probably does provide a good example of a service (perhaps not quite a public-sector service, though) which would be improved by simply following the WCAG guidelines.  So maybe my concerns would only apply to those Web sites which are seeking to be more interactive and user-focussed than the brochureware approach which the British Monarchy site provides.

Posted in Accessibility, HTML | 4 Comments »

HTML Email – Views From The Grizzled Techies And Evil Marketeers

Posted by Brian Kelly (UK Web Focus) on 26 March 2007

One of our web officers has been asking about whether there’s any good, reasonably priced training in creating HTML mails. If anyone has any experience with this, would you let me know?

That message, sent recently to the web-support JISCMail list, seemed a reasonable request for information. So I was surprised to see responses saying “I can give you a complete course right now. Don’t do it“, “If people learn to write they don’t need HTML to spice their text” and “the people that want it are the very last people that should be allowed to have it. To me, the reception of HTML email from an organisation is a great big hint that I never ever want to deal with that organisation.”

Well, there are some unequivocal positions! And look at that last comment: “the people that want it are the very last people that should be allowed to have it.” What happened to having a user-focussed approach to Web development?

Fortunately there were other responses to the debate which took a more holistic view: “Don’t just say ‘No’, say ‘Let US do it’, or at least ‘Let us get involved’. Take control if possible. Otherwise they’ll just do it anyway, and quite possibly do it (very) badly.”

The debate seemed to polarise the “grizzled techies” and the “evil marketing managers”. One of the latter gave his reasons for making use of HTML in email:

As the resident evil marketing manager on the list I’ve tried to restrain myself but can’t hold back any longer…

We always use HTML based e-mail for our marketing (we send multipart e-mails with a text version so that most users should see something on their screen). All our e-mail marketing is opt-in and we give an unsubscribe link on every message sent, partly because that’s the law, but mainly because it’s polite – we’re happy that our unsubscribe rate is reasonably low. We developed a set of corporate templates which were thoroughly tested with Outlook, Outlook express, Hotmail, Gmail, Mac mail, et al (if you think getting HTML to render in a variety of browsers is fun wait until you start developing HTML e-mail!). Every message we send is sent to test accounts using a variety of e-mail services before we send in bulk.

It does strike me that there are two polarised communities. Coincidentally around the time this discussion was taking place I attended the AoC NILTA conference [note Web site no longer available - 12 Jan 2009], at which, as described in a posting by Scott Wilson, personalisation was one of the key themes of the conference (and, as described recently by the BBC, is also on the Government’s agenda).

My view? I’m on the side of providing flexibility for the user community – and if the marketing community are the ones who try to respond to the users’ needs, then we should be working more closely with that group, rather than the dated technical views of the grizzled techies!

Posted in HTML, Web2.0 | 12 Comments »

Christmas Quiz II – An Answer

Posted by Brian Kelly (UK Web Focus) on 20 December 2006

In the Christmas Quiz II posting I asked the question:

The current version of HTML is XHTML 1.1. What is the next version likely to be:

  • XHTML 1.2
  • XHTML 2
  • HTML 5

There were two responses to this question which I will discuss in more detail:

Read the rest of this entry »

Posted in HTML | 2 Comments »