UK Web Focus

Innovation and best practices for the Web

Archive for the ‘W3C’ Category

What Could ITS 2.0 Offer the Web Manager?

Posted by Brian Kelly on 24 January 2014

Back in October 2013 the W3C announced that the Internationalization Tag Set (ITS) version 2.0 had become a W3C Recommendation. The announcement stated:

The MultilingualWeb-LT Working Group has published a W3C Recommendation of Internationalization Tag Set (ITS) Version 2.0. ITS 2.0 provides a foundation for integrating automated processing of human language into core Web technologies. ITS 2.0 bears many commonalities with its predecessor, ITS 1.0, but provides additional concepts that are designed to foster the automated creation and processing of multilingual Web content. Work on application scenarios for ITS 2.0 and gathering of usage and implementation experience will now take place in the ITS Interest Group. Learn more about the Internationalization Activity.

Following the delivery of this standard, on 17 January 2014 the MultilingualWeb-LT Working Group was officially closed.

But what exactly does ITS 2.0 do, and is it relevant to the interests of institutional web managers, or research, teaching or administrative departments within institutions?

The ITS 2.0 specification provides an overview which seeks to explain the purpose of the standard but, as might be expected in a standards document, this is rather dry. Several other resources also discuss ITS 2.0.

But the resource I thought was particularly interesting was the ITS 2.0 video channel. This contains a handful of videos about the ITS standard. One video in particular provides a brief introduction to ITS 2.0 and the advantages it can offer businesses involved in multilingual communication. This 8-minute video can be viewed on YouTube and is also embedded below:

The video, an animated cartoon, is interesting because of the informal approach it takes to explaining the standard. This, in my experience, is unusual. The approach may not be appreciated by everyone, but standards are widely perceived to be dull and boring, even while being acknowledged as important. For me, summarising the importance of a standard in this way can help to reach new audiences who might otherwise fail to appreciate the role which standards play.

If you are involved in providing web sites or content which may be of interest to an international audience, it may be worth spending 8 minutes viewing this video. If ITS 2.0 does appear to be of interest, the next question is: what tools are available to create and process ITS 2.0 metadata? A page on ITS Implementations is available on the W3C web site but, again, this is rather dry and the tools seem rather specialist. However, more mainstream support for ITS 2.0 is likely to be provided only if there is demand for it. So if you have an interest in metadata standards which can support automated translation, and you feel ITS 2.0 may be of use, make sure you ask your CMS vendor whether they intend to support it.

Might this be of interest to University web managers? If you are a marketing person at the University of Bath and wish to see your marketing resources publicised to the French-speaking world but have limited resources for translating your resources, you probably wouldn’t want:

The University of Bath is based in a beautiful Georgian city: Bath. 

to be translated as:

L’université de bain est basé dans une belle ville géorgienne: bain.

And whilst Google Translate does preserve the word “Bath” when it is capitalised, this seems not to be the case in all circumstances. For example, the opening sentence on the Holburne Museum web site:

Welcome to Bath’s art museum for everyone. 

is translated as:

Bienvenue au musée d’art de salle de bain pour tout le monde.

Perhaps marketing people in many organisations who would like to ensure that automated translation tools do not make such mistakes should be pestering their CMS vendors for ITS 2.0 support!
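ITS 2.0 addresses exactly this problem through its Translate data category, which in HTML5 is expressed via the translate attribute. The sketch below — a toy consumer rather than a real translation pipeline, with illustrative markup and a made-up class name — shows how an ITS-aware processor can separate translatable text from text marked as non-translatable:

```python
from html.parser import HTMLParser

# A minimal sketch of honouring the ITS 2.0 "Translate" data category,
# expressed in HTML5 via the translate attribute. A real translation
# pipeline would be far more involved; this only shows how a processor
# can tell translatable text from protected text.
class TranslateScanner(HTMLParser):
    def __init__(self):
        super().__init__()
        self.protected = []        # text inside translate="no"
        self.translatable = []     # everything else
        self._no_translate_depth = 0

    def handle_starttag(self, tag, attrs):
        # translate="no" applies to the element and its descendants
        if dict(attrs).get("translate") == "no" or self._no_translate_depth:
            self._no_translate_depth += 1

    def handle_endtag(self, tag):
        if self._no_translate_depth:
            self._no_translate_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        (self.protected if self._no_translate_depth else self.translatable).append(text)

html = ('<p>The University of <span translate="no">Bath</span> '
        'is based in a beautiful Georgian city: '
        '<span translate="no">Bath</span>.</p>')
scanner = TranslateScanner()
scanner.feed(html)
print(scanner.protected)   # ['Bath', 'Bath']
```

A machine-translation tool honouring this markup would translate only the unprotected segments, leaving “Bath” untouched in both positions.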



Posted in standards, W3C | Tagged: | 3 Comments »

“John hit the ball”: Should Simple Language Be Mandatory for Web Accessibility?

Posted by Brian Kelly on 18 September 2012

W3C WAI “Easy to Read” (e2r) Work

The W3C/WAI Research and Development Working Group (RDWG) is planning an online symposium on “Easy to Read” (e2r) language in Web Pages/Applications (e2r Web). The closing date for submissions (which can be up to 1,000 words) is 12 October 2012 (extended from the original 24 September deadline). The symposium itself will take place on 3 December 2012.

The Easy to Read activity page provides an introduction to this work:

Providing information in a way that can be understood by the majority of users is an essential aspect of accessibility for people with disabilities. This includes rules, guidelines, and recommendations for authoring text, structuring information, enriching content with images and multimedia and designing layout to meet these requirements.

and goes on to describe how:

Easy to Read today is first of all driven by day to day practice of translating information (on demand). More research is needed to better understand the needs of the users, to analyze and compare the different approaches, to come to a common definition, and to propose a way forward in providing more comprehensive access to language on the Web.

It provides a list of potentially useful tools and methods for measuring readability:

  • Flesch Reading Ease
  • Flesch-Kincaid Grade Level
  • Gunning Fog Index (FOG)
  • Wiener Sachtextformel
  • Simple Measure Of Gobbledygook (SMOG)
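Most of these metrics are simple formulae over sentence, word and syllable counts. As an illustration, here is a rough sketch of the Flesch Reading Ease score; the syllable counter is a crude vowel-group heuristic, so treat the scores as indicative rather than authoritative:

```python
import re

# A rough sketch of the Flesch Reading Ease score:
#   206.835 - 1.015 * (words/sentences) - 84.6 * (syllables/words)
# The syllable counter is a crude vowel-group heuristic; real readability
# tools use pronunciation dictionaries and much better rules.
def count_syllables(word):
    groups = re.findall(r"[aeiouy]+", word.lower())
    count = len(groups)
    if word.lower().endswith("e") and count > 1:   # drop a silent final 'e'
        count -= 1
    return max(count, 1)

def flesch_reading_ease(text):
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    asl = len(words) / len(sentences)    # average sentence length
    asw = syllables / len(words)         # average syllables per word
    return 206.835 - 1.015 * asl - 84.6 * asw

simple = flesch_reading_ease("John hit the ball.")
complex_ = flesch_reading_ease(
    "The spherical projectile was subsequently intercepted by Jonathan.")
print(simple > complex_)   # True: the simpler sentence scores as more readable
```

Higher scores indicate easier text, which is exactly the property the e2r work would need to measure.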

The aim of this work is to address the needs of people with disabilities:

  • People with cognitive disabilities related to functionalities such as
    • Memory
    • Problem solving (conceptualizing, planning, sequencing, reasoning and judging thoughts and actions)
    • Attention (e.g. Attention deficit hyperactivity disorder – ADHD) and awareness
    • Reading, linguistic, and verbal comprehension (e.g. Dyslexia)
    • Visual Comprehension
    • Mental health disabilities
  • People with low language skills including people who are not fluent in a language
  • Hearing Impaired and Deaf People

Early Work in this Area

When I saw this announcement it reminded me of early W3C WAI work in this area. Back in March 2004 an early draft of the WCAG 2.0 guidelines for Web accessibility provided the following guideline:

Guideline 3.1 Ensure that the meaning of content can be determined.

and went on to describe level 3 success criteria which could demonstrate that this guideline had been achieved:

  • Syntax
    • Using the simplest sentence forms consistent with the purpose of the content
      • For example, the simplest sentence-form for English consists of Subject-Verb-Object, as in John hit the ball or The Web site conforms to WCAG 2.0.
    • Using bulleted or numbered lists instead of paragraphs that contain long series of words or phrases separated by commas.
  • Nouns, noun-phrases, and pronouns
    • Using single nouns or short noun-phrases.
    • Making clear pronoun references and references to earlier points in the document

Yes, if that version of the WCAG guidelines had been adopted and you wished your Web site to conform with WCAG Level 3, you would have had to ensure that you avoided complex sentences!

Conformance with Level 3 guidelines was intended to make Web resources “accessible to more people with all or particular types of disability”. The guidelines explained how “A conformance claim of ‘WCAG 2.0 AAA’ can be made if all level 1, level 2, and all level 3 success criteria for all guidelines have been met”.

Such guidelines would be helpful for people with cognitive disabilities: those with Asperger’s syndrome, for example, find it difficult to understand metaphors such as “It’s raining cats and dogs“. The guidelines seem to have been developed by those who wished to implement the vision of “universal accessibility“. But I think we can see that seeking to address accessibility in this fashion is flawed.

Dangers of Such Work

I have to admit that I would be worried if the Easy to Read research activities were to lead to enhancements to the WCAG guidelines. Under the current WAI model, full conformance to WCAG, together with ATAG and UAAG guidelines is supposed to lead to universal accessibility. There is also an assumption that universal accessibility is a desired goal.

But is this really the case? The early drafts of WCAG 2.0 guidelines suggested that “John hit the ball” conformed with the goal of ensuring that the meaning of the content can be determined. Would WCAG 2.0 checking tools flag “the ball was hit by John” as an accessibility error, meaning that the Web page could not achieve the highest accessibility rating? And what about my favourite sports headline: “Super Caley Go Ballistic Celtic Are Atrocious” – a headline which brings a smile if Mary Poppins was part of your cultural background and you recognise Celtic as a football team, but which is clearly not universally accessible.

I would welcome research into ways in which styles of writing can enhance the accessibility of content to people with disabilities. My concern would be if such research were to be incorporated into future versions of the WCAG guidelines – especially as WCAG conformance is mandated in legislation in some countries. But rather than shying away from such research, I feel the main challenge for WAI is to re-evaluate its underlying model, based on the triumvirate of standards, and its commitment to ensuring that Web resources are universally accessible – this might be a great soundbite, but in reality it may be an unachievable – and even undesirable – goal. After all, ‘universal accessibility’ doesn’t appear to allow for any contextualisation, and an important aspect of accessibility must surely be the context of use. What do you think?



Posted in Accessibility, W3C | Tagged: | 6 Comments »

Privacy Settings For UK Russell Group University Home Pages

Posted by Brian Kelly on 24 May 2011

On the website-info-mgt JISCMail list Claire Gibbons, Senior Web and Marketing Manager at the University of Bradford, today asked “Has anyone done anything in particular in response to the changes to the rules on using cookies and similar technologies for storing information from the ICO?” and went on to add that “We were going to update and add to our privacy policy in terms of what cookies we use and why”.

This email message was quite timely, as privacy issues will be featured in a plenary talk at UKOLN’s forthcoming IWMW 2011 workshop, which will be held at the University of Reading on 26-27 July, with Dave Raggett giving the following talk:

Online Privacy:
This plenary will begin with a report on work on privacy and identity in the EU FP7 PrimeLife project which looks at bringing sustainable privacy and identity management to future networks and services. There will be a demonstration of a Firefox extension that enables you to view website practices and to set personal preferences on a per site basis. This will be followed by an account of what happened to P3P, the current debate around do not track, and some thoughts about where we are headed.

The Firefox extension mentioned in the abstract is known as the ‘Privacy Dashboard’ and is described as “a Firefox add-on designed to help you understand what personal information is being collected by websites, and to provide you with a means to control this on a per website basis“. The output for a typical home page is illustrated.

The dashboard was developed by Dave Raggett with funding from the European Union’s 7th Framework Programme for the PrimeLife project, a pan-European research project focusing on bringing sustainable privacy and identity management to future networks and services.

In order to observe patterns in UK Universities’ practices in online privacy I have used the W3C Privacy Dashboard to analyse the home pages of the twenty UK Russell Group university Web sites. The results are given in the following table.

Ref   | Institution                | Session cookies | Lasting cookies | External lasting cookies | Third-party sites | Third-party cookies | Third-party lasting cookies | Invisible images
1     | University of Birmingham   | 3  | 3  | 0 | 4  | 0 | 2   | 0
2     | University of Bristol      | 0  | 0  | 0 | 4  | 0 | 6   | 8
3     | University of Cambridge    | 1  | 3  | 0 | 3  | 1 | 2   | 0
4     | Cardiff University         | 1  | 4  | 0 | 0  | 0 | 0   | 0
5     | University of Edinburgh    | 1  | 4  | 0 | 0  | 0 | 0   | 0
6     | University of Glasgow      | 2  | 3  | 0 | 2  | 1 | 6   | 2
7     | Imperial College           | 3  | 3  | 0 | 3  | 0 | 2   | 0
8     | King’s College London      | 3  | 3  | 0 | 3  | 1 | 6   | 0
9     | University of Leeds        | 2  | 3  | 0 | 1  | 0 | 0   | 0
10    | University of Liverpool    | 2  | 3  | 0 | 2  | 2 | 3   | 0
11    | LSE                        | 3  | 0  | 0 | 1  | 0 | 0   | 0
12    | University of Manchester   | 3  | 0  | 0 | 1  | 0 | 0   | 0
13    | Newcastle University       | 2  | 0  | 0 | 0  | 0 | 0   | 3
14    | University of Nottingham   | 2  | 3  | 0 | 2  | 0 | 5   | 0
15    | University of Oxford       | 1  | 5  | 0 | 1  | 0 | 0   | 1
16    | Queen’s University Belfast | 1  | 3  | 0 | 1  | 0 | 0   | 0
17    | University of Sheffield    | 2  | 3  | 0 | 0  | 1 | 0   | 0
18    | University of Southampton  | 1  | 3  | 0 | 3  | 0 | 0   | 0
19    | University College London  | 1  | 2  | 7 | 0  | 0 | 0   | 0
20    | University of Warwick      | 9  | 6  | 0 | 39 | 2 | 95  | 6
TOTAL |                            | 43 | 54 | 7 | 70 | 8 | 127 | 20

It should be noted that the findings appear to be volatile, with significant differences being found when the findings were checked a few days after the initial survey.
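For readers curious about what the Privacy Dashboard is actually counting, the session/lasting distinction comes down to whether a Set-Cookie header carries an Expires or Max-Age attribute. The sketch below, with invented header values, shows the classification:

```python
from http.cookies import SimpleCookie

# A sketch of the distinction the Privacy Dashboard draws: a cookie with no
# Expires or Max-Age attribute dies with the browser session, while one with
# either attribute persists ("lasting"). All header values here are invented.
def classify(set_cookie_headers):
    counts = {"session": 0, "lasting": 0}
    for header in set_cookie_headers:
        jar = SimpleCookie()
        jar.load(header)
        for morsel in jar.values():
            if morsel["expires"] or morsel["max-age"]:
                counts["lasting"] += 1
            else:
                counts["session"] += 1
    return counts

headers = [
    "sessionid=abc123; Path=/; HttpOnly",
    "tracker=xyz; Expires=Wed, 01 Jan 2031 00:00:00 GMT; Path=/",
    "prefs=en; Max-Age=31536000",
]
print(classify(headers))   # {'session': 1, 'lasting': 2}
```

The dashboard additionally attributes each cookie to the first-party site or to a third-party domain, which gives the remaining columns in the table.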

How do these findings compare with other Web sites, including those in other sectors? It is possible to query the Privacy Dashboard’s data on Web sites for which data is available, which include the Fortune 100 Web sites. In addition I have used the tool on the following Web sites:

Ref | Web site              | Session cookies | Lasting cookies | External lasting cookies | Third-party sites | Third-party cookies | Third-party lasting cookies | Invisible images | Comments
1   | W3C                   | 0 | 0 | 0 | 2 | 0 | 4  | 1 | P3P policy
2   | Facebook home page    | 4 | 6 | 0 | 1 | 0 | 0  | 1 |
3   | Google                | 0 | 7 | 0 | 0 | 0 | 1  | 0 |
4   | No. 10 Downing Street | 1 | 4 | 0 | 8 | 0 | 52 | 1 | Numbers updated after publication
5   | BP                    | 1 | 1 | 0 | 0 | 0 | 0  | 2 | P3P policy
6   | Harvard               | 3 | 4 | 1 | 0 | 0 | 0  | – |
7   | ICO.gov.uk            | 2 | 3 | 0 | 1 | 0 | 0  | 1 |
I suspect that many Web managers will be following Claire Gibbons’ lead in seeking to understand the implications of the changes to the rules on using cookies and similar technologies for storing information, and reading the ICO’s paper on Changes to the rules on using cookies and similar technologies for storing information (PDF format). I hope this survey provides a context to the discussions and that policy makers find the Privacy Dashboard tool useful. But in addition to ensuring that policy statements regarding the use of cookies are adequately documented, might this not also provide an opportunity to implement a machine-readable version of such policies? Is it time for P3P, the Platform for Privacy Preferences Project standard, to make a come-back?
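To recap what a machine-readable policy meant in practice under P3P: a site shipped a “compact policy” of three-letter tokens in a P3P response header, and the user agent compared those tokens against the user’s preferences before accepting cookies. The sketch below is illustrative only — the token set is abbreviated, and the “disallowed” token is invented for the example; see the P3P 1.0 Recommendation for the real vocabulary.

```python
# A sketch of machine-readable privacy along P3P lines. A site's compact
# policy arrived in a header such as:  P3P: CP="NOI DSP COR NID"
# Token meanings are abbreviated here; consult the P3P 1.0 Recommendation
# for the full compact-policy vocabulary.
def parse_compact_policy(header_value):
    inside = header_value.split('CP="', 1)[1].rstrip('"')
    return set(inside.split())

def acceptable(policy_tokens, disallowed):
    # The user agent accepts the site's cookies only if the policy
    # contains none of the tokens the user has ruled out.
    return not (policy_tokens & disallowed)

policy = parse_compact_policy('CP="NOI DSP COR NID"')
# "SHR" is an invented token standing in for a practice the user refuses:
print(acceptable(policy, {"SHR"}))   # True
```

The appeal, then as now, is that the comparison is automatic: no human needs to read a privacy-policy page before the browser decides what to accept.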

Posted in Evidence, Legal, openness, standards, W3C | Tagged: | 15 Comments »

New HTML5 Drafts and Other W3C Developments

Posted by Brian Kelly on 13 April 2011

 

New HTML5 Drafts

The W3C’s HTML Working Group has recently announced the publication of eight documents.

Last Call Working Drafts for RDFa Core 1.1 and XHTML+RDFa 1.1

Back in August 2010 in a post entitled New W3C Document Standards for XHTML and RDFa I described the latest release of the RDFa Core 1.1 and XHTML+RDFa 1.1 draft documents. The RDFa Working Group has now published Last Call Working Drafts of these documents: RDFa Core 1.1 and XHTML+RDFa 1.1.

New Provenance Working Group

The W3C has also recently launched a new Provenance Working Group whose mission is “to support the widespread publication and use of provenance information of Web documents, data, and resources“. The Working Group will publish W3C Recommendations that define a language for exchanging provenance information among applications. This is an area of work which is likely to be of interest to those involved in digital library development work – and it is interesting to see that a workshop on Understanding Provenance and Linked Open Data was held recently at the University of Edinburgh.

Emotion Markup Language

When I first read of the Multimodal Interaction (MMI) Working Group‘s announcement of the Last Call Working Draft of Emotion Markup Language (EmotionML) 1.0, I checked to see that it hadn’t been published on 1 April! It seems that “As the web is becoming ubiquitous, interactive, and multimodal, technology needs to deal increasingly with human factors, including emotions“.

EmotionML allows various vocabularies to be used, such as:

The six terms proposed by Paul Ekman (Ekman, 1972, p. 251-252) as basic emotions with universal facial expressions — emotions that are recognized and produced in all human cultures: anger; disgust; fear; happiness; sadness and surprise.

The 17 terms found in a study by Cowie et al (Cowie et al., 1999), who investigated emotions that frequently occur in everyday life: affectionate; afraid; amused; angry; bored; confident; content; disappointed; excited; happy; interested; loving; pleased; relaxed; sad; satisfied and worried.

Mehrabian’s proposal of a three-dimensional description of emotion in terms of Pleasure, Arousal and Dominance.
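To make this concrete, here is a small EmotionML fragment using the Ekman “big six” category vocabulary, generated and read back with Python’s standard XML tooling. The document shape follows the Last Call draft; treat it as illustrative rather than normative.

```python
import xml.etree.ElementTree as ET

# A sketch of an EmotionML 1.0 fragment using the Ekman "big six" category
# vocabulary. The structure follows the Last Call Working Draft; treat it
# as illustrative rather than normative.
EMO_NS = "http://www.w3.org/2009/10/emotionml"

doc = f"""
<emotionml xmlns="{EMO_NS}"
           category-set="http://www.w3.org/TR/emotion-voc/xml#big6">
  <emotion>
    <category name="happiness" value="0.9"/>
  </emotion>
  <emotion>
    <category name="surprise" value="0.4"/>
  </emotion>
</emotionml>
"""

# Read the annotations back, as a consuming application might
root = ET.fromstring(doc)
names = [c.get("name") for c in root.iter(f"{{{EMO_NS}}}category")]
print(names)   # ['happiness', 'surprise']
```

The point of the standard is that all three vocabularies above (Ekman, Cowie et al, Mehrabian) can be plugged in via the same category-set mechanism.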

Posted in HTML, standards, W3C | 1 Comment »

Standards for Web Applications on Mobile Devices: the (Re)birth of SVG?

Posted by Brian Kelly on 1 March 2011

The W3C have recently published a document entitled “Standards for Web Applications on Mobile: February 2011 current state and roadmap“. The document, which describes work carried out by the EU-funded Mobile Web Applications project, begins:

Web technologies have become powerful enough that they are used to build full-featured applications; this has been true for many years in the desktop and laptop computer realm, but is increasingly so on mobile devices as well.

This document summarizes the various technologies developed in W3C that increases the power of Web applications, and how they apply more specifically to the mobile context, as of February 2011.

The document continues with a warning:

This document is the first version of this overview of mobile Web applications technologies, and represents a best-effort of his author; the data in this report have not received wide-review and should be used with caution

The first area described in the document is Graphics and, since the first standard mentioned is SVG, the note of caution needs to be borne in mind. As discussed in a post published in November 2008 on “Why Did SMIL and SVG Fail?”, SVG (together with SMIL) failed to live up to initial expectations. The post outlined some reasons for this, and in the comments there were suggestions that the standard hadn’t failed, as it was by then supported in most widely-used browsers, with the notable exception of Internet Explorer. In January 2010 I asked “Will The SVG Standard Come Back to Life?” following the announcement that “Microsoft Joins W3C SVG Working Group“ and an expectation that IE9 would provide support for SVG. This was subsequently confirmed in a post with the unambiguous title “SVG in IE9 Roadmap” published on the IE9 blog.

The signs in the desktop browser environment are looking positive for SVG support. But it may be the mobile environment in which SVG really takes off, since in the desktop Web environment we have over 15 years of experience in using HTML and CSS to provide user interfaces. As described in the W3C roadmap:

SVG, Scalable Vector Graphics, provides an XML-based markup language to describe two-dimensions vectorial graphics. Since these graphics are described as a set of geometric shapes, they can be zoomed at the user request, which makes them well-suited to create graphics on mobile devices where screen space is limited. They can also be easily animated, enabling the creation of very advanced and slick user interfaces.

But will SVG’s strength in the mobile environment lead to a fragmented Web in which mobile users engage with an SVG environment whilst desktop users continue to access HTML resources? I can recall suggestions that were being made about 10 years ago which pointed out that, since SVG is the richer environment, it could be used as a generic environment. Might we see that happening? After all, as can be seen (if you’re using a browser which supports SVG) from examples such as the Solitaire game (linked from the Startpagina Web site, which provides access to various examples of SVG uses), it is possible to provide an SVG gaming environment. Might we see Web sites like this being developed?
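The roadmap’s point about small screens is easy to demonstrate: an SVG drawing is geometry plus a viewBox, so a mobile browser can rescale it losslessly to any viewport rather than resampling pixels. A minimal sketch, generated with Python’s standard XML tooling:

```python
import xml.etree.ElementTree as ET

# A sketch of why SVG suits small screens: the drawing is described as
# geometry in user units against a viewBox, so any renderer can rescale
# it losslessly to whatever viewport is available.
svg = ET.Element("svg", {
    "xmlns": "http://www.w3.org/2000/svg",
    "viewBox": "0 0 100 100",   # user units, not device pixels
})
ET.SubElement(svg, "circle",
              {"cx": "50", "cy": "50", "r": "40", "fill": "navy"})
label = ET.SubElement(svg, "text",
                      {"x": "50", "y": "55", "text-anchor": "middle",
                       "fill": "white"})
label.text = "SVG"

markup = ET.tostring(svg, encoding="unicode")
print(markup.startswith("<svg"))   # True
```

Because no width or height is baked in, the same markup renders crisply on a phone, a tablet or a desktop monitor — the property that makes it a plausible basis for mobile user interfaces.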

Perhaps rather than asking “Has SVG failed?” we may soon need to start asking “How should we use SVG?”

Posted in standards, W3C | Tagged: | 1 Comment »

HTML5 Standardisation Last Call – May 2011

Posted by Brian Kelly on 15 February 2011

I recently described the confusion over the standardisation of HTML5, with the WhatWG announcing that they are renaming HTML5 as ‘HTML’ and that it will be a ‘Living Standard’ which will continually evolve as browser vendors agree on new features to implement in the language.

It now seems that the W3C are responding to accusations that they are a slow-moving standardisation body, with an announcement that “W3C Confirms May 2011 for HTML5 Last Call, Targets 2014 for HTML5 Standard“. In the press release Jeff Jaffe, W3C CEO, states that:

Even as innovation continues, advancing HTML5 to Recommendation provides the entire Web ecosystem with a stable, tested, interoperable standard

I welcome this announcement as I feel that it helps to address recent uncertainties regarding the governance and roadmap for HTML developments. The onus is now on institutions: there is a clear roadmap for HTML5 development, with a stable standard currently being finalised. As providers of institutional Web services, what are your plans for deployment of HTML5?

Posted in standards, W3C | Tagged: | 1 Comment »

The W3C’s RDF and Other Working Groups

Posted by Brian Kelly on 14 February 2011

The W3C have recently announced the launch of the RDF Working Group.  As described in the RDF Working Group Charter:

The mission of the RDF Working Group, part of the Semantic Web Activity, is to update the 2004 version of the Resource Description Framework (RDF) Recommendation. The scope of work is to extend RDF to include some of the features that the community has identified as both desirable and important for interoperability based on experience with the 2004 version of the standard, but without having a negative effect on existing deployment efforts.

Membership of a W3C working group comprises W3C staff as well as representatives of W3C member organisations, which include the JISC. In addition it is also possible to contact working group chairs and W3C team members in order to explore the possibility of participating as an invited expert.

Note that a list of W3C Working Groups, Interest Groups, Incubator Groups and Coordination Groups is provided on the W3C Web site. The Working Groups are typically responsible for the development of new W3C standards (known as ‘Recommendations’) or the maintenance of existing Recommendations. There are quite a number of working groups, including groups for well-known W3C areas of work such as HTML, CSS and WAI as well as newer or more specialised groups covering areas including Geolocation, SPARQL, RDF and RDFa.

W3C Interest Groups which may be of interest include Semantic Web, eGovernment and WAI. Similarly Incubator Groups which may be of interest to readers of this blog include the Federated Social Web, Library Linked Data, the Open Web Education Alliance and the WebID groups.

The W3C Process Document provides details of the working practices for Working Groups, Interest Groups and Incubator Groups. If anyone feels they would like to contribute to such groups I suggest you read the Process Document in order to understand the level of commitment which may be expected and, if you feel you can contribute to the work of a group, feel free to contact me.

Posted in standards, W3C | Leave a Comment »

The HTML5 Standardisation Journey Won’t Be Easy

Posted by Brian Kelly on 3 February 2011

I recently published a post on Further HTML5 Developments in which I described how the W3C were being supportive of approaches to the promotion of HTML5 and the Open Web Platform. However in a post entitled HTML is the new HTML5, published on 19th January 2011 on the WhatWG blog, Ian Hickson, editor of the HTML5 specification (and a graduate of the University of Bath who now works for Google), announced that “The HTML specification will henceforth just be known as ‘HTML’”. As described in the FAQ it is intended that HTML5 will be a “living standard”:

… standards that are continuously updated as they receive feedback, either from Web designers, browser vendors, tool vendors, or indeed any other interested party. It also means that new features get added to them over time, at a rate intended to keep the specifications a little ahead of the implementations but not so far ahead that the implementations give up.

What this means for the HTML5 marketing activities is unclear. But perhaps more worrying is what this will mean for the formal standardisation process which the W3C has been involved in. Since it seems that new HTML(5) features can be implemented by browser and tool vendors as they see fit, this seems to herald a return to the days of the browser wars, during which Netscape and Microsoft introduced ‘innovative’ features such as the BLINK and MARQUEE tags.

On the W3C’s public-html list Joshue O Connor (a member of the W3C WAI Protocol and Formats Working Group) feels that:

What this move effectively means is that HTML (5) will be implemented in a piecemeal manner, with vendors (browser manufacturers/AT makers etc) cherry picking the parts that they want. … This current move by the WHATWG, will mean that discussions that have been going on about how best to implement accessibility features in HTML 5 could well become redundant, or unfinished or maybe never even implemented at all.

In response Anne van Kesteren of Opera points out that:

Browsers have always implemented standards piecemeal because implementing them completely is simply not doable. I do not think that accepting reality will actually change reality though. That would be kind of weird. We still want to implement the features.

and goes on to add:

Specifications have been in flux forever. The WHATWG HTML standard since 2004. This has not stopped browsers implementing features from it. E.g. Opera shipped Web Forms 2.0 before it was ready and has since made major changes to it. Gecko experimented with storage APIs before they were ready, etc. Specifications do not influence such decisions.

Just over a year ago a CETIS meeting on The Future of Interoperability and Standards in Education explored “the role of informal specification communities in rapidly developing, implementing and testing specifications in an open process before submission to more formal, possibly closed, standards bodies“. But while rapid development, implementation and testing were felt to be valuable, there was a recognition of the continued need for the more formal standardisation process. Perhaps the importance of rapid development which was highlighted at the CETIS event has been demonstrated by the developments centred around HTML5, with the W3C providing snapshots once the implementation and testing of new HTML developments have taken place, but I feel uneasy at these developments. This unease has much to do with the apparent autonomy of browser vendors: I have mentioned comments from employees of Google and Opera who seem to be endorsing this move (how would we feel if it were Microsoft challenging the W3C’s standardisation process?). But perhaps we should accept that significant Web developments are no longer driven by a standards organisation or by grass-roots developments, but by the major global players in the market-place. Doesn’t sound good, does it – a twenty-first century return to browser vendors introducing updated versions of the BLINK and MARQUEE elements because they’ll know what users want :-(

Posted in HTML, standards, W3C | Tagged: | 3 Comments »

WAI-ARIA 1.0 Candidate Recommendation – Request for Implementation Experiences and Feedback

Posted by Brian Kelly on 2 February 2011

W3C announced the publication of WAI-ARIA 1.0 as a W3C Candidate Recommendation on 18th January 2011. A Candidate Recommendation (CR) is a major step in the W3C standards development process which signals that there is broad consensus in the Working Group and among public reviewers on the technical content of proposed recommendation. The primary purpose of the CR stage is to implement and test WAI-ARIA. If you are interested in helping or have additional comments you are invited to follow the content submission instructions.

WAI-ARIA is a technical specification that defines a way to make Web content and Web applications more accessible to people with disabilities. It especially helps with dynamic content and advanced user interface controls developed with AJAX, HTML, JavaScript and related technologies. For an introduction to the WAI-ARIA suite please see the WAI-ARIA Overview or the WAI-ARIA FAQ.

It does occur to me that in light of the significant development work we are seeing in areas such as repositories, e-learning systems, e-research, etc. there may be examples of developments which have enhanced the user interface in ways which enhance access for users with disabilities. If you have made use of WAI-ARIA 1.0 techniques in the development of your services, as mentioned on the W3C blog, W3C WAI would welcome such feedback. Please note that the closing date for comments is 25th February 2011.

Posted in Accessibility, standards, W3C | Leave a Comment »

Call for Use Cases: Social Uses and Other New Uses of Library Linked Data

Posted by Brian Kelly on 21 January 2011

The W3C’s Library Linked Data Incubator Group has issued a “Call for Use Cases: Social uses and other new uses of Library Linked Data“. The call begins:

Do you use library-related data — like reading lists, library materials (articles, books, videos, cultural heritage or archival materials, etc), bookmarks, or annotations — on the Web and mobile Web?

Are you currently using social features in library-related information systems or sites, or plan to do so in the near future? We are particularly interested in uses that are related to or could benefit from the use of linked data.

The W3C Library Linked Data Incubator Group is soliciting SOCIAL and EMERGENT use cases for library-related linked data:

  • What new or innovative uses do you see (or envision) integrating library and cultural heritage data into applications on the Web and in social media?
  • How are social features used in library-related information systems?
  • What are the emergent uses of library-related data on the Web and mobile Web?

How could linked data technology [1]:

  • enhance the use of library-related data in a social context?
  • contribute to systems for sharing, filtering, recommending, or machine reading?
  • support new uses we may not have envisioned or achieved yet?

Some examples have been discussed in this thread [4].

Please tell us more by filling in the questionnaire below and sending it back to us or to public-lld@w3.org, preferably before February 15th, 2011 (note the original email incorrectly had 2010).

The information you provide will be influential in guiding the activities the Library Linked Data Incubator Group will undertake to help increase global interoperability of library data on the Web. The information you provide will be curated and published on the group wikispace at [3].

We understand that your time is precious, so please don’t feel you have to answer every question. Some sections of the templates are clearly marked as optional. However, the more information you can provide, the easier it will be for the Incubator Group to understand your case. And, of course, please do not hesitate to contact us if you have any trouble answering our questions.

Editorial guidance on specific points is provided at [2], and examples are available at [3].

The message then goes on to provide the template for the use cases.

I would think that there will be a range of relevant examples of such use cases based on institutional developments and JISC-funded project and service developments. It would be very useful, I feel, if the UK higher education sector were to contribute to this call as this can help to ensure that W3C’s Linked Data work will be informed by the experiences and requirements of our sector. I should add that I have come across examples of standardisation activities in the past which have reflected US approaches and which are not easily implemented for UK working practices.

If you are involved in such Library-related Linked Data activities I would encourage you to read the original requests and respond accordingly.  Feel free to leave a comment here if you have contributed a use case.

Posted in Linked Data, W3C | Leave a Comment »

Moves Away From XML to JSON?

Posted by Brian Kelly on 26 November 2010

Although in the past I have described standards developed by the W3C which have failed to set the marketplace alight I have always regarded XML as a successful example of a W3C standard.  Part of its initial success was its simplicity – I recall hearing the story of when XML 1.0 was first published, with a copy of the spec being thrown into the audience to much laughter. The reason for the audience’s response? The 10 page (?) spec fluttered gently towards the audience but the SGML specification, for which XML provided a lightweight and Web-friendly alternative, would have crushed people sitting in the first few rows!   I don’t know whether this story is actually true but it provided a vivid way of communicating the simplicity of the standard which, it was felt, would be important in ensuring the standard would gain momentum and widespread adoption.

But where are we now, 12 years after the XML 1.0 specification was published? Has XML been successful in providing a universal markup language for use in not only a variety of document formats but also in protocols?

The answer to this question is, I feel, no longer as clear as it used to be. In a post on the Digital Bazaar blog entitled Web Services: JSON vs XML, Manu Sporny, Digital Bazaar’s Founder and CEO, makes the case for the inherent simplicity of JSON, arguing that:

XML is more complex than necessary for Web Services. By default, XML requires you to use complex features that many Web Services do not need to be successful.

The context to discussions in the blogosphere over XML vs JSON is the news that Twitter and Foursquare have recently removed XML support from their Web APIs and now support only JSON.  James Clark, in a post on XML vs the Web, appears somewhat ambivalent about this debate (“my reaction to JSON is a combination of ‘Yay’ and ‘Sigh‘”) but goes on to list many advantages of JSON over XML in a Web context:

… for important use cases JSON is dramatically better than XML. In particular, JSON shines as a programming language-independent representation of typical programming language data structures.  This is an incredibly important use case and it would be hard to overstate how appallingly bad XML is for this.

The post concludes:

So what’s the way forward? I think the Web community has spoken, and it’s clear that what it wants is HTML5, JavaScript and JSON. XML isn’t going away but I see it being less and less a Web technology; it won’t be something that you send over the wire on the public Web, but just one of many technologies that are used on the server to manage and generate what you do send over the wire.
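Clark’s point about programming language data structures is easy to see with a small side-by-side sketch (an illustrative record, not taken from either post). The JSON version maps directly onto the dictionaries, lists, strings and numbers of most programming languages, while the XML version forces modelling decisions – elements versus attributes, how to represent a list, everything being text – that have no natural programming-language counterpart:

```xml
<!-- One of several possible ways to model the record in XML;
     the list needs a wrapper element and the age is just text -->
<person>
  <name>Alice</name>
  <age>30</age>
  <languages>
    <language>English</language>
    <language>French</language>
  </languages>
</person>
```

```json
{ "name": "Alice", "age": 30, "languages": ["English", "French"] }
```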

The debate continues on both of these blogs. But rather than engaging in the finer points of the debate about the merits of these two approaches I feel it is important to be aware of decisions which have already been taken. And as Manu Sporny has pointed out:

Twitter and Foursquare had already spent the development effort to build out their XML Web Services, people weren’t using them, so they decided to remove them.

Meanwhile in a post on Deprecating XML Norman Walsh responds with the comment “Meh”, though he expands on this reaction more helpfully by concluding:

I’ll continue to model the full and rich complexity of data that crosses my path with XML, and bring a broad arsenal of powerful tools to bear when I need to process it, easily and efficiently extracting value from all of its richness. I’ll send JSON to the browser when it’s convenient and I’ll map the output of JSON web APIs into XML when it’s convenient.

Is this a pragmatic approach which would be shared by developers in the JISC community, I wonder? Indeed on Twitter Tony Hirst has just asked “Could a move to json make Linked Data more palatable to developers?” and encouraged the #jiscri and #devcsi communities to read a draft document on “JSON-LD – Linked Data Expression in JSON“.

Posted in jiscobs, standards, W3C | 9 Comments »

W3C and ISO

Posted by Brian Kelly on 9 November 2010

The World Wide Web Consortium (W3C) describes itself as “an international community where Member organizations, a full-time staff, and the public work together to develop Web standards“. But surprisingly the W3C doesn’t actually produce standards. Rather, “W3C develops technical specifications and guidelines through a process designed to maximize consensus about the content of a technical report, to ensure high technical and editorial quality, and to earn endorsement by W3C and the broader community“.

But this is now changing. The W3C recently announced that “Global Adoption of W3C Standards [is] Boosted by ISO/IEC Official Recognition“. The announcement describes how “the International Standards Organization (ISO), and the International Electrotechnical Commission (IEC) took steps that will encourage greater international adoption of W3C standards. W3C is now an ‘ISO/IEC JTC 1 PAS Submitter’ bringing ‘de jure’ standards communities closer to the Internet ecosystem.“

What this means is that the W3C can submit their specifications directly for country voting to become ISO/IEC standards. The aims are to help avoid global market fragmentation; to improve deployment of W3C specifications within government use; and to ease acceptance of a W3C specification when there is evidence of its stability and market acceptance.

In their submission the W3C provided an overview of how they standardise a Web technology:

  1. W3C participants, members usually generate interest in a particular topic.
    W3C usually runs open workshops (events with a open call for papers) to identify new areas of work.
  2. When there is enough interest in a topic (e.g., after a successful Workshop and/or discussion on an Advisory Committee mailing list), the Director announces the development of a proposal for a new Activity or Working Group charter, depending on the breadth of the topic of interest.
    An Activity Proposal describes the scope, duration, and other characteristics of the intended work, and includes the charters of one or more groups (with requirements, deliverables, liaisons, etc) to carry out the work.
  3. When there is support within W3C for investing resources in the topic of interest, the Director approves the new Activity and groups get down to work.
    There are three types of Working Group participants: Member representatives, Invited Experts, and Team representatives. Team representatives both contribute to the technical work and help ensure the group’s proper integration with the rest of W3C.
  4. Working Groups create specifications based on consensus that undergo cycles of revision and review as they advance to W3C Recommendation status.
    The W3C process for producing specification includes significant review by the Members and public (every 3 months all drafts have to be made public on our Web site w3.org), and requirements that the Working Group be able to show implementation and interoperability experience.
  5. At the end of the process, the Advisory Committee (all members) reviews the mature specification, and if there is support, W3C publishes it as a Final Recommendation.
  6. The document enters what is called Life-after-Recommendation where the group/committee does maintenance, collects and publishes errata, considers minor changes, and if the technology is still evolving, prepares the next major version.

The W3C have not yet defined the selection criteria for identifying which specifications are suitable for submission. I think it will be interesting to see how the market acceptance criteria will be used. It will also be interesting to see what the timescales for such standardisation processes will be and whether standardisation will be applied to recent W3C specifications or older ones. It seems, for example, that the ISO/IEC 15445:2000 standard for Information technology — Document description and processing languages — HyperText Markup Language (HTML), which was first published in 2000 and updated in 2003, is the ISO standardisation of the HTML 4.0 specification. We can safely say that HTML 4 does have market acceptance, but the marketplace has moved on, with developers now interested in the HTML5 specification. Will the ISO standardisation take place several years after a standard has become ubiquitous, I wonder?

Posted in standards, W3C | 2 Comments »

Eight Updated HTML5 Drafts and the ‘Open Web Platform’

Posted by Brian Kelly on 4 November 2010

Eight Updated HTML5 Drafts

Last week the W3C announced “Eight HTML5 Drafts Updated”.  The HTML Working Group has published eight documents all of which were released on 19 October 2010:

Meanwhile on the W3C blog Philippe Le Hégaret has published a post on “HTML5: The jewel in the Open Web Platform” in which he describes how he has been “inspired by the enthusiasm for the suite of technical standards that make up what W3C calls the ‘Open Web Platform’“.

The ‘Open Web Platform’

The term ‘Open Web Platform’ seems strange, especially coming from a W3C employee. After all, has the Web always been based on an open platform since it was first launched, with open standards and open source client and server tools?

Philippe Le Hégaret goes on to say that Open Web Platform is “HTML5, a game-changing suite of tools that incorporates SVG, CSS and other standards that are in various stages of development and implementation by the community at W3C”.

Philippe described these ideas in a video on “The Next Open Web Platform” published in January 2010. From the transcript it seems that W3C are endorsing the characterisations of “Web 1.0″, which provided a “very passive user experience“, followed by “Web 2.0″, which provided “a more interactive user experience“.

The W3C, it seems, have announced that they are now “pushing the web in two areas, which are orthogonals. One is the Web of Data, that we refer to, of course, the Semantic Web, cloud computings that we are also interested in and mash-ups, data integration in general. And the other one is the Web of Interaction“.

Discussion

Whilst the W3C have always been prolific in publishing technical standards they have, I feel, been relatively unsuccessful in marketing their vision. It was the commercial sector which coined the term ‘Web 2.0′ – a term which had many detractors in the developer community, who showed their distaste by describing it as “a mere marketing term“.

Web 2.0 is a marketing term – and a very successful marketing term, which also spun off other 2.0 memes. So I find it interesting to observe that the W3C are now pro-active in the marketing of their new technical vision, centred around HTML5 and other presentational standards under the term ‘Open Web Platform’.

And alongside the ‘Open Web Platform’ the W3C are continuing to promote what they continue to describe as the ‘Semantic Web’. But will this turn out to be a positive brand? Over time we have seen the lower case semantic web, the pragmatic Semantic Web, the Web of Data and Linked Data being used as marketing terms (with various degrees of technical characterisation). But will the variety of terms which have been used result in confusion? Looking at a Google Trends comparison of the terms “Semantic Web” and “Open Web Platform” we see a decrease in searches for “Semantic Web” since 2004, whilst there is not yet sufficient data to show the trends for the “Open Web Platform“.

Whilst I, like Philippe Le Hégaret, am also an enthusiast for the ‘Open Web Platform’ (who, after all, could fail to support a vision of an open Web?)  there is still a need to appreciate concerns and limitations and understand benefits before making decisions on significant uses of the standards which comprise the Open Web Platform. I will be exploring such issues in future posts – and welcome comments from others with an interest in this area.

Posted in jiscobs, standards, W3C | 2 Comments »

Proposed Recommendation for Mobile Web Application Best Practices

Posted by Brian Kelly on 2 November 2010

The W3C have recently published a Proposed Recommendation of Mobile Web Application Best Practices.  This document aims to “aid the development of rich and dynamic mobile Web applications [by] collecting the most relevant engineering practices, promoting those that enable a better user experience and warning against those that are considered harmful“.

The closing date for comments on this document is 19 November 2010.

There is much interest in mobile Web applications within the UK higher education sector, as can be seen from recent events such as the Eduserv Symposium 2010: The Mobile University, the FOTE10 conference and the Mobile Technologies sessions at UKOLN’s IWMW 2010 event. Much of the technical discussion which is taking place will address such best practices. Since an effective way of ensuring that best practices become embedded is for a well-known and highly regarded standards body to publish them, I feel it would be useful if those who are involved in mobile Web development work were to review this document and provide feedback.

The Mobile Web Best Practices Working Group provides details on how to give feedback, through use of the public-bpwg-comments@w3.org mailing list. Note that an archive of the list is available.

 

Posted in W3C | 1 Comment »

Release of MathML v3 as a W3C Standard

Posted by Brian Kelly on 29 October 2010

On 21 October 2010 the W3C made an announcement about an “important standard for making mathematics on the Web more accessible and international, especially for early mathematics education“. The press release described how “MathML 3 is the third version of a standard supported in a wide variety of applications including Web pages, e-books, equation editors, publishing systems, screen readers (that read aloud the information on a page) and braille displays, ink input devices, e-learning and computational software.”

But what about support from browser vendors?  The press release went on to describe how “MathML 3 is part of W3C’s Open Web Platform, which includes HTML5, CSS, and SVG. Browser vendors will add MathML 3 support as they expand their support for HTML5. Firefox and Camino already support MathML 2 natively, and Safari/WebKit nightly builds continue to improve. Opera supports the MathML for CSS profile of MathML 3. Internet Explorer users can install a freely-available MathPlayer plug-in. In addition, JavaScript software such as MathJax enables MathML display in most browsers without native support.

Does it work? In order to investigate I installed the Firemath extension for Firefox and the MathPlayer plugin for Internet Explorer. I then viewed the MathML Browser Test (Presentation Markup) page using Firefox (v 4.0), Chrome, Internet Explorer (v 8) and Opera (v 10.61). The results shown using Internet Explorer version 8 are shown below, with the first and second columns containing an image of how the markup has been rendered in TeXShop and Firefox with STIX Beta fonts, and the third column showing how the markup is rendered in the browser the user is using.

A quick glance at the display on all four browsers shows that the support seems pretty good [Note: following a comment I received I have noticed that the page isn’t rendered in Chrome – added 2 November 2010]. However it would take a mathematician to ensure that the renderings of mathematical formulae are acceptable.

It should also be noted that MathML 3 is part of HTML5. This means that embedding maths in Web documents should become easier, with direct import from HTML to mathematics software and vice versa.
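To give a flavour of what this direct embedding looks like, the quadratic formula could be expressed in presentation MathML within an HTML5 page along the following lines (an illustrative sketch, not an example taken from the specification):

```html
<p>The roots of the quadratic equation are given by:</p>
<math xmlns="http://www.w3.org/1998/Math/MathML">
  <mi>x</mi><mo>=</mo>
  <mfrac>
    <mrow>
      <mo>-</mo><mi>b</mi><mo>&#xB1;</mo>
      <msqrt>
        <msup><mi>b</mi><mn>2</mn></msup>
        <mo>-</mo><mn>4</mn><mi>a</mi><mi>c</mi>
      </msqrt>
    </mrow>
    <mrow><mn>2</mn><mi>a</mi></mrow>
  </mfrac>
</math>
```

A browser with native MathML support renders this as conventional mathematical notation, with no images or plugins involved.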

In order to encourage takeup the W3C Math home page provides links to “A Gentle Introduction to MathML” and “MathML: Presenting and Capturing Mathematics for the Web” tutorials with “The MathML Handbook” available for purchase.

The W3C have provided a “MathML software list” together with a “MathML 3 Implementation Testing Results Summary” – which, it should be noted, has not been updated since July 2010.

I think this announcement is of interest in the context of institutional planning for migration of document formats to the richer and more open environments provided by HTML5 and associated standards such as MathML, CSS 3, etc.

Will we start to see documents containing MathML markup being uploaded to institutional repositories, I wonder? And should this format be preferred to PDFs for scientific papers containing mathematical markup?

Posted in jiscobs, standards, W3C | Tagged: | 8 Comments »

RDFa API Draft Published

Posted by Brian Kelly on 28 September 2010

The W3C have recently announced that the RDFa API draft has been published. As described in the announcement “RDFa enables authors to publish structured information that is both human- and machine-readable. Concepts that have traditionally been difficult for machines to detect, like people, places, events, music, movies, and recipes, are now easily marked up in Web documents“.

The RDFa API draft document itself helpfully provides several examples which illustrate the potential benefits of use of RDFa:

Enhanced Browser Interfaces: Dave is writing a browser plugin that filters product offers in a web page and displays an icon to buy the product or save it to a public wishlist. The plugin searches for any mention of product names, thumbnails, and offered prices. The information is listed in the URL bar as an icon, and upon clicking the icon, displayed in a sidebar in the browser. He can then add each item to a list that is managed by the browser plugin and published on a wishlist website.

Data-based Web Page Modification: Dale has a site that contains a number of images, showcasing his photography. He has already used RDFa to add licensing information about the images to his pages, following the instructions provided by Creative Commons. Dale would like to display the correct Creative Commons icons for each image so that people will be able to quickly determine which licenses apply to each image.

Automatic Summaries: Mary is responsible for keeping the projects section of her company’s home page up-to-date. She wants to display info-boxes that summarize details about the members associated with each project. The information should appear when hovering the mouse over the link to each member’s homepage. Since each member’s homepage is annotated with RDFa, Mary writes a script that requests the page’s content and extracts necessary information via the RDFa API.

Data Visualisation: Richard has created a site that lists his favourite restaurants and their locations. He doesn’t want to generate code specific to the various mapping services on the Web. Instead of creating specific markup for Yahoo Maps, Google Maps, MapQuest, and Google Earth, he instead adds address information via RDFa to each restaurant entry. This enables him to build on top of the structured data in the page as well as letting visitors to the site use the same data to create innovative new applications based on the address information in the page.

Linked Data Mashups: Marie is a chemist, researching the effects of ethanol on the spatial orientation of animals. She writes about her research on her blog and often makes references to chemical compounds. She would like any reference to these compounds to automatically have a picture of the compound’s structure shown as a tooltip, and a link to the compound’s entry on the National Center for Biotechnology Information [NCBI] Web site. Similarly, she would like visitors to be able to visualize the chemical compound in the page using a new HTML5 canvas widget she has found on the web that combines data from different chemistry websites.

However the example I find most interesting is the following:

Importing Data: Amy has enriched her band’s web-site to include Google Rich Snippets event information. Google Rich Snippets are used to mark up information for the search engine to use when displaying enhanced search results. Amy also uses some ECMAScript code that she found on the web that automatically extracts the event information from a page and adds an entry into a personal calendar.

Brian finds Amy’s web-site through Google and opens the band’s page. He decides that he wants to go to the next concert. Brian is able to add the details to his calendar by clicking on the link that is automatically generated by the ECMAScript tool. The ECMAScript extracts the RDFa from the web page and places the event into Brian’s personal calendaring software – Google Calendar.

Although all of the use cases listed above provide sample RDFa markup, the final example makes use of Google Rich Snippets, for which there is a testing tool which illustrates the structure which is visible to Google. I have been using RDFa on my forthcoming events page for a while, so using the Rich Snippets testing tool it is useful to see how the structure provided on that page is processed by Google.

The testing tool does point out “that there is no guarantee that a Rich Snippet will be shown for this page on actual search results“. As described in the Rich Snippets FAQ: “Currently, review sites and social networking/people profile sites are eligible. We plan to expand Rich Snippets to other types of content in the future“.

So although there is no guarantee that use of RDFa embedded in HTML pages using Google Rich Snippets for, say, events will ensure that search results for events hosted on your Web site will provide a structured display of the information like this:

the fact that Google’s Rich Snippets are explicitly mentioned in the RDFa API draft document does seem to suggest commitment from a leading player which has a vested interest in processing structured information in order to improve the searching process.

And of course the “ECMAScript code that [Amy] found on the web that automatically extracts the event information from a page and adds an entry into a personal calendar” suggests that such RDFa information can be processed today without the need for support from Google.  Now does anyone know where Amy found this ECMAScript code?
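For readers who have not seen such markup, a sketch of RDFa event markup of this kind is shown below. The vocabulary URI follows what Google’s Rich Snippets documentation described for events at the time; the event details and URL are invented for illustration:

```html
<div xmlns:v="http://rdf.data-vocabulary.org/#" typeof="v:Event">
  <a href="http://example.org/workshop" rel="v:url"
     property="v:summary">Example Workshop</a>
  <span property="v:startDate" content="2010-11-26">26 November</span> to
  <span property="v:endDate" content="2010-11-27">27 November 2010</span>,
  <span property="v:location">Example University</span>
</div>
```

The human-readable dates remain on the page while the machine-readable ISO forms travel in the content attributes – which is precisely what lets a script or search engine extract the event without scraping the prose.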

Posted in Linked Data, W3C | 3 Comments »

URI Interface to W3C’s Unicorn Validator

Posted by Brian Kelly on 23 September 2010

The W3C recently announced that they had launched Unicorn, which they described as “a one-stop tool to help people improve the quality of their Web pages. Unicorn combines a number of popular tools in a single, easy interface, including the Markup validator, CSS validator, mobileOk checker, and Feed validator“.

An example of how this validation service works is illustrated by the Unicorn output for a validation of the UKOLN home page.

The  default options provide validation of the HTML and CSS of the selected page together with any auto-discoverable RSS feeds.

The interface to the validator is a Web form hosted on the W3C Web site.

But encouraging use of such validation services would be much easier if the interface were more closely integrated with an author’s browsing environment, so that they didn’t have to visit another page and copy and paste a URL.

The UKOLN Web site has been configured to provide this ease-of-use. Appending ,unicorn to the UKOLN home page will invoke the Unicorn validator – and this option can be used on any page on the UKOLN Web site.

This service is implemented by adding the following line to the Apache Web server’s configuration file:

RewriteRule ^/(.*),unicorn$ http://validator.w3.org/unicorn/check?ucn_uri=http://%{HTTP_HOST}/$1&ucn_task=conformance [R=301]

I’m not sure how easy it may be to implement such extensions to Web servers these days; there may be policy barriers to such changes or perhaps technical barriers imposed by Content Management Systems.  But I wonder if this simple approach might be of interest to others?

Posted in HTML, standards, W3C | 1 Comment »

An Early Example of a TTML Application

Posted by Brian Kelly on 16 September 2010

Back in February 2010 the W3C announced a Candidate Recommendation Updated for Timed Text Markup Language (TTML) 1.0. This article referred to work being carried out by the W3C’s Timed Text Working Group which had been asked to produce a W3C Recommendation for media online captioning by refining the W3C specification Timed Text Markup Language (TTML) 1.0 based on implementation experience and interoperability feedback.

This work is now complete, with the Timed Text Markup Language (TTML) 1.0 Proposed Recommendation having been published on 14 September 2010.

Martin Hawksey’s iTitle Twitter captioning tool was an early example of an application which has exploited this emerging new standard. As described in the Twitter subtitling article in Wikipedia Martin “created a subtitle file from tweets in W3C Timed Text Markup Language (TTML) which could be used with the BBC iPlayer“. This example was initially used to provide Twitter captioning of the BBC/OU The Virtual Revolution programme followed by Gordon Brown’s talk on Building Britain’s Digital Future.
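To give a flavour of the format, a subtitle file of the kind Martin generated might look something like the sketch below (the tweets are invented; element and attribute names follow the TTML 1.0 drafts, though the namespace changed between draft versions):

```xml
<tt xmlns="http://www.w3.org/ns/ttml">
  <body>
    <div>
      <!-- each paragraph is a caption displayed between 'begin' and 'end' -->
      <p begin="00:00:05.00" end="00:00:10.00">@alice: Great point about open standards!</p>
      <p begin="00:00:12.00" end="00:00:17.00">@bob: Agreed, looking forward to the Q&amp;A.</p>
    </div>
  </body>
</tt>
```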

It’s good to see this example of a prototype service which takes a proposed standard and demonstrates its value.  Congratulations to Martin and RSC Scotland North and East.

I’d be interested, though, to speculate on what other possibilities timed text markup applications may have to offer. Any suggestions anyone?

Posted in standards, W3C | 1 Comment »

New W3C Document Standards for XHTML and RDFa

Posted by Brian Kelly on 27 August 2010

New W3C Draft Documents

The W3C have recently announced that new “Drafts of RDFa Core 1.1 and XHTML+RDFa 1.1 [have been] Published“. The announcement states that:

The RDFa Working Group has just published two Working Drafts: RDFa Core 1.1 and XHTML+RDFa 1.1. RDFa Core 1.1 is a specification for attributes to express structured data in any markup language. The embedded data already available in the markup language (e.g., XHTML) is reused by the RDFa markup, so that publishers don’t need to repeat significant data in the document content. XHTML+RDFa 1.1 is an XHTML family markup language that extends the XHTML 1.1 markup language with the attributes defined in RDFa Core 1.1.

Meanwhile on 24th June 2010 the latest version of the “HTML5: A vocabulary and associated APIs for HTML and XHTML” working draft was published.

Patrick Lauke’s talk on “HTML5 (and friends): The future of web technologies – today” generated a lot of interest at the IWMW 2010 event – but as I pointed out in the workshop conclusions session, there seems to be some uncertainty as to whether the focus for those involved in the provision of institutional Web services should be on the user interface developments provided in HTML5 or on use of HTML as a container for reusable (linked) data, which RDFa aims to provide.

Of course for many the requirement will be to enhance the user interface (for human visitors) and provide access to machine-readable data (for machines). The latter can be achieved in various ways but if you choose to go down the RDFa route a question then is: “Can you embed RDFa in HTML5 documents and, if so, how do you do this?“

The answer to this question is not (yet) clear. The W3C have published a “HTML5+RDFa: A mechanism for embedding RDF in HTML” working draft document – but this was released in July 2009 and hasn’t been updated since [Note that while this document on the dev.w3c.org Web site has not been updated, nor links to newer versions provided, as described in a comment to this post a more recent document, HTML+RDFa 1.1: Support for RDFa in HTML4 and HTML5, dated 24 June 2010, is available – this comment added on 2 September 2010].

This document also states that:

Implementors should be aware that this specification is not stable. Implementors who are not taking part in the discussions are likely to find the specification changing out from under them in incompatible ways. Vendors interested in implementing this specification before it eventually reaches the Candidate Recommendation stage should join the aforementioned mailing lists and take part in the discussions.

But such caveats are also true of the RDFa Core 1.1 and XHTML+RDFa 1.1 draft documents, both of which state that:

This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress

In addition the HTML5 working draft states that:

Implementors should be aware that this specification is not stable. Implementors who are not taking part in the discussions are likely to find the specification changing out from under them in incompatible ways. Vendors interested in implementing this specification before it eventually reaches the Candidate Recommendation stage should join the aforementioned mailing lists and take part in the discussions.

Meanwhile the “HTML Microdata” working draft was also published on 10th August 2010, and this again states that:

Implementors should be aware that this specification is not stable. Implementors who are not taking part in the discussions are likely to find the specification changing out from under them in incompatible ways. Vendors interested in implementing this specification before it eventually reaches the Candidate Recommendation stage should join the aforementioned mailing lists and take part in the discussions.

Microdata is being proposed as an extension of microformats which addresses deficiencies in microformats without the added complexities of  RDFa.
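By way of comparison, the same kind of event information expressed in microdata uses the itemscope/itemprop attributes rather than RDFa’s namespaced properties. The sketch below is illustrative, with invented details, and uses the event type that Google’s Rich Snippets documentation supported at the time:

```html
<div itemscope itemtype="http://data-vocabulary.org/Event">
  <span itemprop="summary">Example Workshop</span>:
  <time itemprop="startDate" datetime="2010-11-26">26 November 2010</time>,
  <span itemprop="location">Example University</span>
</div>
```

The absence of namespace prefixes is the main simplification microdata offers over RDFa, at the cost of RDFa’s ability to mix vocabularies freely.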

What Does the Future Hold?

Should you start to migrate HTML documents from an existing HTML 4 or XHTML 1 environment to HTML5?  The advice given by Patrick Lauke in his talk, as reported by @iwmwlive, was “If you want to take advantage of the new features, go ahead with HTML5, but don’t rush off to recode if you don’t need it“.  But while much of the buzz surrounding the new features provided by HTML5 concern user interface developments (such as native support for video and  enhanced forms validation) the future regarding use of HTML as a container for data seems to be somewhat uncertain.

The best advice may be not to rush off to embed data in your HTML resources if you don’t need to. But as such advice can be a barrier to innovation it needs to be qualified by the suggestion that if you do wish to embed data using RDFa, microdata or microformats, you should ensure that you do so using a management system which will enable you to change the format you use if you discover that you have selected an approach which fails to take off. This advice, of course, reflects the warning given in the draft documents – but not everyone reads such advice!

Posted in HTML, standards, W3C | 4 Comments »

Experiments With RDFa

Posted by Brian Kelly on 3 May 2010

The Context

In a recent post I outlined some thoughts on Microformats and RDFa: Adding Richer Structure To Your HTML Pages. I suggested that it might now be timely to evaluate the potential of RDFa, but added a note of caution, pointing out that microformats don’t appear to have lived up to their initial hype.

Such reservations were echoed by Owen Stephens who considered using RDFa (with the Bibo ontology) to enable sharing of ‘references’ between students (and staff) as part of his TELSTAR project and went on to describe the reasons behind this decision. Owen’s decision centred on deployment concerns. In contrast Chris Gutteridge had ideological reservations, as he “hate[s] the mix of visual & data markup. Better to just have blocks of RDF (in N3 for preference) in an element next to the item being talked about, or just in the page“. Like me, Stephen Downes seems to be willing to investigate and asked for “links that would point specifically to an RDFa syntax used to describe events?“. Michael Hausenblas provided links to two useful resources: W3C’s Linked Data Tutorial – on Publishing and consuming linked data with RDFa and a paper on “Building Linked Data For Both Humans and Machines” (PDF format). Pete Johnson also gave some useful comments and provided a link to recently published work on how to use RDFa in HTML 5 resources.

My Experiments

Like Stephen Downes I thought it would be useful to begin by providing richer structure about events. My experiments therefore began by adding RDFa markup to my forthcoming events page.

As the benefits of providing such richer structure for processing by browser extensions currently appear unconvincing, my focus was on providing markup which could be processed by a search engine. The motivation is therefore primarily to provide richer markup for events which will be processed by a widely-used service in order that end users will receive better search results.

My first port of call was a Google post which introduced rich snippets. Google launched their support for Rich Snippets less than a year ago, in May 2009. They are described as “a new presentation of snippets that applies Google’s algorithms to highlight structured data embedded in web pages“.

Documentation on the use of Rich Snippets is provided on Google’s Webmaster Tools Web site. This provides me with information on RDFa (together with microdata and microformats) markup for events. Additional pages provide similar information on markup about people and businesses and organisations.

Although I am aware that Google have been criticised for developing their own vocabulary for their Rich Snippets, I was more interested in carrying out a simple experiment with use of RDFa than continuing the debate on the most appropriate vocabularies.

The forthcoming events page was updated to contain RDFa markup about myself (name, organisation and location of my organisation, including the geo-location of the University of Bath).

For my talks in 2010 I replaced the microformats I had used previously with RDFa markup, providing information on the date of the talks and their location (again with geo-location information).
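For those interested in the details, the markup for a talk follows the pattern shown below (simplified here, and using Google's v: vocabulary; the values are illustrative rather than copied from the live page):

```html
<!-- A simplified sketch of RDFa event markup using Google's
     Rich Snippets vocabulary; the values are illustrative -->
<div xmlns:v="http://rdf.data-vocabulary.org/#" typeof="v:Event">
  <span property="v:summary">Title of the Talk</span>, given at the
  <span rel="v:location" typeof="v:Organization">
    <span property="v:name">University of Bath</span>
  </span> on
  <span property="v:startDate" content="2010-05-10">10 May 2010</span>.
</div>
```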

No changes were noticeable when viewing the page normally. However, using FireFox plugins which display RDFa (and microformat) information, I can see that software is able to identify the more richly structured elements in the HTML page. The screenshot shows how the markup was rendered by the Operator sidebar and the RDFa Highlight bookmarklet and, in the status bar at the bottom of the screen, links to an RDFa validator and the SIOC RDF Browser.

Rendering of RDFa markup using various FireFox tools.

If you compare this image with the display of how microformats are rendered by the Operator plugin it will be noted that the display of microformats shows the title of the event whereas the display of RDFa lists the HTML elements which contain RDFa markup. The greater flexibility provided by RDFa appears to come at the price of a loss of context which is provided by the more constrained uses provided by microformats.

Is It Valid?

Although the RDFa Highlight bookmarklet demonstrated that RDFa markup was available and indicated the elements to which the markup had been applied, there was also a need to modify other aspects of the HTML page. The DTD was changed from XHTML 1.0 Strict to:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML+RDFa 1.0//EN" "http://www.w3.org/MarkUp/DTD/xhtml-rdfa-1.dtd">

In addition the namespaces of the RDFa vocabularies needed to be defined:

<html xmlns="http://www.w3.org/1999/xhtml"
  xmlns:cc="http://creativecommons.org/ns#"
  xmlns:v="http://rdf.data-vocabulary.org/#"
  xml:lang="en">

It was possible for me to do this as I have access to the full HTML page, including the elements defined in the HTML head. I am aware that some CMS applications may not allow such changes to be made and, in addition, organisations may have policies which prohibit such changes.

On subsequently validating the page I discovered, however, HTML validity errors. It seems that the name="foo" attribute is no longer valid and has to be replaced by id="foo".

The changes to the DTD and the html element and the inclusion of the RDFa markup weren’t the only changes I had to make, however. I discovered that the id="foo" attribute requires "foo" to start with an alphabetic character. I therefore had to change id="2010" to id="year-2010". This, for me, was somewhat more worrying as, rather than just including new or slightly modified markup which was backwards-compatible, I was now having to change the URL of an internal anchor. If the anchors had started with an alphabetic character this wouldn’t have been an issue (and I would have been unaware of the problem). However it seems that a migration from a document-centred XHTML 1.0 Strict conforming world to the more data-centric XHTML 1.1+RDFa world may result in links becoming broken. I was prepared to make this change on my pages of forthcoming and recent events and change links within the pages. However if others are linking to these internal anchors (which I think is unlikely) then the links will degrade slightly (they won’t result in the display of a 404 error message; instead the top of the page will be displayed, rather than the entries for the start of the particular year).
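In concrete terms the change was along the following lines (this fragment is a simplified sketch of the markup on my page, not a verbatim copy):

```html
<!-- Before: valid in XHTML 1.0 Strict, where named anchors were allowed -->
<a name="2010"></a><h2>Talks in 2010</h2>

<!-- After: an id value must begin with a letter, so the anchor -
     and any links pointing at #2010 - had to change -->
<a id="year-2010"></a><h2>Talks in 2010</h2>
```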

Google’s View of the RDFa Markup

Using Google’s Rich Snippets Testing Tool it is possible to “enter a web page URL to see how it may appear in search results“. The accompanying image shows the output of this tool for my events page.

Rendering of RDFa markup

This shows the structure of the page which Google knows about. As Google knows the latitude and longitude for the location of the talk it can use this for location based services and it can provide the summary of the event and a description.

Is It Correct?

Following my initial experiment my former colleague Pete Johnston (now of Eduserv) kindly gave me some feedback. He alerted me to W3C’s RDFa Distiller and Parser service – and has himself recently published posts on Document metadata using DC-HTML and using RDFa and on the RDFa 1.1 drafts available from W3C.

Using the Distiller and Parser service to report on my event page (which has now been updated) I found that I had applied a single v:Event element where I should have used three elements for the three events. I had also made a number of other mistakes when I made use of the example fragments provided in the Google Rich Snippets documentation without having a sound understanding of the underlying model and how it should be applied. I hope the page is now not only valid but uses a correct data model for my data.

I should add that I am not alone in having created resources containing Linked Data errors. A paper on “Weaving the Pedantic Web” (PDF format) presented at the Linked Data on the Web 2010 workshop described an analysis of almost 150,000 URIs which revealed a variety of errors related to accessing and dereferencing resources and processing and parsing the data found. The awareness of such problems has led to the establishment of the Pedantic Web Group which “understand[s] that the standards are complex and it’s hard to get things right” but nevertheless “want[s] you to fix your data“. There will be a similar need to avoid polluting RDFa space with incorrect data.

Is It Worthwhile?

The experiences with microformats would seem to indicate that the benefits of using RDFa will be gained if large-scale search engines support its use, rather than providing such information with an expectation that there will be significant usage by client-side extensions.

However the Google Rich Snippets Tips and Tricks Knol page states that “Google does not guarantee that Rich Snippets will show up for search results from a particular site even if structured data is marked up and can be extracted successfully according to the testing tool“.

So, is it worth providing RDFa in your HTML pages? Perhaps if you have a CMS which creates RDFa, or you can export existing event information in an automated way, it would be worth adding the additional semantic markup. But if you are doing this to enhance the findability of resources you need to be aware that Google may choose not to process your markup. And, of course, there is no guarantee that Google will continue to support Rich Snippets. On the other hand other vendors, such as Yahoo!, do seem to have an interest in supporting RDFa – so potentially RDFa could provide a competitive advantage over other search engine providers.

But, as I discovered, it is easy to make mistakes when using RDFa. So it will be essential to have an automated process for the production of pages containing RDFa – and there will be a need to ensure that the data model is correct as well as the page being valid. This will require a new set of skills, as such issues are not really relevant in standard HTML markup.

I wonder if I have convinced Owen Stephens and Chris Gutteridge who expressed their reservations about use of RDFa? And are there any examples of successful use of RDFa which people know about?

“RDFa from Theory to Practice” Workshop Session

Note that if you have an interest in applying the potential of RDFa in practice my colleagues Adrian Stevenson, Mark Dewey and Thom Bunting will be running a 90 minute workshop session on “RDFa from theory to practice” at this year’s IWMW 2010 event to be held at the University of Sheffield on 12-14 July.

Posted in HTML, W3C | Tagged: | 5 Comments »

Microformats and RDFa: Adding Richer Structure To Your HTML Pages

Posted by Brian Kelly on 25 March 2010

Revisiting Microformats

If you visit my presentations page you will see an HTML listing of the various talks I’ve given since I started working at UKOLN in 1996. The image shown below gives a slightly different display from the one you will see, with use of a number of FireFox plugins providing additional ways of viewing and processing this information.

Firefox extensions

This page contains microformat information about the events. It was at UKOLN’s IWMW 2006 event that we made use of microformats on the event Web site for the first time, with microformats being used to mark up the HTML representation for the speakers and workshop facilitators together with the timings for the various sessions. At the event Phil Wilson ran a session on “Exposing yourself on the Web with Microformats!“. There was much interest in the potential of microformats back in 2006, when they were the hot new idea. Since then I have continued to use microformats to provide richer structural information for my events and talks. I’ll now provide a summary of the ways in which the microformats can be used, based on the image shown above.

The Operator sidebar (labelled A in the image) shows the Operator FireFox plugin which “leverages microformats and other semantic data that are already available on many web pages to provide new ways to interact with web services“. The plugin detects various microformats embedded in a Web page and supports various actions – as illustrated, for events the date, time, location and summary of the event can be added to various services such as Google and Yahoo! Calendar.
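The event details which Operator detects are marked up using the hCalendar microformat, along the following (simplified, illustrative) lines:

```html
<!-- An illustrative hCalendar fragment; the event details are made up -->
<div class="vevent">
  <span class="summary">Title of the Talk</span>,
  <span class="location">University of Bath</span>,
  <abbr class="dtstart" title="2010-03-25">25 March 2010</abbr>
</div>
```

The class names (vevent, summary, location, dtstart) are drawn from the hCalendar specification, which is itself based on the iCalendar standard.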

The RDFa in Javascript bookmarklets (labelled B) are simple JavaScript tools which can be added to a variety of different browsers (they have been tested on IE 7, Firefox, Mozilla and Safari). The Licence bookmarklets will create a pop-up alert showing the licence conditions for a page, where this has been provided in a structured form. UKOLN’s Cultural Heritage briefing documents are available under a Creative Commons licence. Looking at, for example, the Introduction to Microformats briefing document, you will see details of the licence conditions displayed for reading. However, in addition, a machine-readable summary of the licence conditions is also available which is processed by the Licence bookmarklet and displayed as a pop-up alert. This information is provided by using the following HTML markup:

<p><a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/2.0/">
<img src="http://creativecommons.org/images/public/somerights20.gif"
   alt="Creative Commons License" /></a>This work is licensed under a
<a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/2.0/">Creative Commons
License</a>.</p>

The power is in the rel="license" attribute which assigns ‘meaning’ to the hypertext link.

The link to my Google Calendar for each of the events (labelled C) is provided by the Google hCalendar Greasemonkey script. Clicking on the Google Calendar icon (which is embedded in the Web page if hCalendar microformatting markup is detected – although I disable this feature if necessary) will allow the details to be added to my Google Calendar without me having to copy and paste the information.

The additional icons in the browser status bar (labelled D) appear to be intended for debugging of RDFa – and I haven’t yet found a use for them.

The floating RSS Panel (labelled E) is another GreaseMonkey script. In this case the panel does not process microformats or RDFa but autodetectable links to RSS feeds. I’m mentioning it in this blog post in order to provide another example of how richer structure in HTML pages can provide benefits to an end user. In this case it provides a floating panel in which RSS content can be displayed.

RDFa – Beyond Microformats

The approaches I’ve described above date back to 2006, when microformats were the hot new idea. But now there is more interest in technologies such as Linked Data and RDF. Those responsible for managing Web sites with an interest in emerging new ways of enhancing HTML pages are likely to have an interest in RDFa: a means of including RDF in HTML resources.

The RDFa Primer is sub-titled “Bridging the Human and Data Webs“. This sums up nicely what RDFa tries to achieve – it enables Web editors to provide HTML resources for viewing by humans whilst simultaneously providing access to structured data for processing by software. Microformats provided an initial attempt at doing this, as I’ve shown above. RDFa is positioned as providing similar functionality, but coexisting with developments in the Linked Data area.

The RDFa Primer provides some examples which illustrate a number of use cases.  My interest is in seeing ways in which RDFa might be used to support Web sites I am involved in building, including this year’s IWMW 2010 Web site.

The first example provided in the primer describes how RDFa can be used to describe how a Creative Commons licence can be applied to a Web page; an approach which I have described previously.

The primer goes on to describe how to provide structured and machine-understandable contact information, this time using the FOAF (Friend of a Friend) vocabulary:

<div typeof="foaf:Person" xmlns:foaf="http://xmlns.com/foaf/0.1/">
   <p property="foaf:name">Alice Birpemswick</p>
   <p>Email: <a rel="foaf:mbox" href="mailto:alice@example.com">alice@example.com</a></p>
   <p>Phone: <a rel="foaf:phone" href="tel:+1-617-555-7332">+1 617.555.7332</a></p>
</div>

In previous years we have marked up contact information for the IWMW event’s programme committee using hCard microformats. We might be in a position now to use RDFa. If we followed the example in the primer we might use RDFa to provide information about the friends of the organisers:

<div xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <ul>
    <li typeof="foaf:Person">
      <a rel="foaf:homepage" href="http://example.com/bob/">Bob</a>
    </li>
    <li typeof="foaf:Person">
      <a rel="foaf:homepage" href="http://example.com/eve/">Eve</a>
    </li>
    <li typeof="foaf:Person">
      <a rel="foaf:homepage" href="http://example.com/manu/">Manu</a>
    </li>
  </ul>
</div>

However this would not be appropriate for an event. What would be useful would be to provide information on the host institutions of the speakers and workshop facilitators. In previous years such information has been provided in HTML, with no formal structure which would allow automated tools to process such institutional information. If RDFa was used to provide such information for the 13 years since the event was first launched, this could allow an automated tool to process the event Web sites and provide various reports on the affiliations of the speakers. We might then have a mechanism for answering the query “Which institution has provided the highest number of (different) speakers or facilitators at IWMW events?“. I can remember that Phil Wilson, Andrew Male and Alison Kerwin (née Wildish) from the University of Bath have spoken at events, but who else? And what about the Universities which I am unfamiliar with? This query could be solved if the data was stored in a backend database, but as the information is publicly available on the Web site, might not using slightly more structured content on the Web site be a better approach?

Really?

When we first started making use of microformats I envisaged that significant numbers of users would be using various tools in the browser to process such information. However I don’t think this is the case (and I would like to hear from anybody who does make regular use of such tools). I have to admit that although I have been providing microformats for my event information, I have not consumed microformats provided by others (and this includes the microformats provided on the events page on the JISC Web site).

This isn’t, however, necessarily an argument that microformats – or RDFa – might not be useful. It may be that the prime use of such information is by server-side tools which harvest such information from a variety of sources. In May 2009, for example, Google announced that Google Search Now Supports Microformats and Adds “Rich Snippets” to Search Results. Yahoo!’s SearchMonkey service also claims to support structured search queries.

But before investing time and energy into using RDFa across an event Web site the Web manager will need answers to the questions:

  • What benefits can this provide?  I’ve given one use case, but I’d be interested in hearing more.
  • What vocabularies do we need to use and how should the data be described? The RDFa Primer provides some examples, but I am unsure as to how to use RDFa to state that, for example, Brian Kelly is based at the University of Bath, to enable structured searches of all speakers from the University of Bath.
  • What tools are available which can process the RDFa which we may choose to create?

Anyone have answers to these questions?

Posted in HTML, W3C | Tagged: , | 11 Comments »

Criteria for Successful and Unsuccessful Web Standards

Posted by Brian Kelly on 18 March 2010

Success and Failure Criteria for IETF Standards

As Erik Duval commented on my recent report on the CETIS Future of Interoperability Standards meeting, “it would be very useful to have more explicit criteria for the success (and, as pointed out in the meeting, for the failure!) of open standards“.

Coincidentally the IETF have recently set up a wiki which aims to summarise successful and unsuccessful IETF standards. The wiki page on the Applications Area is the most interesting for me, although the IETF’s views on applications (MIME, IMAP, HTTP, etc.) differ from mine!

The table has a 5-point scale on usage (take-up):

++ :  became an essential capability
+  :  gained significant usefulness
0  :  outcome still pending
-  :  only gained minor usefulness
-- :  complete failure

>  :  prompted extensive derivative work (optional additional ranking)

MIME, POP3 and HTTP seem to be regarded as successful (ranked ++ or ++>) whereas Atom  is only ranked ‘+’ and AtomPub gets a ‘-‘.

In the Web area what might be regarded as the successful and unsuccessful standards? And how do we judge when a standard is successful or unsuccessful?

Success and Failure Criteria for Web Standards

In a previous post I asked “Why Did SMIL and SVG Fail?” These, then, are two standards developed by the W3C which have failed to live up to their expectations and my blog post suggests reasons for such failures. But what general criteria might be used for identifying successful and unsuccessful Web standards? My attempt to seek an answer to this question is to look at some of the standards themselves and to consider whether they might be regarded as successful or unsuccessful and use this as a means of identifying the appropriate criteria.

HTML is clearly a successful W3C standard. It is widely deployed and has been widely accepted in the market place, with a wide range of creation and viewing tools available both as open source and licensed products. The HTML standard has also evolved over time, with standards published for HTML 1, HTML 2, HTML 3.2, HTML 4 and XHTML 1, and the HTML 5 standard currently being developed. The XHTML 2.0 proposed standard, in contrast, illustrates a failed attempt to provide an alternative development path for HTML which addressed shortcomings in the original series of HTML standards by removing the need to provide backwards compatibility with existing standards and viewers.

Observations: The benefits of simplicity and market acceptance can trump the technical elegance of alternatives which do not have a clear roadmap for significant deployment.

CSS is another W3C standard which can be regarded as successful. Unlike HTML, however, it had a somewhat difficult birth, having to compete with presentational tags which became standardised in HTML 3.2 and with the flawed support in browsers which were at the time widely deployed (e.g. early versions of the Netscape Navigator browser). Despite ongoing support problems (which nowadays relate to versions of the Internet Explorer browser) CSS is widely regarded as the most appropriate way of describing how HTML structural elements should be displayed in a Web browser.

Observations:  Despite an overlong gestation period, standards may eventually become widely accepted.

XML can be regarded as another successful W3C standard. Interestingly, since the XML 1.0 specification was ratified in February 1998 there have been four further editions which have addressed various shortcomings. There have also been two editions of XML 1.1, which provides independence from specific Unicode versions. The W3C Web site states that “You are encouraged to create or generate XML 1.0 documents if you do not need the new features in XML 1.1; XML Parsers are expected to understand both XML 1.0 and XML 1.1.“.

Observations: Successful standards may be stable and not require regular developments to provide new features.

The RSS family of standards is somewhat confusing, with RSS having several meanings: RDF Site Summary and Really Simple Syndication describe RSS 1.0 and RSS 2.0 respectively, which are independently managed forks in the development of the syndication format developed by Netscape and known, at one stage, as Rich Site Summary. The IETF has developed a complementary standard known as Atom, which has attempted to address the confusion caused by the forking of the standard and the uncertainties related to the governance of RSS 1.0 and RSS 2.0. Despite the confusion behind the scenes, RSS is widely accepted as a stable and mature syndication standard, with RSS feeds being provided as standard by many blogging platforms. RSS is also increasingly used by other applications, and development environments such as Yahoo! Pipes provide environments for developers to process RSS feeds.

Observations: Despite confusions over the multiple versions and governance, the simplicity provided by RSS has been valuable in its success.

JavaScript was initially developed by Netscape. (According to Wikipedia the name was chosen as “a marketing ploy by Netscape to give JavaScript the cachet of what was then the hot new web-programming language“.) The benefits of a client-side language resulted in Microsoft developing a JavaScript dialect which they called JScript. Eventually JavaScript became standardised under the name ECMAScript, although this name tends not to be widely used. Although in its early days interoperability problems and the lack of support for JavaScript in assistive technologies resulted in professional Web developers tending to avoid use of JavaScript, the popularity of AJAX (Asynchronous JavaScript and XML) in Web 2.0 applications provided tangible usability benefits to end users. With developments such as ARIA, which enable usability benefits to be made available to users of assistive technologies, we can now regard JavaScript as a successful standard for the development of usable and interactive Web services.
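As a simple illustration of the ARIA approach, a scripted widget can expose its role and state to assistive technologies through additional attributes (a minimal sketch):

```html
<!-- A minimal ARIA sketch: a scripted progress indicator exposing
     its role and current value to assistive technologies -->
<div role="progressbar" aria-valuemin="0" aria-valuemax="100"
     aria-valuenow="60">Upload is 60% complete</div>
```

A screen reader can announce this element as a progress bar at 60%, information which would otherwise be locked inside the script driving the widget.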

Observations: Although competition between software vendors may initially result in interoperability problems, such competition may also help to demonstrate that there is a role in the marketplace for a new standard, with interoperability problems being resolved afterwards.

What Does This Tell Us?

HTML, CSS, XML, RSS and JavaScript are all standards which professional Web developers would probably be expected to have expertise in. But the standards themselves have been developed in very different ways. And despite the importance of these standards it would appear that there aren’t any clearly identifiable criteria which can be used to establish the reasons for their successes. The successes within W3C for HTML, CSS and XML have not been repeated for other W3C standards such as SMIL and SVG. So I have to admit defeat in my attempt to clearly identify success criteria for Web standards based on a small number of examples – and I haven’t attempted to address the perhaps more contentious issue of the criteria for failed standards. Can anyone help – or are we condemned to wait for the marketplace to eventually let us know what next year’s failures will be?

Posted in standards, W3C | 3 Comments »

Will The SVG Standard Come Back to Life?

Posted by Brian Kelly on 11 January 2010

In November 2008 I asked “Why Did SMIL and SVG Fail?” The post suggested reasons why the W3C’s Scalable Vector Graphics standard (which became a W3C recommendation in 2003) had failed to be widely deployed in the market place.

In the comments to my post a number of people pointed at the lack of support for SVG in Microsoft’s Internet Explorer as a significant factor in SVG’s failure to be adopted.

Despite the economic gloom the new year has seen some good news with the announcement by Patrick Dengler, Senior Program Manager of the Internet Explorer Team that “Microsoft Joins W3C SVG Working Group“.  And as described in an article on “Microsoft joins IE SVG standards party” published in The Register: “Commentors responding to Dengler’s post overwhelmingly welcomed Microsoft’s move, with people hoping it’ll lead to SVG support in IE 9“.

So what are the lessons regarding a standard released in 2003 for which it took 7 years before a company which appears to be essential for its successful deployment showed interest? And even if IE 9 does have support for the standard, how long will it be before the user community discards legacy browsers such as IE 6, 7 and 8? Let’s not forget that there is still significant usage of IE 6.

The lesson: we tend to be over-optimistic about the benefits of open standards and their take-up.

The response: we need to take a risk assessment and risk management approach to standards.

Posted in standards, W3C | 5 Comments »