Archiving Blogs and Machine Readable Licence Conditions
Posted by Brian Kelly (UK Web Focus) on 21 April 2011
Clarifying Licence Conditions When Archiving Blogs
As part of the closure process for our blog we have provided a Status of the Blog page which summarises the reasons for the closure, provides a history of the blog, outlines various statistics about the blog and provides some reflections on the effectiveness of the blog.
Another important aspect of the closure of a blog should be the clarification of the rights of the blog posts. This could be important if the blog contents were to be reused by others – which could, for example, include archiving by other agencies.
As shown, a human-readable summary was included in the sidebar of the blog, which states that the content of the blog is provided under a Creative Commons Attribution-Noncommercial-Share Alike 2.0 UK: England & Wales License.
The sidebar also defined the scope of this licence which covered the textual content of blog posts and comments which were submitted to the blog. It was pointed out that other embedded objects, such as images, video clips, slideshows, etc, may have other licence conditions.
However, automated tools will not be able to understand licence conditions that are stated only in human-readable text. What is needed is a definition of the licence in a format suitable for automated reading. This has been implemented using simple RDFa markup which is included in the sidebar description. The HTML fragment used is shown below:
<img alt="Creative Commons License" src="http://i.creativecommons.org/l/by-nc-sa/2.0/uk/88x31.png" /> This blog is licensed under a <a href="http://creativecommons.org/licenses/by-nc-sa/2.0/uk/" rel="license">Creative Commons Attribution-Noncommercial-Share Alike 2.0 UK: England & Wales License</a>.
How might software process such information? One example is the OpenAttribute plugin which is available for the Firefox, Chrome and Opera browsers. This is described as a “suite of tools that makes it ridiculously simple for anyone to copy and paste the correct attribution for any CC licensed work”. Use of the OpenAttribute plugin on the Cultural Heritage blog is illustrated below.
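Independently of any browser plugin, the basic step such software performs can be sketched in a few lines. The following is a minimal illustration in Python, using only the standard library, of how a harvester might find links marked with rel="license" in a page. It is not the OpenAttribute implementation; the class and function names are invented for this example, and a real RDFa processor would do considerably more.

```python
from html.parser import HTMLParser


class LicenseLinkParser(HTMLParser):
    """Collect the href of every <a> or <link> whose rel includes "license"."""

    def __init__(self):
        super().__init__()
        self.licenses = []

    def handle_starttag(self, tag, attrs):
        if tag not in ("a", "link"):
            return
        attributes = dict(attrs)
        # rel holds a space-separated list of tokens; compare case-insensitively.
        rel_tokens = (attributes.get("rel") or "").lower().split()
        href = attributes.get("href")
        if "license" in rel_tokens and href:
            self.licenses.append(href)


def find_licenses(html):
    """Return the licence URLs declared in an HTML string (hypothetical helper)."""
    parser = LicenseLinkParser()
    parser.feed(html)
    parser.close()
    return parser.licenses


# The sidebar fragment quoted in this post.
SIDEBAR = (
    '<img alt="Creative Commons License" '
    'src="http://i.creativecommons.org/l/by-nc-sa/2.0/uk/88x31.png" /> '
    'This blog is licensed under a '
    '<a href="http://creativecommons.org/licenses/by-nc-sa/2.0/uk/" rel="license">'
    'Creative Commons Attribution-Noncommercial-Share Alike 2.0 UK: '
    'England & Wales License</a>.'
)

print(find_licenses(SIDEBAR))
```

A page with no such link yields an empty list, which is the case an archiving service would need to treat as "no explicit licence found".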
Assigning Multiple Licences To Embedded Objects in Blogs
The image above shows the licence for the blog in its entirety. However the blog is a complex container of a variety of objects (blog posts from multiple authors; comments from readers and embedded images and other objects from multiple sources) and each of these embedded objects may have its own set of licence conditions.
How might one specify the licence conditions of such embedded objects? In the case of the Cultural Heritage blog there was a statement that any comments added to the blog would be published under a Creative Commons licence. Although anybody making a comment did not have to formally accept this licence condition, in practice we can demonstrate that we took reasonable measures to ensure that the licence conditions were made clear.
In order to specify the licence conditions for embedded images we initially looked at the Image Licenser WordPress plugin. However this provides a mechanism for assigning licence conditions as images are embedded within a post, which are then made available as RDFa. Since in our case we were looking at retrospectively assigning licence conditions to existing images (in total 151 items) it was not realistic to use this tool.
The Creative Commons Media Tagger provides the ability to “tag media in the media library as having a Creative Commons (CC) license”. But what licence should be assigned to images on the blog? These include screen images and photographs which may have been included by guest bloggers but which have not been explicitly assigned a Creative Commons licence. The question “Who owns the copyright to a screen grab of a website?” was asked recently on ecademy.com, with a lack of consensus and with a patent and trade mark attorney providing the less than helpful suggestion that “It is better to include a link to the original work if it is on the Web rather than to copy it.” The uncertainties regarding ownership of screen shots are echoed in a Wikipedia article which states:
Some companies believe the use of screenshots is an infringement of copyright on their program, as it is a derivative work of the widgets and other art created for the software. Regardless of copyright, screenshots may still be legally used under the principle of fair use in the U.S. or fair dealing and similar laws in other countries.
In light of such confusion there is a question as to what licence, if any, should be assigned to images in the blog. As described in the Creative Commons Media Tagger FAQ it is possible to run the plugin in batch mode to “tag media that was already in your media library prior to installing and activating CC-Tagger”. It occurred to me that it would be best to assign a non-CC licence by default to all images and then to manually assign an appropriate CC licence to images such as those taken from Flickr Commons in a post entitled “Around the World in 80 Gigabytes”. However, using the batch mode of the tool appeared not to change the content, and it is unclear to me whether there is a way of providing a machine-readable statement in RDFa stating that a resource is not available with a Creative Commons licence.
Using the Image Licenser tool on an individual image resulted in the following HTML fragment which illustrates how a machine readable statement of the licence conditions can be applied to an individual object:
<img class="size-medium wp-image-2206" title="Flickr Commons" src="http://blogs.ukoln.ac.uk/cultural-heritage/files/2011/02/flickr-commons-300x205.jpg" alt="image of flickr commons home page" width="300" height="205" />
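To illustrate how a licence statement might be scoped to an individual object rather than to the whole page, here is a minimal Python sketch. It assumes, purely for illustration, that a licence link carries an RDFa `about` attribute naming the image it applies to; this is not necessarily the markup which the Image Licenser plugin emits, and the example.org URLs, the sample markup and the names `ScopedLicenseParser` and `scoped_licenses` are all hypothetical.

```python
from html.parser import HTMLParser


class ScopedLicenseParser(HTMLParser):
    """Map each licensed resource to its licence URL.

    Simplification: only an `about` attribute on the licence link itself is
    honoured. A full RDFa processor would also inherit the subject from
    ancestor elements.
    """

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.licenses = {}

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        rel_tokens = (attributes.get("rel") or "").lower().split()
        href = attributes.get("href")
        if tag != "a" or "license" not in rel_tokens or not href:
            return
        # Without `about`, the statement is taken to describe the page itself.
        subject = attributes.get("about") or self.page_url
        self.licenses[subject] = href


def scoped_licenses(html, page_url):
    parser = ScopedLicenseParser(page_url)
    parser.feed(html)
    parser.close()
    return parser.licenses


# Hypothetical post: one licence for the text, another for an embedded image.
PAGE_URL = "http://example.org/blog/a-post/"
IMAGE_URL = "http://example.org/blog/files/photo.jpg"
POST_HTML = (
    '<p>Text licensed under '
    '<a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/2.0/uk/">'
    'CC BY-NC-SA</a>.</p>'
    '<img src="' + IMAGE_URL + '" alt="photo" /> '
    '<a about="' + IMAGE_URL + '" rel="license" '
    'href="http://creativecommons.org/licenses/by/2.0/uk/">CC BY</a>'
)

for subject, licence in scoped_licenses(POST_HTML, PAGE_URL).items():
    print(subject, "->", licence)
```

The sketch only records positive statements. It has nothing to say about an image for which no licence link is present, which is exactly the gap discussed in this post.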
Whilst finalising this post I asked on Twitter “Is it possible to use RDFa to provide a machine-readable statement that an image *doesn’t* have a CC licence? …” and followed this by describing the context: “.. i.e. have a blog post with CC licence for content but want to clarify lience for embedded objects. #creativecommons”. Subsequent comments from @patlockley and @jottevanger helped to identify areas for further work which I hadn’t considered. I have kept an archive of the discussion to ensure that I don’t forget the points which were made. A summary of my thoughts is given below:
Purpose: Why should one be interested in ways of specifying the licence conditions of objects embedded in blog posts? My interest relates to archiving policies and processes for blogs. For example, if an archiving service chooses to archive only blogs for which an explicit licence is available, there will be a need to ensure that such licences are provided in a machine-readable format to allow for automated harvesting. There will also be a need to understand the scope of such licences. In addition to my interests, those involved in the provision or reuse of OER resources will have similar interests in reusing blog posts if these are treated as OER resources. Finally, as @jottevanger pointed out, this discussion is also relevant more widely, with Jeremy’s interests focussing on complex Web resources containing digitised museum objects.
Granularity: At what level of granularity is it feasible to apply machine-readable licence conditions to complex objects? Should this be at the collection level (the blog), the item level (the blog post) or for each component of the object (each individual embedded image)?
Risks: Should one take a risk-averse approach, avoiding use of a Creative Commons licence at the collection level since it may be difficult to ensure that each individual item has an appropriate Creative Commons licence? Or should one state that by default items in the collection are normally available under a Creative Commons licence, but there may be exceptions?
Viewing tools: What tools are available for processing machine understandable licence conditions? What are the requirements for such tools?
Creation tools: What tools are available for assigning machine understandable licence conditions? What level of granularity should they provide? What default values can be applied?
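The granularity and risk questions above can be made concrete with a small Python sketch of the "default licence with exceptions" approach: look for a statement about the component first, then the item, then the collection. This is only an in-memory model under stated assumptions. The function name, the example.org URLs and the use of `None` to mean "explicitly not openly licensed" are inventions for this example, and no claim is made that RDFa offers a standard way of expressing that last case.

```python
from typing import Dict, Optional

CC_BY_NC_SA_UK = "http://creativecommons.org/licenses/by-nc-sa/2.0/uk/"
CC_BY_UK = "http://creativecommons.org/licenses/by/2.0/uk/"


def resolve_licence(
    image_url: str,
    post_url: str,
    blog_url: str,
    statements: Dict[str, Optional[str]],
) -> Optional[str]:
    """Return the most specific licence recorded for an embedded image.

    `statements` maps a resource URL to a licence URL, or to None when the
    resource has been recorded as NOT available under an open licence.
    The first level with any statement wins: component, then item, then collection.
    """
    for subject in (image_url, post_url, blog_url):
        if subject in statements:
            return statements[subject]
    return None  # nothing stated at any level


# Hypothetical data: a collection-level default plus two image-level exceptions.
BLOG = "http://example.org/blog/"
POST = "http://example.org/blog/a-post/"
statements = {
    BLOG: CC_BY_NC_SA_UK,
    BLOG + "files/screenshot.png": None,    # excluded from the default
    BLOG + "files/photo.jpg": CC_BY_UK,     # has its own licence
}

for name in ("screenshot.png", "photo.jpg", "diagram.png"):
    print(name, "->", resolve_licence(BLOG + "files/" + name, POST, BLOG, statements))
```

Under this model an archiving service that accepts only explicitly licensed material would skip any object for which the function returns `None`, while still accepting unlisted images under the collection default. Whether that default is safe is the risk question raised above.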
I know that in the OER community there is interest in these issues. I would be interested to hear how such issues are being addressed and details of tools which may already exist, especially tools which can be used with blogs.