UK Web Focus (Brian Kelly)

Innovation and best practices for the Web

Approaches To Archiving Professional Blogs Hosted In The Cloud

Posted by Brian Kelly on 17 Sep 2010

I was recently thinking about the “must read” blogs which are always the first I read in my blog reader.  These include:

OUseful: Tony Hirst’s blog “in part about… things that I think may be useful in an higher education context, one day…“.

eFoundations: A blog about “Web 2.0, the Semantic Web, open access, digital libraries, metadata, learning, research, government, online identity, access management, virtual worlds and anything else that takes our fancy by Pete Johnston and Andy Powell“.

The Ed Techie:  Martin Weller’s blog on “Educational Technology, web 2.0, VLEs, open content, e-learning, plus some personal stuff thrown in“.

Learning with ‘e’s: Steve Wheeler’s “thoughts about learning technology and all things digital“.

Ramblings of a Remote Worker: My colleague Marieke Guy’s reflections on working from  home and the broader issues of remote working.

What do these blogs have in common? From a personal perspective they are all written by people I like and respect – and have been out for a drink with. But in addition, the blogs are all hosted outside the blog authors’ institutions.

Isn’t it risky that such valuable professional blogs are hosted outside the institution? Shouldn’t we be learning the lessons of the imminent demise of the Vox blogging platform and look to migrate such blogs to a trusted institutional environment? After all, although the early adopters of blogs may have had to use an externally-provided platform, we are now finding that institutions are hosting their own blog platforms, in many cases using the open source WordPress application.

I don’t think such blogs should move to the host institution. I feel that use of such externally hosted platforms can provide flexibility and autonomy which may be lost if an institutional platform were used. And, as described in a post on “Auricle: The Case Of The Disappearing E-learning Blog”, there is no guarantee that a blog hosted within the institution will necessarily be sustainable.

But if third party blogging platforms are used to support professional activities there will be a need to assess and manage possible risks of loss of the service. In the case of well-established services such as WordPress, Typepad and Blogger it is unlikely that such services will disappear overnight. If, as is the case with Vox, the service is not sustainable we could reasonably expect to be provided with notification on withdrawal of the service.

But perhaps a bigger risk relates to the responsibilities associated with ownership of the blog by individual authors, as opposed to the departmental responsibility which would apply in an institutional blog environment. What, for example, could happen to the contents of a blog if the author left his or her host institution?

In some cases it might be argued that the blog contents are owned by the individual and the host institution would have no claim on the content. But this won’t be true in many cases including, for example, blogs used to support JISC-funded projects. And at a time when public sector spending is becoming subject to public scrutiny, how would we explain to tax-payers that a University employee can own valuable content and is free to delete it if, for example, they were made redundant?

My colleague Marieke Guy and I have written a paper on “Approaches To Archiving Professional Blogs Hosted In The Cloud” which has been accepted by the iPres 2010 conference, which takes place in Vienna next week.

The paper is based on UKOLN’s digital preservation work including the JISC PoWR project. Recently we have explored ways of preserving blog content, ranging from migration of rich XML content, processing a blog’s RSS feed and mirroring a blog’s Web site to creating a PDF version of a blog and even creating a paper copy of a blog! In addition to the technical approaches, the paper also addresses the associated policy issues. On this blog and Marieke’s blog we have provided a policy statement which states that:

  • A copy of the contents of the blog will be made available to UKOLN (my host organisation) if I leave UKOLN. Note that this may not include the full content if there are complications concerning third party content (e.g. guest blog posts, embedded objects, etc.), technical difficulties in exporting data, etc.
  • Since the blog reflects personal views I reserve the right to continue providing the blog if I leave UKOLN. If this happens I will remove any UKOLN branding from the blog.

We have applied the guidelines we have developed to a number of other UKOLN blogs which are hosted externally including the IWMW 2009 event blog and the JISC SUETr project blog.
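To illustrate one of the technical approaches mentioned above, processing a blog’s RSS feed, here is a minimal sketch in Python. It is not the tooling described in the paper: the function names, the sample feed and the JSON output format are all assumptions made for illustration, and it assumes a plain RSS 2.0 feed whose posts appear as item elements.

```python
import json
import xml.etree.ElementTree as ET


def extract_posts(rss_xml):
    """Return one dict per <item> found in an RSS 2.0 document."""
    root = ET.fromstring(rss_xml)
    posts = []
    for item in root.iter("item"):
        posts.append({
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "published": item.findtext("pubDate", default=""),
            "content": item.findtext("description", default=""),
        })
    return posts


def archive_feed(rss_xml, path):
    """Save the extracted posts as JSON; return the number of posts saved."""
    posts = extract_posts(rss_xml)
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(posts, fh, ensure_ascii=False, indent=2)
    return len(posts)


# Hypothetical feed used only to show the shape of the data.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0"><channel>
  <title>Example blog</title>
  <item>
    <title>First post</title>
    <link>http://example.org/first</link>
    <pubDate>Fri, 17 Sep 2010 09:00:00 GMT</pubDate>
    <description>Hello world</description>
  </item>
</channel></rss>"""

if __name__ == "__main__":
    for post in extract_posts(SAMPLE_FEED):
        print(post["published"], post["title"], post["link"])
```

In practice the feed text would be downloaded from the blog rather than held in a string. Feeds often carry only the most recent posts, and sometimes only summaries, so a snapshot like this would probably need to be taken regularly and would not capture comments or embedded objects.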

Marieke will be presenting this paper at the iPres 2010 conference next week. Her slides are available on Slideshare and are embedded below.

We’d welcome comments on the approaches we have developed. Do they satisfy the concerns of the institution related to possible loss of valuable content whilst providing professional bloggers with the flexibility they may feel they need?

7 Responses to “Approaches To Archiving Professional Blogs Hosted In The Cloud”

  1. We have a plugin for EPrints for progressively capturing a hashtag from Twitter and archiving it as part of the institutional archive. I know Twapper Keeper does this, but the point is that it goes into the core collection.

    The next logical step for this is being able to capture blogs too, and then perhaps a way-back-machine feature to capture pages which change over time and represent something important to the host org.

    We run our own blogs server, but many staff use external ones. They would want to keep using the same blog even if changing university. However, it would make sense to capture feeds which represent our academic output. Right now we don’t even index it…

  2. Tony Hirst said

    I started paying the extras fee to allow me to use my own domain name to point to, the blog, so that I could, if required, physically move the blog elsewhere yet to all intents and purposes retain the url. Syndication is handled by feedburner, but again, I recently tweaked the settings for this so that I could publish an address that is under my control: (though I guess is also valid)

    So whilst I use to host, and feedburner feeding from to syndicate, the namespaces I publish/link to to and now try to get people to subscribe to are under my control: and (or ).


  3. Tony Hirst said

    Ah, PS I blogged about the domain name change here:

  4. Chris Rusbridge said

    I was surprised you didn’t mention the ULCC ArchivePress project, specifically designed for archiving blogs…

  5. Hi Chris
    We initially discussed submitting a joint paper with the ULCC team, but they didn’t have the effort available to contribute. An early version of the paper did mention the ArchivePress work and we had hoped to include details of use of this service for archiving UKOLN blogs. However ArchivePress has not yet processed our blogs and, as described in a post on “Latest progress and plans” on the ArchivePress project blog, “it’s been another quiet time on the ArchivePress project“. Since the paper was summarising the various tools we had used we felt it inappropriate to mention the ArchivePress work in this paper.

