UK Web Focus

Innovation and best practices for the Web

Link Checking For Old Web Sites

Posted by Brian Kelly on 4 January 2011

Web sites rot. Over time they start to break: links to external resources fail in increasing numbers, and the functionality provided within the Web site itself may stop working. This is a problem if a Web site is still being used but is no longer maintained. But what should be done?

From 1999 to 2000 UKOLN was a member of the EU-funded EXPLOIT project and provided the Exploit Interactive Web magazine. This was followed, from 2000 to 2003, by the Cultivate Interactive Web magazine. Since the funding ceased a link check of the Web sites has been carried out annually, with the findings published and summaries of any problems documented. Only internal links are checked, and the surveys helped us to identify and fix a number of problems which occurred when the Web site was migrated from a Windows NT server to an Apache server running on a Unix box. We have also observed a small number of broken links to third-party Web site usage services, as illustrated below.

Running the annual link check and documenting the findings takes about ten minutes. The Exploit Interactive and Cultivate Interactive Web sites are technically quite simple, with little integration with third-party services. However, as Web sites increasingly make use of content and services provided by third parties, there is a danger that such dependencies will cause problems. So perhaps auditing of such services, including project Web sites which are no longer funded, will become increasingly important.
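As a rough illustration of the kind of internal-only check described above (a hypothetical sketch, not the actual tool used for these surveys), the following Python collects the anchor targets from a page and keeps only those on the same host, ready to be fetched and verified:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

class LinkCollector(HTMLParser):
    """Gather the href target of every <a> tag on a page."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def internal_links(html, base_url):
    """Resolve each link against base_url and keep same-host URLs only."""
    collector = LinkCollector()
    collector.feed(html)
    host = urlparse(base_url).netloc
    absolute = (urljoin(base_url, href) for href in collector.links)
    return [url for url in absolute if urlparse(url).netloc == host]
```

Each URL returned could then be fetched (e.g. with urllib.request) and any error response logged; links to external resources are deliberately ignored, matching the internal-links-only policy described above.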

Alternatively you could argue that after a period of time such Web sites should be deleted. We recommended to the EU that project Web sites should be expected to continue to be hosted for at least three years after the funding had expired. We also suggested that this should be a minimum and that organisations should try to continue to host such Web sites for ten years after the funding has finished. Since the final issue of the Exploit Interactive ejournal was published in October 2000 we have achieved that goal. Should we now delete the Web site? Doing so might save ten minutes a year in checking that the Web site is still functioning, but would mean that articles on a number of EU-funded projects would be lost, including the following which were published in the final issue:

  • ELVIL 2000: Ingrid Cartwell and Magnus Enzell introduce the prototype for the ELVIL 2000 Project, an Academic Portal for European Law and Politics.
  • EQUINOX: Following on from an earlier article in Exploit Interactive, Monica Brinkley provides an update on the EQUINOX project, a Library Performance Measurement and Quality Management System.
  • ILSES: Meinhard Moschner and Repke de Vries describe the development of a specialised networked digital library which integrates publication retrieval and survey data extraction.
  • LIBECON 2000: David Fuegi, John Sumsion and Phillip Ramsdale discuss the LIBECON2000 Project and its Millennium Report.
  • TECUP: Paul Greenwood and Martina Lange-Rein on TECUP, a meta project which analyses practical mechanisms for rights acquisition for the distribution, archiving and use of electronic products.
  • VERITY: Alexandra Papazoglou gives a final report on Project Verity: Virtual and Electronic Resources for Information skills Training for Young people.

I can’t help but feel that the Web site should continue to be hosted. But what should the general policy be for project Web sites? What are others doing for project Web sites whose funding may have ceased ten years ago or five years ago or even more recently?

Note: Coincidentally, after publishing this post I received an email containing details of the uptime for the Exploit Interactive and Cultivate Interactive Web sites. I receive an automated email if the Web sites are not available and also receive weekly reports on server availability, as illustrated below. Another approach to consider for legacy Web sites?
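An alert of this kind is straightforward to script. The sketch below (a minimal example, with placeholder addresses, assuming a local SMTP server is available) polls a site and mails a notice if it is unreachable or returns an error status:

```python
import smtplib
import urllib.request
from email.message import EmailMessage

def site_status(url, timeout=10):
    """Return the HTTP status code, or None if the site is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except OSError:
        return None

def needs_alert(status):
    """Alert on network failure or any error-level status code."""
    return status is None or status >= 400

def send_alert(url, status):
    """E-mail a short failure notice (addresses are placeholders)."""
    message = EmailMessage()
    message["Subject"] = "Site check failed: %s (%s)" % (url, status)
    message["From"] = "uptime-check@example.org"
    message["To"] = "webmaster@example.org"
    message.set_content("%s returned status %s." % (url, status))
    with smtplib.SMTP("localhost") as smtp:
        smtp.send_message(message)
```

Run daily from cron against each legacy site, this reproduces the alert-on-failure behaviour; the weekly availability summaries would need a small log kept on top.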


13 Responses to “Link Checking For Old Web Sites”

  1. Re: Should we now delete the Web site? Doing so might save ten minutes a year in checking that the Web site is still functioning, but would mean that articles on a number of EU-funded projects would be lost

    They would still be available at http://web.archive.org/web/*/http://www.exploit-lib.org/ so not lost entirely.

    One option might be to replace the whole website with a single link to the Wayback Machine?

    On the other hand, I tend to agree with you that continuing to host the content is pretty low cost and so might as well be done.
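If the site were ever replaced by a pointer to the Wayback Machine, the Internet Archive's public availability API (a JSON endpoint at archive.org/wayback/available) can confirm that a snapshot actually exists before the redirect is put in place. A small sketch, with the JSON parsing separated out so the logic can be checked without a network call:

```python
import json
import urllib.request
from urllib.parse import quote

API = "http://archive.org/wayback/available?url="

def closest_snapshot(payload):
    """Pull the closest snapshot URL out of an availability API response."""
    closest = json.loads(payload).get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest["url"]
    return None

def wayback_url(site):
    """Query the availability API for a site (performs a network call)."""
    with urllib.request.urlopen(API + quote(site, safe="")) as response:
        return closest_snapshot(response.read().decode("utf-8"))
```

A redirect page for the retired site could then point at whatever `wayback_url("http://www.exploit-lib.org/")` returns, falling back to the site's own archive listing if no snapshot is reported.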

    • Thanks for the suggestion. A limitation of relying on an archived copy on the Wayback Machine is that you can no longer update the content. I appreciate that this could be a benefit, but that is a policy decision. It may be felt desirable to be able to remove interactive elements (e.g. content or services provided by third parties) if there are problems with such services.

  2. Les Carr said

    You might be interested in the ESRC ReStore project, whose aim is to take the web sites of funded projects and to preserve them for ongoing use (post project funding period) by working with the project team during the lifetime of the project. It’s just starting its second phase of funding, and one of the key aims is to make people realise that their project lifespan simply represents the gestation of their website and to plan accordingly (re technology and rights). It’s a kind of preservation project that is driven up front by the project funders and project workers.

  3. Hi Brian,

    bringing together two conversations here ….

    I’ve added the list of sites we tweeted at:
    http://www.dpconline.org/newsroom/latest-news/668-web-archiving-and-the-ghosts-of-christmas-past- It was intended as a bit of fun while the office was closed, and it will be fun (but not scientific) to see if we can still use them next year. Each one is structurally slightly different – a video, a self-extracting zip file and so forth – so in addition to reminiscing over Christmases past we can track how different technologies degrade. That will make an interesting blog post / Ariadne article twelve months from now….

    I’m not aware of any specific guidance on whether and/or how to wind down project websites in the way you mention, but there’s certainly room for a conversation and an advice note on this. In fact that was partly behind the advice note which I wrote for MLA and which Marieke published at UKOLN. The theme of that was the rather more alarming topic of what to do when your ISP goes bust, but a managed response to long-term degradation and one to a short-term crisis have some elements in common.

    Is this something we should take up at the Web Archiving and Preservation Task Force? Would be good to get a small group of wise heads together.

    Four things to consider …

    1. You can, of course, already nominate your site for the UK Web Archive and other similar services and I’m amazed that more people don’t do this already. ‘Regulation based harvesting’ that comes from changes in the Legal Deposit regime will change elements of your question though others are better placed to say how that will develop. I doubt it will eliminate the problem.

    2. Trends in web archiving these days seem to be about providing much better signposting to holdings. There’s room for interesting work on 404 errors: if web archives could intercept 404 errors and redirect traffic to archived holdings, that would benefit all parties. (Memento and Web Continuity, for example …)

    3. It’s a slightly different topic, but isn’t there also an issue for sites that are updated rather than just frozen? I like the idea of being able to look at previous instances of a page (such as in a wiki) but I find the tools difficult to use. Again, I like the way Memento overcomes this weakness.

    4. At a more conceptual level there’s room for institutions and projects to take a long term view early. That’s not to say that they need to keep everything, it means they need to decide early on the things that they think are valuable and for how long. If we can identify these in a planning stage then we are more likely to spend our time wisely. Imagine if a site could simply mothball (or delete) itself page by page as various deadlines expired. That would be a lot more elegant than simply failing to function because a widget broke.

    Good questions….

  4. I started downloading (and then re-posting via Google Docs) PDF versions of a defunct online literacy journal this summer, worrying about its future. Alas, I was too casual, and the domain expired before I could finish. *Poof*

    The journal’s host and staff were long gone – laid off when the funding ended – so there’s no one to talk to about this. The files themselves were stored on a university server as part of a limited partnership, but the university has no on-going relationship or reason to re-post them. I have a hard copy of each of the 10 issues. But I can’t hotlink to printed paper….

    This is recent history: the journal published twice yearly for five years, starting in 2003. But it wasn’t recent enough, maybe, to take advantage of all the “cloud” tools. If it started again, I would immediately upload copies to a couple of free storage spaces (Windows SkyDrive and Google Docs, probably) as back-ups. I’d also be more likely to save copies on an external HD here at home.

    It’s as if we need internet libraries to save and store and curate this stuff.

    W.

    • “It’s as if we need internet libraries to save and store and curate this stuff”: Very true, but, sadly, in the UK funding for libraries is being cut. I feel that there is a need for Web site owners, funders and other stakeholders to have a greater understanding of the importance of Web site preservation and to take responsibility for the development and implementation of preservation strategies.

  5. Maureen said

    Another option could be to assess the website against your preservation priorities – i.e. what is it that you’re trying to preserve? Is it mainly the magazines/reports? In which case, do you need to continue maintaining the site, or are there alternative options, such as storing the files in a repository operated and maintained by the funding body? This would return control over the end products to the original funder and would minimise the number of different locations that have to be maintained/checked. I appreciate that’s the simple view and there would be other issues to work through, but it’s worth considering the options.

    • That’s a useful comment. I was thinking about the preservation of project deliverables, which in this case is a Web journal. However, as you suggest, there may be a need to consider the different types of deliverables projects produce and implement preservation plans based on the specific requirements of different types of deliverables.

  6. [...] Link Checking For Old Web Sites – Brian Kelly [...]

  7. [...] that the domain is registered for an appropriate period of time. As described in a post entitled Link Checking For Old Web Sites it may be useful to set up an automated alert so that you receive notification if the Web site [...]
