UK Web Focus (Brian Kelly)

Innovation and best practices for the Web

When Web Sites Go Down

Posted by Brian Kelly on 4 Jan 2008

A colleague of mine has just alerted me to the fact that the University of Southampton Web site is down for scheduled maintenance from 2nd to 4th January 2008. She noticed this because she regularly visits the Web site to access the wide range of resources it provides on institutional repositories (note added on 4 Jan 2008 – the Web site is now available, ahead of schedule!).

University of Southampton Web site downtime announcement

That’s no big deal, you may think: servers do need maintenance, and the first few days after the Christmas break are probably the best time, with students still away and many researchers likely to take a few additional days’ holiday.

I’d be in broad agreement with such sentiments (I used to work in IT Services, after all, and I’m aware of the complexities of managing IT systems). But have our expectations changed, I wonder? And rather than taking time off at this time of year, what if users have imminent deadlines for papers and need to access such services? And who are the users of the University of Southampton Web site? No longer just staff and students at Southampton, I would argue; rather, at prestigious institutions such as the University of Southampton there is likely to be a significant national (and indeed international) user community.

But how should we establish what reasonable practices may be in addressing user expectations of 24×7 service availability when we lack the business models to fund such requirements? Perhaps the debate can be helped by initially monitoring best practices within the community and making comparisons with other communities.

In this respect the Netcraft service can be useful, as it provides automated analyses on public Web services, including profiles on Web server software usage and server uptime data.

As can be seen from the graph, the main Web server at Southampton University has had an average uptime (based on a 90-day moving average) of 405 days. This compares very favourably with the data for Sun, for which the equivalent figure is 34 days.

Netcraft server uptime graph for Southampton University

I suspect the University of Southampton will have a high rating within the UK HE sector for its server uptime. But, of course, that will probably not be appreciated by the user who tries to access the site on day 406 to gather data for a paper which needs to be submitted by day 407!

Is it possible (or, rather, realistic) to improve the server availability for institutional services? Should we be replicating our servers (or our data)? Should we outsource the management of our services to an international company such as Amazon, which (with its S3 data hosting service) may be better positioned to provide 24x7x365 availability?

But before responding to such questions I feel that institutions may need to ask themselves to whom they should be accountable. If institutional Web sites are now providing significant services to a global audience, how can we ensure that that global community is being provided with acceptable levels of service? After all, we ask these questions of externally-hosted Web services. But don’t we all act as externally hosted Web services to others outside our institution?

Wouldn’t it be interesting to have server uptime data across all our institutions? If the data for our sector compare favourably with the commercial sector, then we will have something to be pleased with. And if the comparison is unfavourable, then this should help to inform our planning – and provide objective data to inform discussions on the relevance to our sector of services such as Amazon S3.
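To make such comparisons concrete, it may help to convert between downtime and an availability percentage. The following is only a minimal sketch: the function names are hypothetical, the 99.9% target is purely illustrative, and the example assumes (pessimistically) that a full three-day maintenance window counts as downtime within a single year.

```python
from datetime import timedelta


def availability_percent(downtime: timedelta, period: timedelta) -> float:
    """Return the percentage of `period` during which a service was up."""
    if period <= timedelta(0):
        raise ValueError("period must be positive")
    if downtime < timedelta(0) or downtime > period:
        raise ValueError("downtime must lie between zero and the period")
    # Dividing one timedelta by another yields a plain float ratio.
    return 100.0 * (1 - downtime / period)


def allowed_downtime(target_percent: float, period: timedelta) -> timedelta:
    """Return the downtime budget implied by an availability target."""
    if not 0.0 <= target_percent <= 100.0:
        raise ValueError("target_percent must be between 0 and 100")
    return period * (1 - target_percent / 100.0)


if __name__ == "__main__":
    year = timedelta(days=365)

    # Assumption: the whole three-day window is counted as downtime.
    pct = availability_percent(timedelta(days=3), year)
    print(f"Three days down in a year: {pct:.2f}% available")

    # Illustrative target only; not a figure from any institution.
    budget = allowed_downtime(99.9, year)
    print(f"A 99.9% target allows about {budget.total_seconds() / 3600:.2f} hours a year")
```

On these assumptions, a three-day outage in a year corresponds to a little over 99% availability, whereas a 99.9% target would leave a budget of under nine hours a year. Figures of this kind say nothing about whether the downtime was scheduled, which is exactly the sort of detail a sector-wide comparison would need to record.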

3 Responses to “When Web Sites Go Down”

  1. Mark Sammons said

    There is already that sort of information being compiled. This sort of reporting is consistent with the area of professional Service Management and Delivery that is growing fast across the Higher Education sector, mainly with organisations implementing ITIL practices.

    Actually Southampton does produce these figures as it has a pretty mature ITIL implementation, as can be seen here: http://www.soton.ac.uk/iss/essentials/about/performance/availibilitytargets.html

    These levels are set with Service Level Agreements (SLAs) with the customers (in the case of the Web site, that would be mainly the people who are putting content up there). They then have statistics to judge that against. I think at the moment, as it is early days for this sort of thing and people don’t want to necessarily make a rod for their own back, they are using pretty conservative levels (for example, 75% at the weekends for the Web site is ludicrously conservative!), but then people can use these levels to judge it against other services. For example, you mention S3, which has its own SLA (http://www.amazon.com/gp/browse.html?node=379654011). It says 99.9% but there are many clauses (99.9% doesn’t include when there are internet issues beyond its site). Also, the Netcraft statistics are a very poor basis for comparison because they use uptime since the last reboot – the site may have only been down for a minute, which means average uptime is in “4 nines” territory (~99.99%), or it may have been down for days and so average uptime could be 99% or even lower.

  2. Hi Mark – many thanks for this info. I was involved in developing SLAs when I worked for the IT Service department at Leeds University many years ago, but haven’t followed developments for some time.

    As you say, Southampton University is publishing its data, and the figures for its Web site availability look pretty good (my colleague should regard herself as fortunate to have a rare sighting of the server unavailable message!).

    I mentioned previously the lively debate between Niall Sclater and Tony Hirst at the Open University over Niall’s observation that Slideshare was down and Tony’s immediate follow-up in which he pointed out that the Open University’s authentication server was also down.

    That resulted in a lively (but friendly) debate. But I think it would be useful to have some sector statistics to inform such discussions (although I take your point about the dangers of going down this route). I also appreciate the limitations of Netcraft’s data.

    Now would it be possible to scrape the data published at Southampton and other institutions and aggregate the data? Southampton’s data was produced from MS Excel in dodgy MS HTML format. Does ITIL recommend XML schemas for the publication of such data in reusable formats?

  3. Mark Sammons said

    ITIL is concerned with best practice, as opposed to defining procedures, and so doesn’t prescribe anything as specific as XML schemas!

    I should also point out that one very interesting aspect of ITIL (and you have to appreciate it’s a large topic area and what I have mentioned only skims the surface of it) is that of Finance. ITIL talks about costing being a lot more transparent, so that people know where their money is going with regard to IT budgets. This leads to service-based costing, where the cost of services is fully transparent to the customers. They can then use this to compare and contrast with something like S3. It also solves the problem of “managing expectations”, where people realise that if they want something better, it doesn’t come for free. You said above, “Is it possible (or, rather, realistic) to improve the server availability for institutional services?”. Of course it is, but are the customers willing to pay for the infrastructure to be put in to make it better? There’s a pretty good article on service-based costing here: http://www.ndma.com/resources/ndm26486.htm

    Unfortunately, because of the way that Central IT has been financed by Universities (budget for IT is often “top-sliced” off department budgets before they even get them), I doubt too many Universities will implement this to any great extent. Disappointingly.
