UK Web Focus (Brian Kelly)

Innovation and best practices for the Web

Attention – Services Unavailable!

Posted by Brian Kelly on 16 Mar 2009

Background

Bath University Computing Services (BUCS) is planning engineering work from 4:30 pm on Friday 27 March until 9:00 am on Monday 30 March 2009. This means that no UKOLN Web sites or services will be available during that period. Further information is available on the BUCS Web site.

Dissemination

As a variety of UKOLN services will be unavailable over the period (which is the weekend after next) we will need to ensure that our key stakeholders are informed (including our funders, JISC and MLA) and take steps to ensure that we alert anyone who may be making use of such services over this period – and possibly afterwards, if any unexpected problems are encountered.

Before alerting the key stakeholders we needed to identify the affected services. As well as the obvious Web sites on a .ukoln.ac.uk domain there are also Web sites, such as Exploit Interactive (http://www.exploit-lib.org/) and Cultivate Interactive (http://www.cultivate-int.org/), which are hosted locally but whose dependency on UKOLN servers is not obvious.

There was also a need to identify other network services besides Web sites. Being unable to send email messages or receive incoming email may be obvious, but do we have any services which rely on automated processing of emails (such as the various Listserv mailing lists we host)? Similarly, what about other networked services besides Web and email: any LDAP services, streaming video services, Z39.50 services, etc.? And what about the services outside Bath which may make use of our services? Will they degrade gracefully if our servers are unavailable over the weekend, or will such services (which are not only external to us, but which we may not even know exist) fail or time out as they await a response from our servers?
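To make the "degrade gracefully" question concrete, here is a minimal sketch of how an external service that pulls data from one of our servers might protect itself. It is only an illustration under stated assumptions: the function name `fetch_with_fallback`, the `opener` parameter and the five-second timeout are hypothetical choices of mine, not a description of how any real dependent service is written.

```python
import urllib.request
from typing import Any, Callable, Optional


def fetch_with_fallback(
    url: str,
    fallback: str,
    timeout_seconds: float = 5.0,
    opener: Optional[Callable[..., Any]] = None,
) -> str:
    """Return the body at `url`, or `fallback` if the host is down or too slow.

    `opener` is a hypothetical hook so the function can be exercised without
    a network; by default it is urllib.request.urlopen.
    """
    open_url = opener or urllib.request.urlopen
    try:
        # The timeout stops the caller hanging while it waits for a dead host.
        with open_url(url, timeout=timeout_seconds) as response:
            return response.read().decode("utf-8", errors="replace")
    except OSError:
        # URLError, HTTPError, timeouts and refused connections are all
        # OSError subclasses, so any of them lands here and we serve
        # the fallback (for example a cached copy) instead of failing.
        return fallback
```

A caller might use it as `fetch_with_fallback("http://www.example.org/feed", fallback=cached_copy)`, where both the URL and `cached_copy` are placeholders. A service written without a timeout or a fallback is the kind that would simply hang or fail while our servers are switched off.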

Having (we hope!) identified the key services, we need to disseminate the news of the unavailability of our services and the possible implications, both for other service providers who have dependencies on our services and for the end user communities. We need to make use of the various dissemination channels in order to alert the various affected communities.

Clearly email has an important role to play in communication with the key stakeholders.  And we have provided an alert on UKOLN’s news service, which is also available via RSS. These are the obvious dissemination channels, but what else can we use?

In this blog post about the associated issues (which I’ll expand on in the following section) I’m also alerting readers of this blog (who may also be users of UKOLN – and Bath – services) of the scheduled downtime. And I will also use Twitter to send out an announcement about this post which will be followed by another tweet shortly before the services are brought down.

I’ve also updated the RSS feed for the QA Focus Web site and will do something similar shortly for the Exploit Interactive and Cultivate Interactive news feeds.

General Issues

[Image: Twitter posts about ArchivesHub downtime]

For this scheduled downtime we have had time to discuss the implications and make plans for informing our users. And we’ve had useful discussions with other affected parties in the University, including the e-learning unit. But what about the wider issues, such as whether a weekend of service downtime should be regarded as acceptable, whether we should provide backup services which aren’t dependent on the local network, or even whether we should look to migrate our services to external providers?

We, of course, aren’t alone in having to consider such issues. Last week there were a number of Twitter posts about service problems with a number of MIMAS services including COPAC and the Archives Hub. And although a MIMAS news item was published when the service was restored, I felt that the various tweets which were published when the services first became at risk demonstrated how Twitter can be useful both for immediate alerts and as a mechanism for feedback.

Back in January 2008 I wrote a post entitled When Web Sites Go Down which was concerned with the announcement by the University of Southampton that its Web site was down for scheduled maintenance from 2-4 January 2008. In light of the unavailability of well-established services hosted by prestigious institutions such as the universities of Bath, Manchester and Southampton it might be timely to ask ourselves whether educational institutions still need to be involved in the hosting of widely used services. Wouldn’t it be better, we may ask, to leave hosting to the global organisations such as Google and Yahoo? But if that’s your view, reflect on a recent email sent out by Yahoo to users of the Yahoo Mail service:

From: “Yahoo! Mail” <noreply@email.yahoo-inc.com>
Date: 10 March 2009 23:50:43 GMT
To:
Subject: Scheduled Maintenance
We are undertaking some essential, but extensive, maintenance to improve Yahoo! Mail this weekend. The maintenance is part of our ongoing efforts to give you the best Mail service we can.

Beginning the evening of Friday March 13th (PDT) you may experience problems accessing your Yahoo! Mail account. If your account is affected, it should be available again by midday on Saturday March 14th (PDT).

We sincerely apologize for this inconvenience.

Best regards,

The Yahoo! Mail team

I think we do need to keep asking such questions. But we also need to remember that the grass isn’t always greener on the other side of the fence. And I hope the email sent by Yahoo’s support team on 10 March about the downtime on 13-14 March wasn’t the only notification which Yahoo Mail users received!

But as well as asking ourselves the longer term question about how our services should be hosted, we still need to address the issues of service downtime (whether scheduled or not) and how we alert our users and other service providers who may be affected. Any thoughts?

7 Responses to “Attention – Services Unavailable!”

  1. Very timely as we are just recovering from a major unplanned outage. In our case we lost email and managed desktop services so communications were at a premium. We use a couple of methods in such circumstances at present: (i) SMS and (ii) a status site hosted at JANET (http://www.ucl-status.ac.uk/ – so far enough away!). These mechanisms are built into our Major Incident / Disaster Recovery / Business Continuity plans.

    We need to do more – have the DNS of the status site easily re-written in an emergency so it takes over the main address for example – and I think we should be using Twitter as well as SMS (it’s free and you don’t have to maintain a database of phone numbers). I know a number of institutions already do this.

  2. Emma said

    It’s certainly something that Universities (and, as you say, other service providers) need to consider – especially in the case that Jeremy mentions above, an unexpected outage.

  3. […] Clearly, either is much better than having all services down unexpectedly, as can happen. (See Jeremy’s comment on Brian’s post). They had a couple of ways of communicating with users (not sure if it was […]

  4. Sue Cunningham said

    Sheffield are also using Twitter – see Robert Needhams post on the Communications, Information and Liaison blog.

  5. Dave Cunningham said

    Note that the downtime at Bath which affects UKOLN is due to essential work on the local electrical substation. Normally we would use our standby generator to keep the service up but in this case, and because of the nature of the work, it is not possible to do so for safety reasons.

    By this time next year we will have a second machine room, and it is unlikely that we will ever have to close the whole service down again after this month. UKOLN should probably talk to us about splitting their services between the two machine rooms from next year.

  6. […] do think there is a parallel with institutions beginning to question their role in other areas, as Brian Kelly discusses – should institutions begin to think of themselves more in the role of a broker of access and […]

  7. At GU IT Services we have a ‘Spotlight’ on our Helpdesk landing page (http://www.gla.ac.uk/services/it/helpdesk/), which we encourage the various communities to visit if they have an IT problem. We use this to publish planned and unplanned :-) incidents. There is an RSS feed and you can add the alerts and URL to various 3rd party services. We also send out an initial email to alert subscribers that an incident is taking place. There is redundancy in the back end to make sure we have at least one way to get a message out onto the webpage.
    We update the ‘unplanned’ incidents in real time, often proactively seeking updates from the affected service teams.
    This of course is only good if the Web and email servers are up… In the event of email or both being unavailable we use SMS.
    We also have several layers of staff who take over in the event of the Information Officer (me) being unavailable. It works pretty well and the different user communities like it.
    We couldn’t resist calling this internally ‘Front Line Alerts Procedures’ (FLAPs), although it took a while to get the words to fit the acronym…
