UK Web Focus (Brian Kelly)

Innovation and best practices for the Web

Archive for September, 2011

“I Predict A Riot”: Thoughts on Collective Intelligence

Posted by Brian Kelly on 29 September 2011

Technology Outlook: UK Higher Education

The New Media Horizon’s “Technology Outlook: UK Higher Education” report, which was commissioned by UKOLN and CETIS, explores the impact of emerging technologies on teaching, learning, research or information management in UK tertiary education over the next five years. As described in a recent post on What’s On The Technology Horizon? Implications for Librarians, at the forthcoming Internet Librarian International (ILI 2011) conference I’ll be summarising the technologies featured in the report which I feel will have particular relevance to those working in libraries.

The report highlights ‘Collective Intelligence’ as one emerging technology which is predicted to have a time-to-adoption horizon of 4-5 years. But what exactly is ‘collective intelligence’ and what impact might it have on those working in libraries?

Collective intelligence is defined in Wikipedia as “a shared or group intelligence that emerges from the collaboration and competition of many individuals and appears in consensus decision making in bacteria, animals, humans and computer networks”. The article uses a social bookmarking service as an example of collective intelligence:

Recent research using data from the social bookmarking website, has shown that collaborative tagging systems exhibit a form of complex systems (or self-organizing) dynamics. Although there is no central controlled vocabulary to constrain the actions of individual users, the distributions of tags that describe different resources has been shown to converge over time to a stable power law distributions. Once such stable distributions form, examining the correlations between different tags can be used to construct simple folksonomy graphs, which can be efficiently partitioned to obtained a form of community or shared vocabularies. Such vocabularies can be seen as a form of collective intelligence, emerging from the decentralised actions of a community of users.
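To make the idea in that quotation a little more concrete, here is a minimal sketch of the first two steps it describes: counting how often each tag is used, and linking tags which are frequently applied together to form a simple folksonomy graph. The bookmark data, the function names and the `min_count` threshold are all invented for illustration; the sketch does not test for a power law distribution or partition the graph, and it is not the method used in the research cited.

```python
from collections import Counter
from itertools import combinations

# Hypothetical data: each set holds the tags one user applied to one resource.
bookmarks = [
    {"web", "standards", "accessibility"},
    {"web", "standards"},
    {"web", "library"},
    {"library", "metadata"},
    {"web", "standards", "metadata"},
]


def tag_frequencies(bookmarks):
    """Count how many bookmarks use each tag."""
    return Counter(tag for tags in bookmarks for tag in tags)


def cooccurrence_graph(bookmarks, min_count=2):
    """Return edges between tags that appear together at least min_count times.

    Each edge is keyed by an alphabetically ordered pair of tags and maps to
    the number of bookmarks in which both tags occur.
    """
    pair_counts = Counter()
    for tags in bookmarks:
        for pair in combinations(sorted(tags), 2):
            pair_counts[pair] += 1
    return {pair: n for pair, n in pair_counts.items() if n >= min_count}


freqs = tag_frequencies(bookmarks)
graph = cooccurrence_graph(bookmarks)
print(freqs.most_common(3))
print(graph)
```

With real data the frequency counts would be the starting point for checking whether the distribution has stabilised, and the edges would be the input to whatever graph partitioning approach is chosen.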

Other examples of the relevance of social media in providing collective intelligence might include:

Predicting flu epidemics by observing search terms in Google: Back in 2008 an article published in the Guardian entitled “Google predicts spread of flu using huge search data” described how “Google Flu Trends takes the general search tracking technology pioneered by Google Trends and applies it specifically to influenza. The firm’s engineers claim to have devised a way of analysing millions of individual searches related to the disease that in tests proved to correlate closely with the actual incidence of illness.“.  A Google Scholar search for “predicting flu epidemics using google

Predicting earthquakes using Twitter:  An article entitled “Twitter can predict earthquakes, typhoons and rainbows too..” described an  “academic paper introduced by Takeshi Sakaki, Makoto Okazaki and Yutaka Matsuo from the University of Tokyo [which] investigates the real-time interaction of events such as earthquakes in Twitter and proposes an algorithm to monitor tweets and to detect a target event“.

Predicting social unrest in the Middle East using social media: An article on “The Social Media Revolution” described how “the CIA has been criticized for not being ‘followers’ on Facebook and Twitter and therefore failing to capitalize on the information those sites could have provided in predicting the recent turmoil“.

“We Predict A Riot”

These examples illustrate how social media can be used for predictions. But predictions usually aren’t provided in isolation: rather, predictions are used to identify appropriate actions which may need to be taken. In the first example we might expect doctors to ensure that they stock up on medical supplies and, if a particularly severe flu epidemic is predicted, the NHS may decide to fund a marketing campaign aimed at sectors of the population most at risk. The second example could also result in government action, such as mobilising emergency forces, which could help to save lives. The third example, however, could result in less benign interventions.

The Kaiser Chiefs sang “I predict a riot” but, as suggested in a blog post which hosted the accompanying cartoon, it might now be the crowds which are predicting upheavals, whether geo-physical or social.

This move to collective intelligence might seem to challenge notions of centralisation and authority and thus, returning to the talk I’ll be giving at the ILI 2011 conference, be challenging to the traditional roles of libraries.

But these examples also highlight both the potential benefits and risks associated with trends which may be predicted through large scale use of social media. As has been highlighted in recent posts about privacy concerns for Facebook users, such issues are very relevant for mainstream users of social media today. (And yesterday’s announcement about the new range of Amazon Kindle devices and the Amazon Silk browser has raised additional privacy concerns).

Facebook’s analysis of users’ attention data can clearly be financially beneficial to Facebook in providing targeted advertising (which may also be beneficial for the end user) and of concern to users when information thought to be private is made available to others in unexpected ways (which tends to be the current focus of user education on the risks of using social media). But rather than the obvious embarrassing photos which people may be worried about, might it be the less obvious activities which have the more significant impact in the future?

If I update my status saying I’ll be celebrating with a few pints of Deuchars IPA if England beat Scotland in the Rugby World Cup game on Saturday (while I am in Glasgow) this might be used to infer that I and others in my demographic like real ale, and to target adverts accordingly (which might help me discover a Scottish real ale which I am unfamiliar with).

If I update my status saying I’m getting a sore throat this might help in providing signals of the flu (and could be more significant in terms of instigating change than my wasted vote in a General Election in Bath).

And if I update my status when I notice possibly illegal activities taking place, am I being helpful to society or could my status update be used by the authorities to justify unnecessary actions? And could a provocative status update (which might be part of a large number of updates which cause people to riot) therefore be treated as incitement? Has the future described in Minority Report (which addresses the theme of “the role of preventative government in protecting its citizenry”) arrived?

Lots of questions, I know. But I also feel that information professionals should have an important role in engaging with the debate. I should also add that, as suggested in the post on “The Facebook Chart That Freaks Google Out” and the accompanying chart which is illustrated above, Facebook’s popularity does mean that it is a significant harvester of activity data, since people spend their time on the service and will often have provided their profile information. But if Facebook users migrated overnight to, say, Diaspora would that mean that the benefits of analysis of activity data and content updates could be lost, including the positive benefits? Or might it mean that although users will own their own data, they, understandably, won’t be aware of the possible misuses which could be made of their content updates?

There is a need to address the concerns raised by Facebook’s dominance and their cavalier approaches to privacy – but there’s also a need to look at the wider issues and not assume that any service which provides an alternative to Facebook will necessarily provide benefits across all areas.

Posted in Web2.0 | 3 Comments »

Is It Time To Ditch Facebook, When There’s Half a Million Fans Across Russell Group Universities?

Posted by Brian Kelly on 26 September 2011

Implication of Changes To Facebook

The changes to Facebook announced at Facebook’s F8 Developers conference last week haven’t gone down well in some circles, with a number of the people I follow on Twitter expressing their concerns at the privacy implications of recent changes and one or two having gone as far as to delete their Facebook accounts.

Might those technically-savvy people be setting a trend which will become more widespread as the privacy concerns become more widely known beyond those who read blog posts which describe in detail how Facebook can monitor your interactions, even when you are logged out of the service? Or are these people in a minority and, once the changes have been fully deployed and problems fixed in light of user feedback, could we see an increase in Facebook usage?

Gathering Evidence of Institutional Use of Facebook

In order to be able to gather evidence of possible changes in usage patterns within the UK HE sector I have updated a survey of Use of Facebook by Russell Group Universities which was carried out in January 2011. A summary of the numbers of people who have ‘liked’ the pages, together with details of the changes from the previous survey are given in the following table.

| No. | Institution and Web site link | Facebook name and link | Nos. of Likes (Jan 2011) | Nos. of Likes (Sep 2011) | % increase |
|---|---|---|---|---|---|
| 1 | University of Birmingham | unibirmingham | 8,558 | 14,182 | 66% |
| 2 | University of Bristol | University-of-Bristol/108242009204639 | 2,186 | 7,913 | 262% |
| 3 | University of Cambridge | | 58,392 | 105,645 | 81% |
| 4 | Cardiff University | cardiffuni | 20,035 | 25,945 | 29% |
| 5 | University of Edinburgh | University of Edinburgh/108598582497363 | (None found in first survey) | | |
| 6 | University of Glasgow | glasgowuniversity | (None found in first survey) | | |
| 7 | Imperial College | imperialcollegelondon | 5,490 | 10,257 | 87% |
| 8 | King’s College London | Kings-College-London/54237866946 | 2,047 | 3,587 | 75% |
| 9 | University of Leeds | universityofleeds | (None found in first survey) | | |
| 10 | University of Liverpool | University-of-Liverpool/293602011521 | 2,811 | 3,742 | 33% |
| 11 | LSE | LSE/6127898346 | 22,798 | 32,290 | 42% |
| 12 | University of Manchester | University-Of-Manchester/365078871967 | 1,978 | 4,734 | 139% |
| 13 | Newcastle University | newcastleuniversity | | | |
| 14 | University of Nottingham | The-University-of-Nottingham/130981200144 | 3,588 | 3,854 9,991 | 7% 178% |
| 15 | University of Oxford | | 137,395 | 293,010 | 113% |
| 16 | Queen’s University Belfast | Queens-University-Belfast/108518389172588 | | | |
| 17 | University of Sheffield | theuniversityofsheffield | 6,646 | 12,412 | 87% |
| 18 | University of Southampton | unisouthampton | 3,328 | 6,387 | 92% |
| 19 | University College London | UCLOfficial | 977 | 4,346 | 345% |
| 20 | University of Warwick | warwickuniversity | 8,535 | 12,112 | 42% |
| | TOTAL | | 286,169 | | |


In brief, in a period of nine months we have seen an increase in the number of ‘likes’ for the twenty UK Russell Group Universities of over 274,000 users, or almost 100%, with the largest increase, of over 155,000, occurring at the University of Oxford.
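The percentage changes given in the table can be reproduced with simple arithmetic. The sketch below uses three rows taken from the table; the function name `pct_increase` is my own, and rounding to the nearest whole percent is an assumption which happens to match the published figures.

```python
def pct_increase(before: int, after: int) -> int:
    """Percentage growth from before to after, rounded to a whole percent."""
    return round((after - before) / before * 100)


# (likes in Jan 2011, likes in Sep 2011), copied from the table.
likes = {
    "University of Birmingham": (8558, 14182),
    "University of Oxford": (137395, 293010),
    "University College London": (977, 4346),
}

for institution, (jan, sep) in likes.items():
    print(f"{institution}: +{sep - jan:,} likes ({pct_increase(jan, sep)}%)")
```

The same function can be applied to any later survey in order to compare growth rates on a like-for-like basis.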


The previous survey highlighted emerging patterns of institutional use of Facebook and provided some suggestions on best practices (such as providing a Facebook page rather than a group and having a short and branded URL). It seems that institutions are implementing such best practices more widely. We are also seeing a huge increase in the number of Facebook ‘likes’ with, apart from Nottingham’s 7% increase, all of the other institutions for which figures are given seeing growth of between 29% and 345%.

But might this represent a peak for institutional use of Facebook? Since we have over half a million users, many of whom will be staff or students at Russell Group Universities, we might expect this particular demographic to have a better understanding of the dangers of misuse of Facebook than the general public. It will be interesting to see how these figures change over the next academic year.

Beyond the Evidence of Usage – Is Facebook a Walled Garden?

This post has focussed on institutional use of Facebook to provide services to end users (a business-to-consumer relationship).  Of course there are privacy implications associated with use of Facebook and it might be argued that Universities shouldn’t be using unethical network providers – just as there were pressures on universities not to support businesses which had links with South Africa during the apartheid era.

I’ve not heard people seriously suggesting that Universities should stop their institutional use of Facebook, but there is a need to have a better understanding of the concerns people have regarding Facebook, in part so that we can ensure that possible alternatives to Facebook don’t repeat such concerns. The one particular area of concern I’d like to address in this post is the claim that Facebook is a ‘walled garden’.

This morning I was involved in a brief Twitter discussion in which Facebook was dismissed as a ‘walled garden’. It was suggested that, just like AOL, you need to sign up to access content hosted on Facebook. Surely not? So I logged out of Facebook and visited the University of Warwick page and, as can be seen, I could view the page.

But rather than restrictions on accessing public information, perhaps Facebook is described as a walled garden because you can put information in, but not get it out again?

This was the case at one point, but now there is a Facebook Export service which “uses the Facebook Open Graph protocol to export your Facebook data to an xml file. Facebook Export does not store any data about you. You can then use this xml file to import your data to other services and websites that support the Facebook Export (FBE) format.”

Or perhaps the concern is that use of Facebook apps locks information into a particular application? I feel there may be an element of truth to this concern – you can develop Facebook apps which do trap the data into the app.  But the Russell Group University Facebook pages seem to be using the default Facebook features, so this isn’t really a current concern. And even apps such as the Guardian Facebook app shouldn’t be regarded as acting as a walled garden since the same data can be accessed in several other ways, such as via RSS feeds, Android and iPhone apps and on the Web itself.

I am, therefore, unconvinced that current institutional use of Facebook can be regarded as using a Walled Garden or that Universities are promoting a proprietary service. Of much greater relevance will be how people react to the recent changes in Facebook. If people start to leave, there will be a need to reconsider Universities’ uses of Facebook as a marketing and engagement service.

Posted in Evidence, Facebook | 32 Comments »

We Can’t Ignore Facebook

Posted by Brian Kelly on 23 September 2011

An Example Of Facebook’s Success

During the summer I was involved in using Social Media to promote the Bath Folk Festival. Although I set up a Bathfolkfest Twitter account, I discovered that, apart from a small number of the performers, folkies don’t appear to make significant use of Twitter. The Bath Folk Facebook page, in contrast, was very popular, and currently has 124 ‘likes’ compared with the 27 people who are following the Twitter account. But how widely used was it, specifically?

From viewing the Insight statistics for the page it seems that during the week of the festival there were no fewer than 10,854 views of the status updates with 166 people interacting with the page during the week.  As might be expected views of the page peaked during the festival, as is illustrated below.  But since those people are still connected with the page we will be able to reuse the connections which have been established for next year’s festival, as well as providing updates of folk events held in Bath throughout the year.

I haven’t posted about this previously, in part because my involvement with the folk festival was a personal interest. But in addition I suspect that many readers of this blog will regard Facebook much as they regard many Microsoft products: they both tend to be disliked for a variety of reasons but they are also very successful.

However, in light of yesterday’s Facebook F8 conference, I feel those involved in development activities, as well as those involved in mainstream marketing and student engagement activities, can’t afford to continue to disregard the potential relevance of Facebook.

Areas of Interest

Looking at the various articles and blog posts about yesterday’s news, it seems that much of the attention has focussed on links with Spotify, with BBC News having the headline “Facebook focuses on media sharing and adds timeline”. However I would like to highlight two specific areas: the implications of the decision by the Guardian to release a Guardian Facebook app and how, behind the scenes, Facebook seems to be endorsing use of RDFa and how this could help growth in use of Linked Data.

Guardian Facebook App

I was surprised when I saw yesterday’s announcement of the launch of a Facebook app for the Guardian newspaper. I currently have access to articles published in the Guardian provided as RSS feeds or via the Guardian app on my iPod Touch and Android phone. In addition I recently made use of the Kindle app on my Android phone to read the Guardian for about  a number before I decided that, although the experience was better than using the Guardian app or an RSS reader (to view articles not included in the view provided by the app). It was very interesting, therefore, to discover that the Guardian had chosen to invest resources to develop yet another app which allowed the content to be viewed within the Facebook environment.

I have installed the app. As can be seen, one can choose to view a variety of sections including the main Guardian section, Guardian Technology, Guardian Football and Guardian Data, all of which I have ‘liked’.

In the accompanying image (of the Guardian data section) I have removed details of my Facebook friends who have also liked the page (and the NPR page). Clearly there are privacy issues in allowing one’s Facebook friends to not only see the games you may be playing but also the content you may be reading.

But in addition to being able to see the sections of the Guardian which one’s friends have liked, I was surprised to spot in the app’s activity stream that using the app will disclose the sections you are reading. As illustrated, a friend of mine has been reading an article on “Why we need a debate on the British way of death”. As I described in a recent post which asked “Is Smartr Getting Smarter or Am I Getting Dumber?”, sharing, perhaps unknowingly, details of what one has been reading (whether, as in the case of Smartr, links to pages posted on Twitter or, in this case, Guardian articles) does raise interesting tensions related to sharing, openness and privacy. It is perhaps surprising that the Guardian newspaper doesn’t seem to be unduly concerned about such issues, with the Guardian Facebook App FAQ simply stating:

Can everybody see what I “Read”?
The Guardian Facebook app is a “social reading” environment. Your Facebook friends will be able to see links to articles you have read within the Guardian app environment, and you will be able to see what they have been reading. We think this will help people discover content that they might be interested in.

Facebook’s Social Graph

I have recollections of attending a Linked Data session at the WWW  2010 conference and hearing from a senior Facebook developer about the technologies used in Facebook’s Open Graph Protocol. The response to the question “Why are you developing your own approach? Why aren’t you using RDFa?” was (I paraphrase) “We were unaware of RDFa until this conference. It seems cool – we’ll use it!“.

Last night on Twitter two Linked Data experts whom I follow seemed to be pleased with the news announced at the Facebook F8 developer conference. Manu Sporny, “Founder/CEO of Digital Bazaar. RDFa/RDF WebApps Chair @ W3C. Champion for art/science, distributed banking/commerce, @PaySwarm, JSON-LD, semantics and puppies.”, tweeted:

Facebook’s new OGP launch today uses RDFa 1.1 (developer docs): #rdfa #w3c

whilst Kingsley Idehen, “Founder & CEO, OpenLink Software, An Open Linked Data Enthusiast”, provided an interesting retweet:

RT @aliriop: #Facebook #OpenGraph Seeks to Deliver Real-Time Serendipity . #SDQ #LinkedData

The Linked Open Data Graph has been used to demonstrate the growth and size of the Linked Data environment. However critics have argued that it shows that Linked Data seems to be over-reliant on content provided by DBpedia. It will be interesting to see if the large-scale use of RDFa across Facebook will demonstrate the value of Linked Data and help to encourage take-up in other areas.
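For readers who haven’t looked at how Open Graph Protocol metadata appears in a page, the following is a minimal sketch, assuming the common pattern of `meta` elements whose `property` attribute starts with `og:`. It uses only Python’s standard `html.parser` module; the sample HTML, the `example.org` URL and the class and function names are invented for illustration and are not taken from Facebook’s developer documentation.

```python
from html.parser import HTMLParser


class OpenGraphParser(HTMLParser):
    """Collect og:* properties found in <meta property="..." content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.properties = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        prop = attributes.get("property") or ""
        if prop.startswith("og:") and "content" in attributes:
            self.properties[prop] = attributes["content"]


# Hypothetical page, written for this example only.
SAMPLE_HTML = """
<html prefix="og: http://ogp.me/ns#">
<head>
<title>Example article</title>
<meta property="og:title" content="Example article" />
<meta property="og:type" content="article" />
<meta property="og:url" content="http://example.org/article" />
</head>
<body></body>
</html>
"""


def extract_open_graph(html):
    """Return a dict of the og:* properties declared in the given HTML."""
    parser = OpenGraphParser()
    parser.feed(html)
    return parser.properties


metadata = extract_open_graph(SAMPLE_HTML)
print(metadata)
```

A full RDFa processor would also resolve prefixes and handle other vocabularies; this sketch only picks out the `og:` properties, which is enough to see the kind of structured data a page can expose.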

Implications for the Sector

On Twitter Linda Bewley commented last night:

My Facebook cynicism is balanced out by respect for their ability to innovate. Direct access to phone’s native app data = result!

Although the issues of privacy are still very relevant, as I highlighted in the case of the Guardian app, it does seem to me that there will be a need to reflect on the potential for greater business uses of Facebook. I’ll be interested to see if, over time, Facebook’s Timeline could have a role to play in enhancing the Bath Folk page. And whilst this is a trivial example, Universities will no doubt be considering the implications of yesterday’s announcements in the support of their marketing activities. But who, I wonder, will be in a position to take advantage of the Collective Intelligence which Facebook will be gathering?

Posted in Facebook | 6 Comments »

Is It Now Time to Embed Use of Google+?

Posted by Brian Kelly on 21 September 2011

Is Google+ Dead?

Is Google+ dead? Dan Reimold certainly thinks so. In a post entitled “Google+: Social Media Upstart ‘Worse Than a Ghost Town’” he suggests that Google+ may “simply [be] a social media step too far” and is now “worse than a ghost town”. In his conclusions he reflects on his personal experiences as a Google+ user:

As it stands, my Circles are sparse. The stream of updates has basically run dry — reduced to one buddy who regularly writes. My initial excitement about signing on and inviting people to join me has waned. Nowadays, I apparently get tired just thinking about it. 

A similar discussion about the relevance – and perhaps sustainability – of Google+ took place amongst some of my Twitter followers recently. It seems that some feel Google+ is irrelevant and others are pleased with what they claim is a failed Google service and are waiting for the Diaspora service to be launched. However, as I said in the Twitter discussion, I am not convinced by this argument.

Why Google+ May be a Slow-Burner

Lessons from Growth of Twitter

In January 2011, in a post on Evidence of Personal Usage Of Social Web Services, I described how use of the Tweetstats service provided me with evidence of growth of my Twitter usage which contradicted the understanding I had at the time. I had thought that I was an early adopter of Twitter and had used it fairly consistently since my first tweet in January 2007. But the Tweetstats graph (illustrated) shows little use in 2007. It wasn’t until early 2008 that I started to use Twitter on a regular basis. The gap in the graph in the early part of 2008 puzzled me initially until I came across a blog post in which I described how I had made intensive use of Twitter whilst attending the Museums and the Web 2008 conference. It seems that, perhaps due to a glitch in Twitter or Tweetstats, no usage had been detected for a period of a couple of months, which included the time when I first started to use Twitter on a regular basis.

Looking back it seems that attending a conference abroad made me aware of the benefits which Twitter can provide during a conference and that I soon became aware of the additional benefits which can be gained by developing links with one’s professional network.

A few days ago Aaron Tay pointed out that:

Some technology rewards getting in early e.g Twitter (early accs get more followers) & some don’t e.g qrcode

The post he cited (on the Seth Godin blog) made the observation that:

Worth considering: The difference between a technology where getting in early pays dividends, and those that don’t. For example, having a website or a blog or a Twitter account early can help, because each day you add new users and fans.

QR codes, on the other hand, don’t reward those that get in the ground floor. You can always start tomorrow.

Seth pointed out an important advantage that early adopters of social networks can have – the ease of gaining the critical mass which may be needed in order for the service to provide value. There is a danger that this may be construed as a suggestion that the number of followers alone is a key factor in having an effective social networking service – and seeking new followers simply to enhance one’s Klout or Peerindex ranking is an example of misunderstanding the relevance of a critical mass. Rather than indiscriminately seeking to grow large numbers of followers, if you are looking to use a social network for professional purposes there is a need to reach a critical mass across one’s peers.

I recently installed the Social Bros application which provides evidence of personal use of Twitter. I used this recently to investigate the number of followers the people I follow on Twitter have. As can be seen, most of the people I follow have 100-500 followers, with significant numbers having 1,000-5,000 and 500-1,000 followers. In order to develop a community of this size it can be useful to be an early adopter so that one can stake a claim. The following influx of users will have to search for contacts and, once they have spotted and made contact with you, you will be able to reciprocate, if you so choose.

As described in a Wikipedia entry on the Network Effect, “sites like Twitter and Facebook [become] more useful the more users join”. But as well as users needing a critical mass and an understanding of the benefits of the service, there will also be a need for easy-to-use tools. Initially I used the Twitter Web site but, as I discovered from reading my early posts about Twitter, I was using the Twhirl client around the time my Twitter use became embedded in my daily work routine. The Tweetstats service I mentioned earlier also provides me with statistics on the Twitter clients I have used. As can be seen, Tweetdeck is now my preferred tool, with the usage statistics of the Web client primarily reflecting, I suspect, either my early use of the Web or use in Internet cafes.

Implications for My Use of Google+

What lessons might we learn from these reflections on how my use of Twitter developed, from claiming an id but making little use of it to finding valuable (and unexpected) use cases which led to the service being embedded in my professional life, and how might they be applied to Google+?

Like, I suspect, many others of my peers I have claimed a Google+ account and have established contacts with people I know from both real world and online interactions (there are currently 116 people in my circles and 385 people who have included me in their circle).

Yesterday I found that Google+ accounts are now freely available to everyone, so the comment I have heard that Google+ is exclusive to the early adopters is no longer the case.

I also heard yesterday that Google+ has released APIs which should help in developing a richer environment of tools and services based around Google+ (in this case, use of Huddle), something which, I feel, was valuable in Twitter becoming mainstream.

The Google+ service itself is becoming richer in functionality, with recent tweets from Aaron Tay alerting me to articles which describe how “Google+ Hangouts Go Mobile & Get More Collaborative” and explain “Why Google Plus Hangouts is the Killer App: Docs“.

It seems to me that it is now timely to explore ways in which Google+ may deliver benefits and also to gain an understanding of best practices including personal work flow processes. Earlier this year I set up a daily blog which I used to keep notes and ideas. I stopped using it after six months, partly because I felt I was getting little new from using a second WordPress blog. However I’ve now made a decision to use Google+ as a middle ground between the (sometimes, as in this case, long) posts I publish on this blog and the conversations and announcements which take place on Twitter.

Anyone else planning to make greater use of Google+? Or, like Dan Reimold, do you feel it’s a ghost town and is unlikely to have a significant role to play?

Posted in Evidence, Social Networking | 9 Comments »

Will the Real Scott Wilson Please Stand Up, Please Stand Up

Posted by Brian Kelly on 20 September 2011

The Microsoft Academic Search Service

At the recent Science Online London (SOLO) 2011 conference I attended a session on the Microsoft Academic Search service. There seem to have been a lot of developments to this service since I first signed up for it shortly after it had been announced. I couldn’t get a decent feel for the service on my Android phone at the session since it uses Silverlight, which isn’t supported on my phone. However the tweets for the session were curated using Storify and these resources have been embedded in a post on the Nature blog which includes Twitter summaries of the third breakout sessions. From these useful notes I find that:

  • Microsoft Academic Search has details of over 27 million publications (see tweet).
  • There is an expectation that there will be up to 200 million publications by next year, but the biggest flaw is the content (see tweet).
  • Users can edit the content in the database i.e. using crowd sourcing to clean up the data (see tweet).
  • Co-author and citation graphs show relations which connect people (see tweet).
  • There is an open API for Microsoft Academic Search (see tweet).
  • Currently there’s no real system for claiming authorship on Microsoft Academic Search (see tweet).

I was impressed by the functionality and user interface. But in addition I was also interested in the issues raised in the tweets listed above regarding claiming authorship of papers and using crowd-sourcing to enhance the quality of the content.  I will discuss these issues in this post.

Personal Experiences

Shortly after returning to my office I reviewed the information it had about my research papers. A summary for my papers is illustrated below.

Although I am aware of the papers I have published I hadn’t really looked at the statistical analysis of the papers.  But in addition to details of the numbers of citations and the G-Index and H-index scores, of particular interest was the information about the 37 co-authors of papers I have published during my time at UKOLN.

As illustrated, the Microsoft Academic Search service allows you to view links between the co-authors. You can also produce a similar citation graph showing researchers who have cited your work.

In addition you can also view the degrees of separation between two researchers – and I discovered that I have co-authored a paper with Sebastian Rahtz who has co-authored a paper with Dame Wendy Hall who has co-authored a paper with Sir Tim Berners-Lee.
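I don’t know how Microsoft Academic Search calculates this, but the chain described above can be found with a standard breadth-first search over a co-authorship graph. The sketch below is an illustration only: the graph contains just the three co-authorship links mentioned in this post, and the function name `shortest_chain` is my own.

```python
from collections import deque

# Co-authorship links mentioned in this post (each link recorded in both directions).
coauthors = {
    "Brian Kelly": {"Sebastian Rahtz"},
    "Sebastian Rahtz": {"Brian Kelly", "Wendy Hall"},
    "Wendy Hall": {"Sebastian Rahtz", "Tim Berners-Lee"},
    "Tim Berners-Lee": {"Wendy Hall"},
}


def shortest_chain(graph, start, goal):
    """Breadth-first search returning the shortest chain of co-authors, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for neighbour in sorted(graph.get(path[-1], ())):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(path + [neighbour])
    return None


chain = shortest_chain(coauthors, "Brian Kelly", "Tim Berners-Lee")
print(" -> ".join(chain))
print("Degrees of separation:", len(chain) - 1)
```

Because breadth-first search explores the nearest co-authors first, the first chain it finds is guaranteed to be one of the shortest. The result is, of course, only as reliable as the authorship data on which the graph is built.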

Interesting stuff – and if you are primarily a researcher the information on links and relationships may be particularly significant.  But can we trust the information which is depicted in the diagram?

Can We Trust The Information?

When checking details of my co-authors I noticed a number of errors. In the bottom right hand corner of the screen shot I have placed four of my co-authors:

Scott Wilson: Is based at CETIS, University of Bolton but on the Microsoft Academic Search service is listed as working at the University of British Columbia and apparently has published 74 papers.

Stephen Dean Brown: Is based at De Montfort University but on the Microsoft Academic Search service is listed as working at the University of Toronto and apparently has published 88 papers.

Richard Davies: Is based at University College London  but on the Microsoft Academic Search service is listed as working at the University of Sheffield and apparently has published 35 papers.

Lawrie Phippsa: Is based at JISC but on the Microsoft Academic Search service is listed with an incorrectly spelt surname (although he does have a correct identifier which lists him as being based at JISC and having published 10 papers).

Back in April 2011 I wrote a post in which I described What I Like and Don’t Like About the IamResearcher service.  I can recall how, having signed up for the service, I had to assert the papers which I had written, merge papers which had been assigned to different variants of my name and delete those which were incorrectly assigned to me. However since access to the service is restricted to signed in users I wasn’t too concerned about the service.  The information held on the Microsoft Academic Search service, in contrast, is openly available and widgets are available which enable the information to be embedded elsewhere – and I have used this feature to include information about my papers on the UKOLN Web site.  But how should we address the problems caused by incorrect information which I have illustrated?

Maintenance Issues

A Researcher’s List of Publications

I have edited a number of the errors I found in my details – but there is one paper on One world, one web … but great diversity which is also listed again as One World, One Web … But Great Diversity. Despite having tried to merge these two papers a week ago, the item is still listed twice (and is locked from further editing).  It would appear that there is a bottle-neck in approving changes.  So although researchers should have a vested interest in ensuring that information is accurate and complete (after all the content of open services such as the Microsoft Academic Search should be harvested by Google and will thus enhance the visibility of the source content) this may not be easy to do, even if the authors are aware of the service and feel sufficiently motivated to correct any errors.
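Duplicates which differ only in capitalisation or punctuation, as in this case, could in principle be flagged automatically by comparing normalised titles. The following is a rough sketch under that assumption; the function names are hypothetical, the pattern only handles unaccented Latin letters and digits, and it says nothing about how Microsoft Academic Search actually matches records.

```python
import re
from collections import defaultdict


def normalise_title(title):
    """Lower-case a title and keep only runs of ASCII letters and digits."""
    words = re.findall(r"[a-z0-9]+", title.casefold())
    return " ".join(words)


def find_duplicates(titles):
    """Group titles which are identical once normalised."""
    groups = defaultdict(list)
    for title in titles:
        groups[normalise_title(title)].append(title)
    return [group for group in groups.values() if len(group) > 1]


titles = [
    "One world, one web … but great diversity",
    "One World, One Web … But Great Diversity",
    "Becoming an Information Provider on the World Wide Web",
]
for group in find_duplicates(titles):
    print("Possible duplicates:", group)
```

A check of this kind would only produce candidates for merging; a person (ideally one of the authors) would still need to confirm that the records really do describe the same paper.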

And since there are pressures from funding bodies to maximise awareness and impact of one’s research papers it would seem to be self-evident that researchers will be motivated to manage their content. But is this really true?  And even if researchers can be made aware of the potential benefits, will they feel the effort is worthwhile?

Additional Content

Services such as Microsoft Academic Search may automatically find the title and authors for a paper. However managing this information might include:

  • Providing links to details for the correct author.
  • Providing full citation details for the papers.
  • Providing links to PDF versions of papers, if available.
  • Providing conference details for papers published at conferences.

It may be felt to be the responsibility of the lead author to support the dissemination of a paper in this way (as well as having responsibility for the content and ensuring the paper is submitted in time). But in addition to maintaining details of the papers and co-authors there is also the need to consider other information which may not be as easy to determine.  For example recently while looking to summarise details of UKOLN’s peer-reviewed papers I noticed that authors’ institutional details had been split across UKOLN/Bath and the University of Bath. There appears to be a need to  aggregate this information in order to provide an organisational view of our research outputs.

Such a departmental view may help to provide an insight into changing areas of research interests. The accompanying image, for example, shows the subject of UKOLN’s research publications over time. From this we can see a long-standing involvement in the areas of information retrieval and human-computer interfaces. However this picture is skewed by not having all authors included under the same department (and the information not being updated despite changes made over a week ago).

Getting It Right

The cynic would blame Microsoft for the problems which I have identified, but I think this would be unfair.  I feel that the service does provide a very appealing interface which has advantages over Google Scholar, for example.

But what improvements are needed in order to enhance the quality of such services?

It seems to me that there are three main sources of information, each of which will have corresponding issues which will have to be addressed:

  • Information about the author:  This is information which we might expect the author to maintain  (name, contact details, host institution, previous employment, etc.)  However there will be a need for the author to be sufficiently motivated to claim their identity and maintain the information.  There will also be a question of trust.
  • Information about the author’s papers:  This is information which could be harvested from content provided by publishers, institutional repositories, etc. However, as has been illustrated, there will be a need to validate information which is harvested.
  • Information about the author’s institution: The host institution will have an interest in ensuring that the research outputs from its staff and research students are included.

It should be noted that there may be tensions between an individual’s and an institution’s view on such data. For example the outlier in the diagram shown above (a paper on “Becoming an Information Provider on the World Wide Web”  published in 1994) should be included in my list of publications (it was the first peer-reviewed paper I wrote). However at the time I was working at the University of Leeds so it should not be included as a UKOLN/University of Bath publication.

We could regard the process of ‘getting it right’ to be primarily focussed on data modelling. But since the Microsoft Academic Search service involves automated harvesting of large volumes of data from a range of sources, with an expectation that data cleansing will be carried out by ‘crowd-sourcing’ (including by the authors themselves), there will be a need to consider the motivations for people to register for a system, check the information and be willing to update it.

For me important drivers for doing this include:

  • Updating data which is openly available as I would have a vested interest in ensuring that information about my professional activities is correct and up-to-date. (I have no interest in updating information held in the service as this is closed).
  • Having a richly functional, easy-to-use and visually appealing system which differentiates itself from other providers.
  • Being able to update the information easily and quickly.  Note that finding that information which I updated on the Microsoft Academic Search service had still not been approved after a week is a barrier to making any more updates to this system.

And although I may be willing to update the information about myself and my institution I am reluctant to correct errors about my co-authors. Although, for example, I know about the paper which Scott Wilson and I have co-authored and know that he is based at CETIS, I don’t know if he was based at Bolton or Bangor University when we wrote the paper. I also don’t know which papers written by Scott Wilson were written by the Scott I know and which ones were written by the Scott Wilson who is based at the University of British Columbia.  Will the real Scott Wilson please stand up!

Posted in Identifiers | 5 Comments »

Sharing Job Information More Effectively

Posted by Brian Kelly on 19 September 2011

Vacancies in Institutional Web Teams

Orla Weir, head of Digital Strategy at the University of Salford, recently asked me for suggestions on places to publish information about a number of vacancies which are available in the new central digital team at the University of Salford.  My initial suggestion was to use the website-info-mgt and the web-support JISCMail lists and, as can be seen from the list archives, the message, which is summarised below, has been sent to the 564 members of the website-info-mgt list and the 588 members of the web-support list:

Digital Communications Officer
Grade 7 – £29,972 – £35,788
The Central Digital Team is part of the Communications Directorate and leads and manages the digital engagement, visual standards and digital presence of the University. As a small team, it is responsible for the creation and implementation of Digital Strategy including governance, platforms, innovation, and best practice and is essentially the glue that sits across many of the university engagement tools and services. The remit of the team includes: CMS, Web, Social, eLearning (promotion and visualisation), eCRM, Mobile, SEO and URL strategy.
The Digital Communications Officer will be a critical part of this small team and will be responsible for managing the digital presence and engagement model for the University. This includes internal customer relationships, project leadership, implementation, content delivery and evaluation. The role is a hybrid role which requires both technical and communications skills. It also requires an appetite and enthusiasm for all things digital and a desire to be part of creating an exemplar digital engagement presence.
The purpose of the role is to implement and manage the University’s web presence and digital engagement including content and platform implementation, design, communications and measurement. The scope of the role includes the appropriate use of all digital platforms using truly multichannel integration consistent with the University Digital Strategy.
Closing Date – 21/09/2011

But in an era of social media and syndicated content might there be additional ways of making such information available to a wider range of potential applicants?  And might we not expect those working in institutional Web teams within higher education to be pro-active in looking at communication channels which may provide additional benefits, such as being able to reach out to potential applicants who may not be members of these two mailing lists?

Careers 2.0: the Stack Overflow Careers Site

Coincidentally a recent tweet from @psychemedia (Tony Hirst) asked:

Wondering if any HEIs ever post developer job ads to stack overflow careers site? #devcsi @briankelly

I had a look at this site and found that although a search for vacancies containing the string “Web” shows that there are 404 (!) positions currently available, carrying out a search for “University” in the UK results in only five hits, as illustrated.

However, as described in the FAQ, the Careers 2.0 service is intended for employers who are looking for programmers, which they suggest can have a role to play as “part of the process as the first technical interview. Instead of scheduling a screening call with a member of your technical staff, just have your staff review the candidate’s profile“.  The service, which provides information on 38,127 , is aimed primarily at developers who have contributed to the Stack Overflow service – although there is a job listing service which employers may be interested in using as a means of reaching out to developers who are users of Stack Overflow.

IWTB, the Institutional Web Team Blog Aggregator

UKOLN’s IWTB (Institutional Web Team blog aggregator) service was officially launched at the IWMW 2011 event held at the University of Reading on 26-27 July. The service aggregates blogs provided by those who work in (or have close affiliations with) institutional Web teams.  The service can be used to help identify what one’s peers across the UK’s institutional Web teams are doing and what they are communicating to their users.

A search for ‘vacancies’ shows that several vacancies were advertised in July on the University of Bath Web services blog, with the University of Essex also advertising vacancies in their team in July.

As with many social Web services, the blog aggregator will become more effective as the number of users grows. We will shortly be promoting use of this service more actively. For now I will give a reminder that an online form for submitting the URL for an institutional Web team blog can be accessed from the IWMW home page.

Harvesting RSS Feeds

In a post entitled Autodiscoverable Feeds and UK HEIs (Again…) Tony Hirst revisited the provision of auto-discoverable RSS feeds on institutional Web sites. Tony’s post listed a number of areas in which RSS feeds can add value which  included:

jobs: if every UK HEI published a jobs/vacancies RSS feed, it would trivial to build an aggregator and let people roll their own versions of

Tony’s post also described how he had developed a Scraperwiki tool to find auto-discoverable RSS feeds on University home pages, which builds on his previous work in this area which used a Yahoo Pipe.  The Scraperwiki tool now analyses the RSS feeds and the output from the tool provides listings of news feeds, event feeds, research information feeds and Twitter feeds, as well as jobs feeds. A summary of the UK University home pages which provide autodiscoverable job feeds is given below:

Feed Title URL
Jobs at Bath
Great careers start here…
Great careers start here…
Great careers start here…
Great careers start here…
Edge Hill University Job Vacancies latest vacancies
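For those who have not come across auto-discoverable feeds, the idea is that a page advertises its feeds using link elements with rel="alternate" in the HTML head. The following is a minimal sketch of how such feeds might be found and filtered for job listings. It is not Tony’s Scraperwiki code; the class and function names, the keyword list and the example.ac.uk addresses are all assumptions made for illustration.

```python
from html.parser import HTMLParser

FEED_TYPES = {"application/rss+xml", "application/atom+xml"}


class FeedLinkParser(HTMLParser):
    """Collect (title, href) pairs for feeds advertised in <link> elements."""

    def __init__(self):
        super().__init__()
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = {name: (value or "") for name, value in attrs}
        rel_values = attributes.get("rel", "").lower().split()
        feed_type = attributes.get("type", "").lower()
        if "alternate" in rel_values and feed_type in FEED_TYPES:
            self.feeds.append((attributes.get("title", ""), attributes.get("href", "")))


def discover_feeds(html):
    parser = FeedLinkParser()
    parser.feed(html)
    return parser.feeds


def job_feeds(feeds, keywords=("job", "vacanc", "career")):
    """Keep feeds whose title suggests job listings (a crude keyword match)."""
    return [(title, href) for title, href in feeds
            if any(keyword in title.lower() for keyword in keywords)]


# A made-up home page fragment; example.ac.uk is a placeholder domain.
sample_html = """
<html><head>
<link rel="alternate" type="application/rss+xml" title="University news" href="http://www.example.ac.uk/news/rss">
<link rel="alternate" type="application/rss+xml" title="Job Vacancies" href="http://www.example.ac.uk/jobs/rss">
<link rel="stylesheet" type="text/css" href="/style.css">
</head><body></body></html>
"""
print(job_feeds(discover_feeds(sample_html)))
```

Matching on the feed title is fragile, of course: a feed called “Great careers start here…” would be found by the keyword list above, but one with an unhelpful title would be missed.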

The Future

In a recent guest post entitled Lend Me Your Ears Dear University Web Managers! Dave Flanders, a JISC Programme Manager, summarised work carried out in the “Linking You” project at the University of Lincoln. A survey of 40 Web sites across the domain (ten from each university group) was carried out in order to compare patterns of usage for URLs to key information sources.  The project found there were inconsistencies in the representation of information for graduates and undergraduates.  However there were also good conventions that have emerged across the sector. From this work the ‘Linking You’ project proposed a common set of URL syntaxes that could be used in principle across multiple corporate institutional Web sites.

The project outlined a number of benefits to the sector which can be gained from agreement on common URI practices, which included:

  • Provision of news feed aggregators: If we all knew where all the corporate news feeds were e.g. we could create a UK University News Aggregation Service where the sector could have their news published on demand, let alone text mining goodness and other filters for highlight key news developments across all higher and further education institutions.
  • A sector wide directory: Common information such as institutional policies, contact information, news, about, events, etc. could be aggregated into a searchable directory; useful to both the public and HEI data geeks.

I can’t help but feel that Universities (and institutional Web teams) which are early adopters of such practices may gain advantages.  The Web teams which highlight their vacancies in a Web team blog will be able to see the content surfaced to viewers of the IWMW service, and content linked in from University home pages in ways which can be found by software will continue to be of interest to developers who will be looking for institutional data. I wonder how long it will take before others start to follow the approaches taken at the Universities of Bath, Cumbria, St. Andrews and Edge Hill?

Posted in Web2.0 | 1 Comment »

What’s On The Technology Horizon? Implications for Librarians

Posted by Brian Kelly on 15 September 2011

JISC Observatory’s Horizon Scan

As described on the JISC Observatory blog the JISC Observatory is a “JISC-funded initiative to systematise the way in which the JISC anticipates and responds to projected future trends and scenarios in the context of the use of technology in Higher & Further Education, and Research in the UK“.

The JISC Observatory is the first major collaboration between CETIS and UKOLN in their role as JISC Innovation Support Centres. A recent post on the JISC Observatory blog described how the JISC Observatory team commissioned a study by the New Media Consortium (NMC).  The report was launched during the ALT-C 2011 conference. The report, “Technology Outlook: UK Higher Education” is now available on the NMC Web site (in PDF format, 24 pages). This report is part of the NMC’s widely-read Horizon Reports, a series of annual reports on technology trends which dates back to 2004.

The Technology Outlook report explores the impact of emerging technologies on teaching, learning, research or information management in UK tertiary education over the next five years, as identified by the Horizon.JISC advisory board: a group of experts comprised of an international body of knowledgeable individuals, all highly regarded in their fields representing a range of diverse perspectives across the learning sector. The methodology taken  by the Horizon.JISC advisory board is described on the Horizon Project | JISC Observatory Wiki. The work includes monitoring appropriate press clippings, identifying key trends, discussing and then refining the trends and critical challenges before a voting process to seek consensus.

Implications for Librarians

Next month I will be speaking at the Internet Librarian International (ILI 2011) conference. The conference takes place in London on 27-28 October 2011 and I’ll be talking with Åke Nygren, Stockholm Public Libraries in the opening session of the Technology Developments and Trends track on the topic of “What’s on the Technology Horizon?”

Rather than having to come up with my own thoughts on new technological developments relevant to the library sector, I will be summarising some of the predictions which have been made in the Technology Outlook report and, in the 15 minutes available to me, discuss the implications of these developments for information professionals. In addition to summarising the key predicted developments I’d like to provide examples of early adopters within the sector.  If you have been involved in development work in the areas listed below feel free to let me know, either in a comment on this blog or by email, and I’ll see if I can include the example in my presentation.

Time-to-Adoption Horizon: One year or less:

  • Cloud Computing
  • Mobiles
  • Tablet Computing
  • Open Content

Time-to-Adoption Horizon: Two-three years

  • Learning Analytics
  • Game-Based Learning
  • New Scholarship
  • Semantic Applications

Time-to-Adoption Horizon: Four-five years

  • Augmented Reality
  • Collective Intelligence
  • Telepresence
  • Smart Objects

And whilst I’m happy to hear about libraries which may be making use of mobile devices and tablets or using Cloud Services, I’d be much more interested to hear of library uses of Augmented Reality, Collective Intelligence, Telepresence or Smart Objects!

Posted in Events, jiscobs | Tagged: | 4 Comments »

Bath is the University of the Year! But What if Online Metrics Were Included?

Posted by Brian Kelly on 14 September 2011

University of the Year

For the first time in a long, long time last weekend I bought the Sunday Times.  The reason for this was to read the Sunday Times’ announcement that the University of Bath has been identified as the University of the Year.

As someone who has worked and lived in Bath for almost 15 years I was very pleased with the news – but not as pleased, I suspect, as the Vice-Chancellor and members of the University’s Press Office which, of course, published a University news item with details of the announcement which informed us that:

The University of Bath has been awarded the title of ‘University of the Year 2011/12’ by The Sunday Times, one of the most prominent and influential newspapers in the world.

The news item went on to highlight another metric:

In that league table the University of Bath has risen to 5th out of 122 UK universities and colleges – its highest ever position.

Last Friday I viewed a video clip in which the Vice-Chancellor announced the news and as I left campus on Friday evening I noticed the posters which were scattered around the University Parade informing potential students (and their parents), who would be visiting the campus on the following day for the University Open Day, of what a great University they are visiting.

Being identified as the top University by (ahem) “one of the most prominent and influential newspapers in the world” is clearly deemed important by the powers that be at the University. And, in addition, several people I follow on Twitter who don’t work in marketing positions also tweeted the news.

What If Online Metrics Also Counted?

Yesterday Sheila MacNeill, Assistant Director at JISC CETIS, alerted me to a Mashable article which asked “How Digitally Connected Are the U.S. News Top 20 Colleges?”. The article referred to a  U.S. News list of top ranking national universities and national liberal arts colleges which appears similar to the Sunday Times survey. The Mashable article described how they:

decided to add another factor for review: social media connectedness. Below you’ll find top 10 lists of universities and liberal arts colleges alongside an analysis of their social media presences

This puts Harvard in equal first place with 66,737 Twitter followers, 698,933 Facebook likes and 390 YouTube videos and 27,786 subscribers. Harvard tied with Princeton University which had 15,572 Twitter followers, 52,125 Facebook likes and 164 YouTube videos and 2,978 subscribers. The positions in this league table seem to have been based on an undocumented weighting of the social media metrics.

Lies, Damned Lies and Social Media (and Other) Analytics

It is easy to dismiss the Mashable article as trivia, statistically flawed or dangerous, depending on your particular take. But can’t the same criticisms be made of the Sunday Times league tables?

Since the Sunday Times article is hidden behind a pay wall (and I’d left my copy at home) I subscribed to the Times / Sunday Times service in order to read about the methodology they had employed (note to self, cancel the Direct Debit payment before the full payment is due!).

The methodology (which is summarised here) states:

Universities were ranked according to marks scored in nine key performance areas.

Teaching excellence (250 points): The results of questions 1 to 12 of the 2011 national student survey (NSS) are scored taking a theoretical minimum and maximum score of 50% and 90% respectively. …

Student satisfaction (+50 to -55 points): The responses given to Question 22 of the National Student Survey: “Overall, I am satisfied with the quality of the course” were compared to a benchmark for the given institution, devised according to a formula based on the social and subject mix. …

Peer assessment (100 points): Academics across all institutions included in our guide were asked to rate departments in their subject field on a five-point scale for the quality of their undergraduate provision and a figure was awarded to each institution based on coverting (sic) the average score for each institution on to a 100-point scale. …

Research quality (200 points): We used data from the most recent research assessment exercise, published in December 2008. Five different ratings were awarded for research quality, ranging from 4* to unclassified, from which we calculated an average score per member of staff entered for assessment. This average score was converted to a percentage and double weighted to give a score out of 200.  …

A-level/Higher points (250 points): Nationally audited data for the 2009-10 academic year were used for league table calculations. All entry points gained under the Ucas tariff system were used to calculate mean scores for all universities. Grades for leading qualifications were awarded points according to the following scale: A-levels – A*: 140; A:120, B:100, C:80, D:60 and E:40; AS-levels – A:60, B:50, C:40, D:30, E:20; Advanced Highers – A:120, B:100, C:80; Highers – A:72, B:60, C:48.  …

Unemployment (200 points): The number of students assumed to be unemployed six months after graduation was calculated as a percentage of the total number of known destinations. This is shown as a percentage in each profile. For the league table calculation, the percentage was subtracted from 50. …

Firsts/2:1s awarded (100): We calculated the percentage of students who graduated with firsts or 2:1 degrees.  …

Dropout rate (+57 to -74 points): The number of students who drop out before completing their courses was compared with the number expected to do so (the benchmark figure shown in brackets in the university profiles). Benchmarks vary according to subject mix and students’ entry qualifications. The percentage difference between the projected dropout rate and the benchmark was multiplied by five and awarded as a bonus/penalty mark. Universities that lost fewer students than their benchmark gained, those losing more had points deducted. …
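To get a feel for how such scores are put together, here is a rough sketch of three of the calculations which are described fully enough above to reproduce: the A-level points scale, the double-weighted research score and the dropout bonus or penalty. The function names and example figures are my own, and I have assumed that the dropout “percentage difference” is measured in percentage points. Steps which are elided in the summary (such as the scaling for the other categories) are not guessed at.

```python
# Points scale for A-level grades as quoted in the methodology summary.
A_LEVEL_POINTS = {"A*": 140, "A": 120, "B": 100, "C": 80, "D": 60, "E": 40}


def a_level_tariff(grades):
    """Total tariff points for one student's A-level grades."""
    return sum(A_LEVEL_POINTS[grade] for grade in grades)


def research_score(average_percentage):
    """Research quality: a percentage, double weighted to a score out of 200."""
    return 2 * average_percentage


def dropout_bonus(actual_rate, benchmark_rate):
    """Bonus (positive) or penalty (negative) for dropout rate.

    Assumption: the difference is in percentage points, multiplied by five,
    with universities losing fewer students than their benchmark gaining.
    """
    return (benchmark_rate - actual_rate) * 5


# Hypothetical figures for illustration only.
print(a_level_tariff(["A*", "A", "B"]))   # one student's grades
print(research_score(60.0))               # average research rating of 60%
print(dropout_bonus(4.0, 6.0))            # 4% dropped out against a 6% benchmark
```

Even this small sketch shows how many design decisions (the choice of multiplier, the weighting, the benchmark) sit behind a single league table position.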

Hmm. Are the ways in which the individual scores are compiled and then the scores for the nine categories aggregated significantly different from the way in which social media analytic companies such as Klout and Peerindex determine their scores (and which I summarised in a post on Social Analytics for Russell Group University Twitter Accounts)?

Doesn’t it seem likely that we will see the Sunday Times survey of UK universities in future years include analyses of universities’ online presence?

And won’t this be treated as important by those involved in University marketing and student recruitment, despite the limitations such methodologies may have?

Posted in Evidence | 9 Comments »

“Battling legal, logistical and technical obstacles to archiving the Web”

Posted by Brian Kelly on 12 September 2011

Recent Features on Web Archiving

The recent guest blog post entitled Web archives: more useful than just a ‘historical snapshot’ was quite timely, having been published a few days after a related article in the Times Higher Education (Memory Failure Detected) which described how:

A coalition of the willing is battling legal, logistical and technical obstacles to archive the riches of the mercurial World Wide Web for the benefit of future scholars

The article went on to illustrate a use case for the preservation of Web resources:

It is 2031 and a researcher wants to study what London’s bloggers were saying about the riots taking place in their city in 2011. Many of the relevant websites have long since disappeared, so she turns to the archives to find out what has been preserved. But she comes up against a brick wall: much of the material was never stored or has been only partially archived. It will be impossible to get the full picture.

But, as I describe below, we don’t need to wait until 2031 to have a reason to analyse Web content which may have been thought to be ephemeral.

Analysis of Twitter Usage at Recent ALT-C Conferences

The article in the Times Higher Education referred to an archiving initiative led by the Library of Congress which is archiving Twitter posts and which will allow, at some time in the future, researchers to analyse public tweets. The article could also have mentioned the TwapperKeeper archiving service which benefitted from JISC funding to enhance its archiving capabilities to address requirements of the UK HE sector. The TwapperKeeper service was used to keep an archive of tweets posted about last week’s ALT-C 2011 conference.  The JISC-funded developments to the service included the provision of enhanced API access which led to development of the Summarizr analysis service by Andy Powell at Eduserv.

In order to make valid comparisons across annual events I have previously suggested that the Twitter traffic for a week is analysed, so that discussions in advance of an event and shortly afterwards can be analysed. The Summarizr statistics for tweets at the ALT-C conferences for the past three years are given in the following table.

Note: Following the publication of this post Martin Hawksey pointed out in a comment on the post that the TwapperKeeper archive was not available at the start of the ALT-C 2011 conference, until he created the archive on the opening morning of the conference.  An updated column has been published, but note that this does not include tweets from the opening morning of the conference.

| | ALT-C 2009 | ALT-C 2010 | ALT-C 2011 | ALT-C 2011 (updated) |
| Date of event | 8-10 Sept 2009 | 7-9 Sept 2010 | 6-8 Sept 2011 | 6-8 Sept 2011 |
| Dates for analysis | 6-12 Sept 2009 | 5-11 Sept 2010 | 4-10 Sept 2011 (partial archive) | 6-11 Sept 2011 |
| Nos. of tweets | 4,442 | 6,138 | 6,296 | 6,342 |
| Nos. of users | 726 | 658 | 802 | 809 |
| Nos. of URLs tweeted | 701 | 664 | 1,083 | 1,102 |
| Top five twitterers | jamesclay (168), sputuk (113), haydnblackey (112), emmadw (110), JackieCarter (97) | dajbconf (330), timbuckteeth (279), AJCann (174), jamesclay (153), jak82 (111) | digitalfprint (327), timbuckteeth (212), sarahhorrigan (187), FieryRed1 (165), kevupnorth (140) | digitalfprint (327), timbuckteeth (217), sarahhorrigan (187), FieryRed1 (165), amcunningham (141) |
| Top five tweeted hashtags | altc2009 (4,333), jisccdd (108), dubaimetro (84), wheniwaslittle (72), dupedb (64) | altc2010 (6,089), digilit (173), awesome (25), altc2011 (24), fail (23) | altc2011 (6,194), ds106radio (54), altc2012 (42), oer (39), opencountry (35) | altc2011 (6,240), ds106radio (54), altc2012 (42), oer (39), opencountry (35) |
| Nos. of geo-located tweets | 0 (0%) | 35 (0%) | 83 (1%) | 83 (1%) |

Archiving of the tweets allows us to provide such analyses in order to see the importance of Twitter at such events and identify the people who are particularly active Twitter users at the events. The figures also suggest that the amount of Twitter traffic has stabilised over the past two years and that geo-location of tweets, although growing, is not yet being used to any significant extent.
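Statistics of this kind are straightforward to derive once an archive of tweets exists. The following sketch shows the general approach for counting tweets, users, top twitterers and top hashtags. It is not the Summarizr code; the function name, the (user, text) data format and the sample tweets are all hypothetical.

```python
import re
from collections import Counter

HASHTAG = re.compile(r"#(\w+)")


def summarise(tweets, top_n=5):
    """Summarise an archive given as a list of (user, text) pairs."""
    users = Counter(user for user, _ in tweets)
    hashtags = Counter(
        tag.lower() for _, text in tweets for tag in HASHTAG.findall(text)
    )
    return {
        "tweets": len(tweets),
        "users": len(users),
        "top_twitterers": users.most_common(top_n),
        "top_hashtags": hashtags.most_common(top_n),
    }


# Made-up tweets from made-up accounts, for illustration only.
archive = [
    ("alice", "Keynote about to start #altc2011"),
    ("bob", "Great session on open content #altc2011 #oer"),
    ("alice", "Slides now online #ALTC2011"),
]
print(summarise(archive))
```

Hashtags are lower-cased before counting so that variants such as #ALTC2011 and #altc2011 are treated as the same tag.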

The Coalition of the Willing – Should Include You

The article published in the Times Higher Education highlighted a number of examples of  initiatives designed for archiving the broad ranges of resources available on the Web, including work being undertaken at the British Library, the Library of Congress and the Internet Archive as well as a number of national libraries in Europe.

The emphasis on national and international organisations may lead to the impression that archiving of Web resources is being addressed by others and so there is no need for individual universities to consider web preservation issues. This is, I feel, a mistaken view.  Indeed not only do those who have a responsibility for the management of institutional digital resources need to address preservation issues, so too do those who manage project resources as well as, as we have seen above, those who may wish to preserve content associated with events.

JISC has recognised the importance of Web archiving and will be hosting an event on “The Future of the Past of the Web” which will be held at the British Library Conference Centre on 7 October 2011. This free event is the third joint Web archiving workshop which has been organised by the JISC in conjunction with the British Library and the DCC. The event is aimed at:

  • Curators, librarians, archivists interested in the preservation of web resources
  • Organisations that are engaged in web archiving and digital preservation
  • Researchers who depend on access to stable web resources for their research
  • Web developers and content creators who value their content
  • Information managers with responsibility for legal compliance

If this event is of interest to you, note that bookings should be made before 12:00 on Friday 30th September 2011.

Posted in Events, preservation | Tagged: | 4 Comments »

Microattributions, Wikipedia and Dissemination

Posted by Brian Kelly on 9 September 2011

Microattributions Session at #SOLO11

One of the sessions I attended at the SOLO (Science Online London) 2011 event held in London last week addressed the role of ‘microattributions’ in science (note that there isn’t a specific page on the SOLO11 Web site which I can link to so I have created a Lanyrd page about the Microattributions breakout session).

Use of Microattributions in Wikipedia

The session began with Mike Peel (@Mike_Peel) showing how contributions to Wikipedia provided an example of a service which supports microattributions. Looking at an example which I am familiar with, a year ago in a post entitled How Can We Assess the Impact and ROI of Contributions to Wikipedia? I commented on the potential value of entries in Wikipedia with the example of Andy Powell’s update to the HTTP_303 entry. This entry has been viewed no fewer than 5,032 times in the past 30 days which I think illustrates Wikipedia’s strengths in providing outreach. However I hadn’t been aware that it was possible to view details of the contributions made to Wikipedia articles. Looking at the list of contributors for the HTTP_303 entry I find that Andy Powell is the top contributor, having made 7 updates – between 09:53 and 10:13 on 24 September 2010.

Looking at a more significant article, such as the Wikipedia entry for World Wide Web, we can see that the top contributor, Susan Lesch, has made 253 edits between March 2008 and July 2011. The next most prolific contributor, NigelJ, has made 127 updates followed by the Cluebot bot, which has made 70 automated updates (fixing vandalised updates to the article).

Mike Peel illustrated the importance of being able to identify significant contributors to Wikipedia in a story of Professor Gets Tenure With The Help Of His Wikipedia Contributions. The Wikimedia blog provided further information on the contributions which Professor Michel Aaij had made: “more than 60,000 edits, a couple of Good Articles, a Featured List, almost 150 Did You Knows“.

Microattributions in Scientific Research

Following Mike Peel’s very tangible example of both use of microattributions and the value that they can provide for an individual, Martin Fenner (@mfenner) described the origin of the term. As Martin described in a recent blog post, one of the first mentions of the term appears to be an August 2007 Editorial in Nature Genetics (Compete, collaborate, compel). Martin provided a definition of the term:

Microattribution ascribes a small scholarly contribution to a particular author.

and went on to describe how a paper published in March 2011 in Nature Genetics (Systematic documentation and analysis of human genetic variation in hemoglobinopathies using the microattribution approach) concluded that “microattribution demonstrably increased the reporting of human variants, leading to a comprehensive online resource for systematically describing human genetic variation“.

A Microattribution Article in Wikipedia

During the Microattributions session we heard of several other examples of microattributions including contributions to source code on software repositories such as GitHub.

During the session Mike Peel updated his personal page on Wikipedia with some of the ideas which were discussed. On the page Mike pointed out that there wasn’t a Wikipedia entry on Microattributions and invited volunteers to create a page.

I responded to this challenge and created the initial stub entry for the article, as illustrated.

In my initial draft which, following the suggestion provided by the article creation wizard, I created in my personal Wikipedia space, I included the other examples of microattributions which I mentioned above. However since I wasn’t aware of any significant publication which had documented use of the term in these contexts I defined microattributions in the context of its use in the Nature Genetics paper.

Making Use of Wikipedia in Other Areas

I don’t know if the Microattributions entry will remain in Wikipedia. It might be deemed to be not sufficiently noteworthy. Or perhaps it could be included in some other entry: what, for example, is the relationship between a microattribution and a nanopublication – a term coined, I think, by Barend Mons?

However I am convinced of the importance of Wikipedia for defining scientific and technical terms and documenting significant issues related to their origin and use. Should funders, such as Research Councils and JISC, encourage funded projects to make use of Wikipedia as a dissemination channel which can help to enhance the impact of funded work? If this does happen there will be a need to understand best practices for creating and maintaining sustainable items in Wikipedia, including concepts such as NPOV.

I also feel it would be useful to be able to monitor contributions to Wikipedia across sectors, such as JISC-funded project developments. Although it seems that we can identify individual contributors I don’t know if it is possible to aggregate information related to groups of individuals. Since Andy Powell and I both have profiles in Wikipedia, is it possible, I wonder, for statistical information about our contributions to be automatically gathered and analysed? I’ll leave that as a challenge to developers :-)
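As a starting point for that challenge, here is a minimal sketch in Python of how contributions from a group of editors might be aggregated. It assumes the records come from the MediaWiki API's `list=usercontribs` query, whose results include `user` and `title` fields. The usernames and sample records are hypothetical, and no network request is made.

```python
from collections import Counter
from urllib.parse import urlencode

# English Wikipedia's API endpoint; other wikis use a different host.
API_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def contributions_url(username, limit=50):
    """Build a MediaWiki API query URL listing one user's contributions."""
    params = {
        "action": "query",
        "list": "usercontribs",
        "ucuser": username,
        "uclimit": limit,
        "format": "json",
    }
    return API_ENDPOINT + "?" + urlencode(params)


def group_summary(contribs):
    """Aggregate contribution records for a group of editors.

    Each record is a dict with at least 'user' and 'title' keys.
    Returns (edits per user, edits per article) as two Counters.
    """
    per_user = Counter(record["user"] for record in contribs)
    per_article = Counter(record["title"] for record in contribs)
    return per_user, per_article


# Hypothetical sample records, shaped like usercontribs results.
sample = [
    {"user": "ExampleUserA", "title": "HTTP 303"},
    {"user": "ExampleUserA", "title": "HTTP 303"},
    {"user": "ExampleUserB", "title": "Microattribution"},
]

per_user, per_article = group_summary(sample)
print(contributions_url("ExampleUserA"))
print(per_user.most_common())
print(per_article.most_common())
```

In practice the records for each member of a group (for example, staff on a funded project) would be fetched from the URLs built above and concatenated before being summarised.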


Posted in Wikipedia, Wikis | 2 Comments »

Guest Post: Web archives: more useful than just a ‘historical snapshot’

Posted by Brian Kelly on 7 September 2011

In this guest blog post Maureen Pennock, the Web Archive Engagement & Liaison Manager at the British Library, explores some possible approaches to exploiting the scholarly value of web archives.

Web archives: more useful than just a ‘historical snapshot’

The importance of the internet for research is well-known. As a constantly growing and evolving information source, the web contains vast amounts of information not available or published elsewhere. It is also a unique record of life and society in this technological age. Rarely these days do scholars carry out their research without going online, and the research value of the web is undeniable.

Web archives seek to capture this value and uniqueness by harvesting websites so that they may be re-used in the future even when they are no longer available on the live web. Over the past decade, numerous web archives have been established and grown, including the UK Web Archive. At almost 10 terabytes, over 9,300 web sites and 38,000 instances of archived sites, the UK Web Archive is a unique selective web archive that reflects the collection policies of the participating institutions.

Use of the web archive is steady. However, as recent reports have identified, there remains a gap between the potential community of researchers who could exploit the content, and those who actually do so. To address this, we are collaborating with researchers to explore different ways in which they may use the web archive and exploit the data contained within. We have developed and released a number of visualisation tools as an early first step:

  • the 3D Visualisation Wall (shown below), which provides a high-level, more dynamic presentation of search results and special collections;
  • the N-Gram search, which encourages users to consider the web archives as data as well as websites, enabling visualisation and comparisons of term frequency;
  • the General Election 2005 Tag Cloud, which visualises the most frequently used (single and pairs of) words in the websites related to key political parties during the 2005 election campaign.
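To illustrate what treating the archive "as data" can mean, here is a minimal Python sketch of term frequency per year for a word or phrase. It is not the UK Web Archive's implementation: the function names and the tiny sample corpus are hypothetical, and tokenisation is a plain whitespace split.

```python
from collections import Counter


def ngram_counts(text, n=1):
    """Count the n-grams (runs of n consecutive words) in a text."""
    words = text.lower().split()
    grams = zip(*(words[i:] for i in range(n)))
    return Counter(" ".join(gram) for gram in grams)


def relative_frequency(docs_by_year, phrase):
    """Share of all n-grams in each year that match the phrase."""
    n = len(phrase.split())
    result = {}
    for year, texts in sorted(docs_by_year.items()):
        counts = Counter()
        for text in texts:
            counts.update(ngram_counts(text, n))
        total = sum(counts.values())
        result[year] = counts[phrase.lower()] / total if total else 0.0
    return result


# Hypothetical page texts keyed by the year they were archived.
docs = {
    2005: ["general election campaign begins", "election results"],
    2009: ["fourth plinth art project"],
}

print(relative_frequency(docs, "election"))
print(relative_frequency(docs, "general election"))
```

Plotting the resulting values against the year gives the kind of term-frequency comparison the N-Gram search provides.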

Analysis shows that our single most popular site is the One & Other site, otherwise known as the Fourth Plinth, the website of a 2009 public arts project by artist Antony Gormley. The site is no longer available on the live web. This type of usage, where users browse websites in order to access content that was available at a given point of time but is no longer accessible, is a widely accepted, original user scenario. It is based largely on original user experiences and early interactions with the live web. But there are other ways in which a web archive may be used, aside from visiting sites as they were captured at a given date and time. For example:

  1. Resource citation. Researchers typically use the live web for research and cite live web resources with the date last visited. Why? Because content changes over time and they want to indicate when the content was available on the website. But if the content changes – and web pages are frequently updated or refreshed without archiving old versions – then there is no proof that the content cited actually existed. The web archive provides a more reliable and persistent citation than the live web.
  2. Data exploitation. Web archives enable automatic identification of social trends over time (automated temporal trend research). The tools available will impact on the type of research that can be undertaken. This is a chicken & egg scenario: we rely to an extent on users to tell us what tools they want, but users need some direction on what might be possible with the data available. We need to work together to further develop the archive and support the emerging research needs of our users.
  3. Intelligent querying, of the Q&A sort. Given the amount of data available in the web archive, it’s not inconceivable that future users will expect a more intelligent query mechanism than simple search and result presentation. More complex questions, for example, ‘tell me about the competing interests of oil companies in the late twentieth century’ are the stuff of sci-fi but rely upon an extensive historical database – such as a web archive.

Of course the characteristics of a web archive inevitably impact on how viable these different scenarios may be. For example, a selective web archive with limited scope but rich resource description will support research differently to a broad domain or international archive, with minimal accompanying metadata. The age of the web archive may be another factor. These factors must be recognised when developing tools and functionality.

Increasing usage and responding to researcher needs is an important element of our growth strategy for the UK Web Archive over the next five years. If you use the web archive for research and/or have ideas about tools or functionality to support specific types of research, we’d really like to hear from you. You can get in touch with us either by email, on Twitter, or by leaving a comment below.

Contact Details

Maureen Pennock
Web Archive Engagement & Liaison Manager
The British Library (Yorkshire)

Twitter: @mopennock

Posted in Guest-post, preservation | 2 Comments »

Recognising, Appreciating, Measuring and Evaluating the Impact of Open Science

Posted by Brian Kelly on 6 September 2011

The #SOLO11 Conference

As I mentioned in yesterday’s post on Use of Twitter at the SOLO11 Conference, on Friday and Saturday, 2-3 September 2011 I attended the Science Online London 2011 event, SOLO11.

We are now starting to see various posts on the event being published. One of the first reports on the event was written by Alexander Gerber and published on the Scienceblogs service based in Germany. Alexander began his brief post by saying:

My sobering conclusion after two days of ScienceOnline London: The technologies are ready for take-off, the early-adopter-scientists are eager to kickstart the engine, but the runway to widespread usage of interactive technologies in science is still blocked by the debris of the traditional academic system. This system needs to be adapted to the new media paradigms, before web 2.0 / 3.0 can have a significant impact on both research and outreach. 

and went on to list three central questions which he feels need to be answered:

  • How can we recognise, appreciate, measure and evaluate the impact of outreach and open science in funding and evaluation practice?
  • Which new forms of citation need to be installed for that?
  • How can we create a reward system that goes way beyond peer-reviewed citations?

I’d like to address certain aspects of the first question, in particular ways in which one might measure and evaluate the use of social media to support such outreach activities since this issue was discussed during a workshop session on Online Communication Tools which I spoke at.  However I would first like to give some thoughts on the opening plenary talk at the event.

Plenary Talk on Open Science

For me the highlight of SOLO11 was the opening plenary talk on “Open Science” which was given by Michael Nielsen, a “writer; open scientist; geek; quantum physicist; writing a book about networked science“.

A number of blog posts about the event have already been listed in the Science Online wiki. I found Ian Mulvany’s thoughts on the Science Online London Keynote talk particularly helpful in reminding me of the key aspects of the talk.

Michael told the audience that he didn’t intend to repeat the potential benefits of open science; rather he would look at some examples of failures in open science approaches and then look in other disciplines to see if there were parallels and strategies which could be used in the science domain.

The example given described use of open notebook science in which a readership of ~100 in a highly technical area had been established, but there was little active participation from others. The author, Tobias J Osborne, was putting in a significant amount of effort but was failing to gain value from this work.

Michael gave an example of how a significant change can be made in a short period of time which brought significant benefits: the change to driving on the right hand side of the road in Sweden at  5am on Sunday, 3 September 1967.

However although this example was successful and brought benefits (such as reduced costs) there are many other examples in which the potential benefits of Collective Action fail to deliver, often due to some potential beneficiaries choosing to ‘freeload’ on the work of others.

We can learn from examples of successes in other areas, ranging from the establishment of trade unions and well-established practices for managing water supply in villages through to the growth of the arXiv archive and of the Facebook social networking service. Successful approaches include:

Starting small: For example the arXiv service’s success was due to it focussing on a small subject area. Similarly Facebook was initially available only to students at Harvard University, before expanding to, initially, other Ivy League universities and then other higher educational institutions before being available to everyone.

Monitoring and sanctions: Michael concluded by describing how there was a need to monitor use and, if needed, to be able to apply sanctions.

The concept is that there is some action where if everyone changed it would be better for everyone, but you need everyone to change at the same time. There are incentives for people not to participate because there is some cost involved in changing for the individual but if the individual does not change, they get the benefit anyway from everyone else changing. This is the same kind of problem that we have with the move to open data.
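The incentive to freeload can be shown with a very simple public-goods model. The Python sketch below is my own illustration rather than anything presented in the talk: the group size, cost and benefit figures are arbitrary assumptions, chosen only so that universal change beats no change while each individual is still better off leaving the change to others.

```python
def payoff(changes, others_changing, cost=10, benefit_per_adopter=3):
    """Payoff for one individual in a simple public-goods model.

    Everyone receives benefit_per_adopter for each person who changes,
    but only those who change pay the cost. All figures are arbitrary.
    """
    adopters = others_changing + (1 if changes else 0)
    shared_benefit = benefit_per_adopter * adopters
    return shared_benefit - (cost if changes else 0)


GROUP_SIZE = 10  # hypothetical community of ten researchers

nobody_changes = payoff(False, others_changing=0)
everyone_changes = payoff(True, others_changing=GROUP_SIZE - 1)
free_rider = payoff(False, others_changing=GROUP_SIZE - 1)

print("nobody changes:  ", nobody_changes)
print("everyone changes:", everyone_changes)
print("free rider:      ", free_rider)
```

With these figures everyone is better off if all ten change than if nobody does, yet the one person who holds out while the other nine change does better still, which is why monitoring and sanctions matter.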

In brief, therefore, Michael felt that those who feel that open science can provide benefits tend to be too ambitious – there is a need to start with small achievable aims and to make use of approaches for broadening the scope using various approaches which have proven successful in other areas.

Analytics for Use of Social Media

The second day of the SOLO 11 event provided a series of workshop sessions. I attended one which was billed as Scholarly HTML but in fact provided an introduction to blogging on WordPress :-( However a workshop session on Online Communication Tools which provided an introduction to Twitter, Google+, etc. in the morning moved on in the afternoon sessions to:

… cover all angles from how to practically use the tools most beneficially in an institutional or academic environment, to how to measure their impact via statistics and online “kudos” tools

Alan Cann, one of the facilitators of the session, invited me to speak in this session as Alan had attended a one-day workshop on “Metrics and Social Web Services: Quantitative Evidence for their Use and Impact” which I organised recently. I used the slides from a talk on “Surveying Our Landscape From Top to Bottom” which reviewed various analyses of use of social media services by individuals and institutions, including tools such as Klout, PeerIndex and Twitalyzer.

Alan Cann also spoke in the session and in his presentation pointed out the statistical limitations in using such services – similar concerns to those raised by Tony Hirst in a talk which he gave at the “Metrics and Social Web Services: Quantitative Evidence for their Use and Impact” event.

Tony’s slides, which are available on Slideshare, illustrated dangers of misuse of statistics including the accompanying graphs  showing data which can all be, incorrectly, reduced to the same linear curve.
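The graphs themselves are not reproduced here. A well-known illustration of the same point is Anscombe's quartet, and the Python sketch below assumes (this is my assumption, not something taken from the slides) that it is a fair stand-in: four small data sets which look completely different when plotted but give almost identical least-squares lines.

```python
# Anscombe's quartet (1973): four data sets of eleven points each.
X_SHARED = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
ANSCOMBE = {
    "I": (X_SHARED,
          [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X_SHARED,
           [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X_SHARED,
            [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def least_squares(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


fits = {name: least_squares(xs, ys) for name, (xs, ys) in ANSCOMBE.items()}
for name, (slope, intercept) in fits.items():
    print(f"{name}: y = {intercept:.2f} + {slope:.3f}x")
```

Each fit comes out at roughly y = 3 + 0.5x, even though only the first data set is reasonably described by a straight line. A summary statistic on its own says little about the shape of the underlying data.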

Tony went on to describe Goodhart’s Law which states that:

once a social or economic indicator or other surrogate measure is made a target for the purpose of conducting social or economic policy, then it will lose the information content that would qualify it to play such a role.

and Campbell’s Law:

The more any quantitative social indicator is used for social decision-making, the more subject it will be to corruption pressures and the more apt it will be to distort and corrupt the social processes it is intended to monitor.

Lies, Damned Lies and Social Media Analytics?

Might we, therefore, conclude that social media analytics tools such as Klout, PeerIndex and Twitalyzer have no role to play in, for example, “measuring and evaluating the impact of outreach and open science“? Not only are, for example, the ways in which PeerIndex aggregates its scores for authority, activity and audience to give a single value statistically flawed, but, if such services are used for decision-making purposes we will see users gaming the system.

Whilst this is true, I also feel that there are dangers in trying to develop a perfect way of measuring such impact – and it was clear from the workshop that there is an acceptance of the need for such measurements.

There will be many other examples of approaches to measurements which we generally accept but which have underlying flaws. The university system, for example, may be regarded as evaluating its successful consumers as first, two-one, two-two or third class degree students.  But despite the limitations of assessment the importance of such assessment is accepted.

We might also wish to consider how such measuring schemes are used.  The approaches taken by Klout and Peerindex have parallels with Google’s ranking algorithms – and again can be gamed. But organisations are prepared to invest in ways of  gaining high Google rankings since this will provide business benefits, through Web sites being more easily found in Google searches.

We are starting to hear of examples of Klout and Peerindex statistics being used  in recruitment, with a recent article published in the New York Times inviting readers to:

IMAGINE a world in which we are assigned a number that indicates how influential we are. This number would help determine whether you receive a job, a hotel-room upgrade or free samples at the supermarket. If your influence score is low, you don’t get the promotion, the suite or the complimentary cookies.

I suspect that marketing departments will use such statistics and that people working in marketing and outreach activities will start to use personal social media analytic scores in their CVs. Note that as can be seen from the image which shows my Peerindex scores such tools can be used in a variety of ways – it is clear that you wouldn’t employ me based on the diagram if you were looking for someone who had demonstrable experience in outreach work using Twitter in the field of medicine (my areas tend to focus on technology, sport and politics).

I therefore feel that we should treat social media analytics with care and use them in conjunction with qualitative evidence of value. But to disregard such tools completely whilst waiting for the perfect solution to appear will fall into the trap which Michael Nielsen warned against, of seeking to gain broad acceptance of a universally applicable solution.

I’d welcome your thoughts.

Posted in Evidence, Impact | 2 Comments »

Use of Twitter at the SOLO11 Conference

Posted by Brian Kelly on 4 September 2011

SOLO11: the Science Online London Conference

On Friday and Saturday, 2-3 September 2011 I attended the Science Online London 2011 event, SOLO11.  This event was launched in 2008 with a focus on science blogging. I attended the second in the series (and published a post entitled The Back Channels for the Science Online 2009 Conference) by which time the event had broadened in scope to address a wider range of issues of interest to scientists and researchers, those involved in journal publishing and those involved in science communication. In  light of the popularity of the event last year the event moved from the Royal Institution to the British Library which enabled up to 250 people to attend, double the previous capacity.  Unfortunately I couldn’t attend last year’s event but I was pleased that I was able to get to the event this year.

Use of Twitter at SOLO11

I’ll not comment on the talks and sessions at the SOLO 2011 conference – I suspect we will see a lot of detailed posts about the event over the next few days, particularly since the event will have attracted those who are pro-active in making use of blogs, Twitter, etc. Rather I’ll provide some comments on metrics of the event’s use of the #solo11 Twitter event hashtag.

Tony Hirst (@psychemedia) has already provided a visualisation of the #solo11 Twitter community and this image is embedded in this post.

In addition to the various tools Tony uses to produce such visualisations, the TwapperKeeper service is increasingly being used to keep archives of event tweets, with the Summarizr service providing statistical summaries of usage.

We can view the Summarizr statistics for the #solo11 tag. But how might we go about making comparisons of Twitter usage with previous SOLO events?

Although not very well documented it is possible to restrict a Summarizr analysis to a particular date range. In a blog post on Conventions For Metrics For Event-Related Tweets I pointed out that in order to make valid comparisons between the use of Twitter at events there will be a need to use comparable date ranges. The Summarizr tool can therefore provide comparable statistics for the SOLO10 and SOLO11 events (note that the SOLO09 event only lasted for one day, with an evening event the day before):

SOLO11 (2-3 Sept 2011)
Summarizr stats for 2 days for 2 full day event: There were 2,132 tweets from 413 users. There were a total of 114 hashtags and 120 URLs tweeted. There were 41 geo-located tweets (1% of the total).
SOLO10 (3-4 Sept 2010)
Summarizr stats for 2 days for 2 full day event: There were 2,148 tweets from 410 users. There were a total of 96 hashtags and 140 URLs tweeted. There were 28 geo-located tweets (1% of the total).
SOLO09 (22 August + evening event on 21 August 2009)
Summarizr stats for 2 days for 1 full day and 1 evening event: There were 72 tweets from 46 users. There were a total of 5 hashtags and 20 URLs tweeted. There were 0 geo-located tweets (0% of the total).
Science Blogging 08 (1 July 2008)
No TwapperKeeper archive of tweets available.
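
One simple way of comparing the figures above is to look at tweets per user. The Python sketch below uses only the Summarizr numbers quoted in this post; the choice of tweets per user as the measure is my own and is just one of several which could be used.

```python
# Tweet and user counts taken from the Summarizr figures quoted above.
EVENTS = {
    "SOLO11": {"tweets": 2132, "users": 413},
    "SOLO10": {"tweets": 2148, "users": 410},
    "SOLO09": {"tweets": 72, "users": 46},
}


def tweets_per_user(stats):
    """Average number of tweets per tweeting user, to two decimal places."""
    return round(stats["tweets"] / stats["users"], 2)


ratios = {event: tweets_per_user(stats) for event, stats in EVENTS.items()}
for event, ratio in ratios.items():
    print(f"{event}: {ratio} tweets per user")
```

This gives about 5.2 tweets per user for both SOLO11 and SOLO10, compared with about 1.6 for SOLO09.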

We can therefore see that Twitter usage for SOLO10 and SOLO11 seems to be at very similar levels.

What Else Do We Need?

At events such as SOLO we can expect to see intensive use of Twitter. The participants and organisers are also likely to have an interest in how Twitter was being used and the impact which its use may have had. In order to carry out subsequent analyses there will be a need to have an archive of tweets. There will also, as the scientists who attend the event will be aware, be a need to ensure that analyses are carried out in a reproducible and consistent fashion. In addition there will be a need for various analysis and visualisation tools.

Are we in a position in which the data capture processes, tools and methodologies for analysis and interpretation are available in a systematic way?  I’d welcome feedback from those who attended SOLO11 and the wider community. For me there seems to be a failure in the lack of a consistent URI to refer to SOLO conferences – how do I cite the SOLO10 event, for example?


After publishing this post it occurred to me that there may be both individual and organisational benefits for being able to analyse SOLO event tweets. During the event I spoke to Martin Fenner and Lou Woodley and, after realising that we had shared interests, started to follow them on Twitter. There were other people I followed during the event, but I can’t remember who they were. It occurs to me that it would be interesting to be able to record details of people one starts to follow at events, especially if this leads to subsequent significant joint work (as I described in a post on 5,000 Tweets On Twitter has led to contributions to a joint paper including one which won an award for the Best Communication Paper at W4A 2010).

From an event organiser’s perspective it would be interesting to gather evidence of growth of networks from a broader perspective. Would it be possible, I wonder, to see how the Twitter networks for participants at an event develop over the duration of an event and might it be possible to relate this to more tangible evidence of impacts or other benefits?

Posted in Events | Tagged: | 2 Comments »