UK Web Focus (Brian Kelly)

Innovation and best practices for the Web

Extending Your Community – Through Machine Translation

Posted by Brian Kelly on 29 Dec 2009

Out Of Sight, Out of Mind?

It was over ten years ago, when I was the project manager for the EU-funded Exploit Interactive ejournal, that I first started to explore the potential of machine translation. Could we, I wondered, make use of Web-based language translation services to translate articles published in English into other languages?

“Nonsense!” was a response I encountered. “Computer translations won’t work”, I was told, along with the story of how a computer translated the phrase “out of sight, out of mind” into another language and, when the result was translated back into English, it came out as “invisible idiot”.

Now although I can appreciate the difficulties of translating idioms, my interest was not in the translation of full-text articles but in the automated translation of article summaries. However, European colleagues were sceptical of such automated translations and so, after some experiments with the BabelFish translation service, we did not pursue this.

What Are They Saying About Me In Catalonia?

Since my involvement as project manager for Exploit Interactive and its successor Cultivate Interactive, I have given little thought to language issues. And I keep up-to-date with developments by reading English-language resources, these days typically UK, US, Canadian and Australian blog posts and tweets, with the occasional English-language peer-reviewed article.

From time to time, however, I notice links to my blog from non-English posts. And just before Christmas I noticed an incoming link from a blog post entitled “Tots som tweets (a la universitat)”.

That will be a post citing one of my articles about Twitter, I thought, and visited the post to see what it said:

Un article clau, que no deixa indiferent, és el de Brian Kelly al seu blog UK Web Focus: “I Want To Use Twitter For My Conference” on exposa bones pràctiques en l’ús de twitter per organitzar un congrès o conferència. Les entrades de Kelly són molt rellevants i es tracta d’un blog que trobo de seguiment obligat, igual que Mashable, Community Roundtable o Social Media Today. Kelly té una entrada rellevant que hauria de seguir: 14 UK Information Professionals to Follow on Twitter?

What language was it in, I wondered? It seemed almost but not quite French, Spanish or Italian – but submitting the URL to the BabelFish translation service with each of these options provided no joy, although the Spanish to English translation did translate a couple of phrases.

But if BabelFish wasn’t of much help, how should I find out what the blog post was saying? The answer, of course, is to send a tweet to one’s followers. And so I asked:

Can someone tell me what language http://bit.ly/6jgzsI is in. And also is there a tool for guessing the language of a page.

And in a few minutes I was told that the post was written in Catalan: @virtualleader recognised the language as she has friends in Barcelona and @ijclark relied on his wife for the answer. I received about a dozen other responses, but most importantly one from @miquelduran, the author of the blog post who follows me on Twitter.

Google Translate Does The Job

As well as asking what language the post was in, I also asked for suggestions on tools which can identify the language of Web pages. The responses were in agreement: Google Translate will not only translate pages from one language to another but, if you don’t know what language the original page is written in, it will also attempt to identify it.

And so using Google Translate I find that the blog post begins:

If I must be frank, I was somewhat surprised the evolution of Twitter as a tool for communication and social networking. In fact, Facebook has the same features have been changing to twitter. From my professional point of view, twitter can do three things now: to present an idea, concept or something (a conference, an event calendar … in short) (unidirectional), retrasmetre an event in which different people use same hashtag (semibidireccional), and generate conversation (usually public, but can also be closed) (bidirectional).

OK, I can understand that. Miquel Duran (a professor at the Universitat de Girona) was initially sceptical about the benefits of Twitter, but now recognises three areas in which it is useful. In his post Miquel goes on to cite a number of posts which illustrate Twitter’s benefits, including a number of my posts.

But it was Miquel’s concluding remarks which I found most interesting:

I must say, however, that there is something that concerns me. The Internet has the grace that is distributed. The email is not centralized, but Google via Gmail, so intense. There are blogs everywhere, and service blogs can install it on any server. However, no server own twitter. We are putting in the hands of a single vendor? (same, not just for Facebook.) So I saved all my information locally. I just desbobrir TweetTake, which saves the tweets, direct messages, and fans followed in a spreadsheet. We must be cautious and be wise.

A European Perspective On The Risks

These issues were at the heart of my paper presented just before Christmas at the Cultural Heritage Online 2009 conference. And as I suggested in my post on “The Risks and Opportunities Framework”, the US has taken a lead in making use of such third party services, with organisations in the UK now making much greater use of them without apparently being too concerned about “putting [the content] in the hands of a single vendor”.

So we are revisiting the issues concerning trust, ownership, sustainability and preservation – and I’ve learnt about a new tool, Tweetake, for backing up Twitter posts.

I’ve also found some further anecdotal evidence to back up the feeling I gained from the Cultural Heritage Online Conference that institutions in mainland Europe are more reluctant to make use of services in the Cloud than similar organisations in the US and UK.

For example, the view that “Twitter, like blogging, needs an edge, a voice, a riskiness” expressed by Mike Ellis in his post “The person is the point”, or Paul Walk’s post in which he points out that “Anything you quote from Twitter is always out of context”, perhaps challenges Miquel’s conclusions that we need to be cautious in making use of services such as Twitter. Might not being cautious result in the benefits of Twitter’s spontaneity and informality being lost?

Extending My Community To Europe

So as a result of spotting a blog which linked to one of my posts and then using Google Translate to see what was being said, I’ve started to extend my community beyond the English-speaking world. And I’ve found that Google Translate can provide a comprehensible translation – and this was true of a number of Miquel’s other posts which refer to my work.

In November 2009 Google announced “A new look for Google Translate” – it seems the service now “offers 51 languages, representing over 98% of Internet users today”. And as the translation service is available from the Google Toolbar perhaps I should install this on my Web browser(s) and get into the habit of making use of it.

Hmm, I also wonder if I can get an RSS feed from Google Translate of Miquel’s posts which I can add to my RSS reader – so the posts of interest are delivered to me in a language I can understand, rather than me having to find the posts and then invoke a translate function.

Perhaps machine translation now does have a role to play. Invisible idiot? I think not!

5 Responses to “Extending Your Community – Through Machine Translation”

  1. @Brian: Perhaps machine translation now does have a role to play. Invisible idiot? I think not!

    I agree. Machine translation is a powerful tool. If the source text is optimised for machine translation, machine translation gives satisfactory translations (http://www.international-english.co.uk/mt-evaluation.html).

  2. mariekeguy said

    Hi Brian,

    Agreed, machine translation has moved on from the time when I wrote this article for the Cultivate project – “The Soldiers are in the Coffee – An Introduction to Machine Translation”. Google Translate does a pretty good job, though context is still the key to good translation. However, as you show through your Twitter example, it is the ability to reach a global community that really helps us understand things written in another language.

    Marieke

  3. Edunomia said

    Llengua i social media [una evolució personal en l’ús de les llengües a la xarxa]…

    No fa pas gaire vaig escriure una entrada titulada “Llengua i twitter” on expressava la meva preocupació pel català a les xarxes socials (social media), especialment al twitter. Avui voldria exposar un petit canvi al meu món dels social med…

  4. […] two years ago in a post entitled Extending Your Community – Through Machine Translation I suggested that although in the past machine translation was felt to be of little use, […]

  5. […] Out Of Sight, Out of Mind? It was over ten years ago, when I was the project manager for the EU-funded Exploit Interactive ejournal that I first started to explore the potential of machine translat…  […]
