Extending Your Community – Through Machine Translation
Posted by Brian Kelly on 29 December 2009
Out Of Sight, Out of Mind?
It was over ten years ago, when I was the project manager for the EU-funded Exploit Interactive ejournal, that I first started to explore the potential of machine translation. Could we, I wondered, make use of Web-based language translation services to translate articles published in English into other languages?
“Nonsense!” was a response I encountered. “Computer translations won’t work,” I was told, along with the story of how a computer translated the phrase “out of sight, out of mind” into another language and, when it was translated back into English, it came out as “invisible idiot”.
Now although I can appreciate the difficulties of translating idioms, my interest was not in the translation of full-text articles but in automated translation of the summaries of the articles. However, European colleagues were sceptical of such automated translations and so, after some experiments with the BabelFish translation service, we did not pursue this.
What Are They Saying About Me In Catalonia?
Since my involvement as project manager for Exploit Interactive and its successor Cultivate Interactive, I have given little thought to language issues. And I keep up-to-date with developments by reading English-language resources, these days typically UK, US, Canadian and Australian blog posts and tweets, with the occasional English-language peer-reviewed article.
From time to time, however, I notice links to my blog from non-English posts. And just before Christmas I noticed an incoming link from a blog post entitled “Tots som tweets (a la universitat)”.
That will be a post citing one of my articles about Twitter, I thought, and visited the post to see what it said:
Un article clau, que no deixa indiferent, és el de Brian Kelly al seu blog UK Web Focus: “I Want To Use Twitter For My Conference” on exposa bones pràctiques en l’ús de twitter per organitzar un congrès o conferència. Les entrades de Kelly són molt rellevants i es tracta d’un blog que trobo de seguiment obligat, igual que Mashable, Community Roundtable o Social Media Today. Kelly té una entrada rellevant que hauria de seguir: 14 UK Information Professionals to Follow on Twitter?
What language was it in, I wondered? It seemed almost but not quite French, Spanish or Italian – but submitting the URL to the BabelFish translation service with each of these options provided no joy, although the Spanish to English translation did translate a couple of phrases.
But if BabelFish wasn’t of much help, how should I find out what the blog post was saying? The answer, of course, is to send a tweet to one’s followers. And so I asked:
Can someone tell me what language http://bit.ly/6jgzsI is in. And also is there a tool for guessing the language of a page.
And in a few minutes I was told that the post was written in Catalan: @virtualleader recognised the language as she has friends in Barcelona and @ijclark relied on his wife for the answer. I received about a dozen other responses, but most importantly one from @miquelduran, the author of the blog post who follows me on Twitter.
Google Translate Does The Job
As well as asking what language the post was in, I also asked for suggestions on tools which can identify the language of Web pages. The responses were in agreement: Google Translate will not only translate pages from one language to another but, if you don’t know what language the original page is written in, it will also attempt to identify it.
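For anyone curious how a tool might guess the language of a page, here is a minimal sketch in Python. It is emphatically not how Google Translate does it (I have no idea what Google does internally); it simply counts how many very common words from each candidate language turn up in the text. The word lists, the language codes and the function names `score_languages` and `guess_language` are all my own hypothetical choices for illustration, and the lists are far too short for real use.

```python
import re

# Hypothetical, hand-picked lists of very common words per language.
# Illustrative only: real tools use much richer statistical models.
COMMON_WORDS = {
    "ca": {"el", "la", "els", "les", "de", "que", "i", "és", "són", "amb",
           "per", "un", "una", "al", "seu", "molt", "en", "no", "es"},
    "es": {"el", "la", "los", "las", "de", "que", "y", "es", "son", "con",
           "por", "un", "una", "al", "su", "muy", "en", "no", "se"},
    "fr": {"le", "la", "les", "de", "que", "et", "est", "sont", "avec",
           "pour", "un", "une", "au", "son", "très", "en", "ne", "se"},
    "it": {"il", "la", "gli", "le", "di", "che", "e", "è", "sono", "con",
           "per", "un", "una", "al", "suo", "molto", "in", "non", "si"},
    "en": {"the", "of", "and", "is", "are", "with", "for", "a", "an",
           "to", "his", "very", "in", "not", "it"},
}


def score_languages(text):
    """Return, for each language, how many words of the text are in its list."""
    # Split into runs of letters (accented letters included), lower-cased.
    words = re.findall(r"[^\W\d_]+", text.lower())
    return {
        lang: sum(1 for word in words if word in common)
        for lang, common in COMMON_WORDS.items()
    }


def guess_language(text):
    """Return the code of the best-scoring language (first listed on a tie)."""
    scores = score_languages(text)
    return max(scores, key=scores.get)
```

Calling `guess_language` on a sentence from Miquel’s post, such as “Les entrades de Kelly són molt rellevants i es tracta d’un blog que trobo de seguiment obligat”, picks out “ca” because words like “són”, “molt” and “i” appear only in the Catalan list. Closely related languages share many short words, which is presumably why BabelFish’s Spanish option managed to translate a couple of phrases.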
And so using Google Translate I find that the blog post begins:
If I must be frank, I was somewhat surprised the evolution of Twitter as a tool for communication and social networking. In fact, Facebook has the same features have been changing to twitter. From my professional point of view, twitter can do three things now: to present an idea, concept or something (a conference, an event calendar … in short) (unidirectional), retrasmetre an event in which different people use same hashtag (semibidireccional), and generate conversation (usually public, but can also be closed) (bidirectional).
OK, I can understand that. Miquel Duran (a professor at the Universitat de Girona) was initially sceptical about the benefits of Twitter, but now recognises three areas in which it is useful. In his post Miquel goes on to cite a number of posts which illustrate Twitter’s benefits, including a number of my posts.
But it was Miquel’s concluding remarks which I found most interesting:
I must say, however, that there is something that concerns me. The Internet has the grace that is distributed. The email is not centralized, but Google via Gmail, so intense. There are blogs everywhere, and service blogs can install it on any server. However, no server own twitter. We are putting in the hands of a single vendor? (same, not just for Facebook.) So I saved all my information locally. I just desbobrir TweetTake, which saves the tweets, direct messages, and fans followed in a spreadsheet. We must be cautious and be wise.
A European Perspective On The Risks
These issues were at the heart of my paper presented just before Christmas at the Cultural Heritage Online 2009 conference. And as I suggested in my post on “The Risks and Opportunities Framework”, the US has taken a lead in making use of such third-party services, with organisations in the UK now making much greater use of them, apparently without being too concerned about “putting [the content] in the hands of a single vendor”.
So we are revisiting the issues concerning trust, ownership, sustainability and preservation – and I’ve learnt about a new tool, Tweetake, for backing up Twitter posts.
I’ve also found some further anecdotal evidence to back up the feeling I gained from the Cultural Heritage Online Conference that institutions in mainland Europe are more reluctant to make use of services in the Cloud than similar organisations in the US and UK.
For example, the view that “Twitter, like blogging, needs an edge, a voice, a riskiness” expressed by Mike Ellis in his post “The person is the point”, or Paul Walk’s post in which he points out that “Anything you quote from Twitter is always out of context”, perhaps challenge Miquel’s conclusion that we need to be cautious in making use of services such as Twitter. Might not being cautious result in the benefits of Twitter’s spontaneity and informality being lost?
Extending My Community To Europe
So as a result of spotting a blog which linked to one of my posts and then using Google Translate to see what was being said, I’ve started to extend my community beyond the English-speaking world. And I’ve found that Google Translate can provide a comprehensible translation, which was also true of a number of Miquel’s other posts which refer to my work.
In November 2009 Google announced “A new look for Google Translate”: it seems the service now “offers 51 languages, representing over 98% of Internet users today”. And as the translation service is available from the Google Toolbar, perhaps I should install this on my Web browser(s) and get into the habit of making use of it.
Hmm, I also wonder if I can get an RSS feed from Google Translate of Miquel’s posts which I can add to my RSS reader, so that the posts of interest are delivered to me in a language I can understand rather than me having to find the posts and then invoke a translate function.
Perhaps machine translation now does have a role to play. Invisible idiot? I think not!
This entry was posted on 29 December 2009 at 10:06 am and is filed under Social Networking.