UK Web Focus

Innovation and best practices for the Web

Should We “Leave Search To Google?”

Posted by Brian Kelly (UK Web Focus) on 21 April 2008

When I chaired the session on Search at the Museums and the Web 2008 conference the discussion, as I described in a recent post, turned to lightweight approaches to federated searching. During the session I received a Twitter comment on my feedback channel (intermingled with the football scores!) asking “is it more useful to develop compelling browse interfaces & leave search to Google?” The response at the time seemed to be that although Google might have a role to play in the future, its role at present is limited (in a museums’ context) due to the complexities of typical collections management Web interfaces: the valuable data is part of the ‘deep Web’ which search engines such as Google find difficult to index.

But just a few days ago, via a comment made by Nate Solas on his blog post about the Search session, I discovered that Google have announced their intention to index the deep Web:

This experiment is part of Google’s broader effort to increase its coverage of the web. In fact, HTML forms have long been thought to be the gateway to large volumes of data beyond the normal scope of search engines. The terms Deep Web, Hidden Web, or Invisible Web have been used collectively to refer to such content that has so far been invisible to search engine users. By crawling using HTML forms (and abiding by robots.txt), we are able to lead search engine users to documents that would otherwise not be easily found in search engines, and provide webmasters and users alike with a better and more comprehensive search experience.
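As a rough illustration of the "abiding by robots.txt" point in the announcement, the sketch below uses Python's standard `urllib.robotparser` to check whether a crawler may fetch a search-form URL. The robots.txt rules, the `museum.example` URLs and the `ExampleBot` user agent are all hypothetical; this says nothing about how Google's own crawler is implemented.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a collections site: the search form handler is
# blocked, while browsable record pages stay open to crawlers.
ROBOTS_TXT = """\
User-agent: *
Disallow: /collections/search
"""


def may_crawl(robots_txt, user_agent, url):
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in (
        "http://museum.example/collections/search?q=urn",
        "http://museum.example/collections/object/1234",
    ):
        print(url, may_crawl(ROBOTS_TXT, "ExampleBot", url))
```

A site that wants its form-driven content found would therefore need to make sure its search URLs are not excluded in this way.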

Mia Ridge has commented on the implications of this announcement:

You’re probably already well indexed if you have a browsable interface that leads to every single one of your collection records and images and whatever; but if you’ve got any content that was hidden behind a search form (and I know we have some in older sites), this could give it much greater visibility.

In light of Google’s announcement it is timely, I would think, to revisit the question “Is it more useful to develop compelling browse interfaces & leave search to Google?” Imagine the quality of services we could provide if we redirected resources away from replicating search algorithms which have already been developed (“standing on the shoulders of giants”).

And let’s remember (a) the evidence which suggests that users prefer simple search interfaces and (b) the costs of attempting to compete with Google in the search area – let’s not forget that, despite their riches, Microsoft haven’t been able to compete successfully. Is it likely that search technologies developed with tax-payers’ money will succeed where Microsoft have failed?

PS I should probably add that I’m not the first to suggest this idea. The OpenDOAR team, in particular, have deployed a search interface using Google across institutional repository services. Many congratulations to the team at the University of Nottingham for evaluating this lightweight approach.

27 Responses to “Should We “Leave Search To Google?””

  1. seb chan said

    Brian

    My concern is that if we do leave search to Google we lose the opportunity to leverage the data that search generates (that opportunity is ceded to Google). And, as I’ve repeatedly said, there is a lot to gain from search data (e.g. http://www.archimuse.com/mw2007/papers/chan/chan.html). Lorcan Dempsey has also written a lot about this over at OCLC. Of course, often we are in no position to really utilise the search data that we gather.

    Seb

  2. Code Gorilla said

    Why not leave the searching to google?

    There is an argument that a locally hosted solution is never at the mercy of an external hosting body upping their charges (or changing their Terms & Conditions) on you. If one uses Google, then Google could suddenly apply a fee for their service; you would then either need to subscribe or lose the service.
    With a locally hosted solution, if there is a price-hike for the new version then don’t buy it & stick with the old version.

  3. Everyone should ask this before they go and implement their own search mechanism – what are the users going to gain from your search tool that can’t be accomplished with google? Are there better uses for our time? In my experience google’s search is more effective than 90% of site search tools.

    There are lots of reasons why it makes sense to set up a well-designed search for a specific type of data such as collections. But our lives are full of good uses for our skills and replicating the work of google for a general site search doesn’t rank high on my list.

    As to Seb Chan’s point. Although incomplete, you can get a sense of the keyword patterns from an analysis of referrers.
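    As a rough sketch of that kind of referrer analysis (assuming the search engine passes the query in a `q` parameter, and using made-up referrer URLs rather than real log data), something like the following would tally the words people searched for before arriving:

```python
from collections import Counter
from urllib.parse import urlparse, parse_qs


def search_terms(referrer):
    """Extract lower-cased search words from a referrer URL's 'q' parameter."""
    query = parse_qs(urlparse(referrer).query)
    words = []
    for phrase in query.get("q", []):
        words.extend(phrase.lower().split())
    return words


def keyword_counts(referrers):
    """Count how often each search word appears across a list of referrers."""
    counts = Counter()
    for referrer in referrers:
        counts.update(search_terms(referrer))
    return counts


# Hypothetical referrers, as they might appear in a web server log.
SAMPLE_REFERRERS = [
    "http://www.google.com/search?q=picasso+guernica",
    "http://www.google.co.uk/search?q=Grecian+urn&hl=en",
    "http://www.google.com/search?q=picasso+prints",
    "http://example.org/links.html",  # not a search referrer, so no terms
]

if __name__ == "__main__":
    for word, count in keyword_counts(SAMPLE_REFERRERS).most_common():
        print(word, count)
```

    This only sees searches that ended in a click through to the site, which is why the picture it gives is incomplete.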

  4. Well, we at the Museum of London are amongst many organisations that take both approaches – google-driven site search, and home-brew collections search. As my colleague Mia said, there are odd ends of our content that can’t be browsed to and it’s nice to think that Google might start to cover them, too, but that doesn’t yet make it an alternative (though there are certainly times when it could be).
    What home-brewed search is good for, of course, is the more structured search. Doubtless as Google ramps up its semantic power (and I don’t doubt this is part of the plan) it will become possible to ask it more intelligent questions. Yahoo!, of course, is now indexing certain structured/semanticised HTML (including some microformats) and presumably their search interface and API will soon reflect this. Then we’ll perhaps see progressively fewer reasons to roll your own. But given the idiosyncratic data structures we sometimes want to work with this seems likely to be a way off.
    I do, though, very much agree that it’s frankly silly to do this stuff again and again. This is why I’m excited by the possibilities that seem to be emerging from the EDL/Europeana project. I’ve ended up being involved in it because it struck me that it would be worse than useless if it didn’t enable us (as content providers) and the public in general access to an API; but that conversely it could be extremely useful if such an API did exist, for exactly the reasons you mention: basically, someone else doing the heavy lifting in search (and translation!). I reckon that aside from anything else, if EDL offers this then it can be sold to content providers, and that’s the case I’ve been making in the “Users and usability” working party.
    Turned out I was pushing on an open door so I’m quite optimistic that it will be a useful step. Time will tell.

  5. xcia0069 said

    Hi Brian

    Two questions to consider

    a) Does all the evidence show that users prefer simple mechanisms for searching? I would reckon that most users prefer Google. However, when expert users need to undertake specialist searching the Google paradigm might not be the best

    b) Does Google do good searching of multi-media (i.e. non textual) material?

  6. Mike said

    I’d say – mostly, yes.

    Seb’s point about data around search is a really good one, and a down side to easy options like Google Coop (although the plus side is it’s REALLY easy, and you still get some stats..)

    BUT – if you can afford a Google Mini (around $2k) or a GSA (around $30k) then you get a bunch of stats out of the box. There’s a reality check here, though: although the “what people searched for, what they found” data is limited, in reality it’s all you’re gonna have time to look at, in my experience.

    It’s little known that the GSA can index databases, feeds, metadata – all worth bearing in mind. Also consider the Google Analytics integration with site search: http://analytics.blogspot.com/2007/11/site-search-now-available.html – I haven’t tried it myself but it’s probably pretty good.

    On the questions raised by @Xcia0069 – 1) yes, absolutely – particularly if you’re talking about a “normal” (non-academic) audience. Provide a link to “advanced search” but make the general search “just a box” – it’s what most people use every day with Google so it’s what they expect. They *also* expect the search to be as good as Google (another reason for *using* Google to do search!). re 2) No – text only, but apart from sites like Riya and Autonomy (OUCH! $$$) I don’t know of any others that do…

    Mike

  7. Thanks for the various comments. Note that Jennifer Trant has posted links to research reports on her blog which discuss users’ preferences for search. Worth reading.

  8. Shelley said

    I was pretty frustrated by all the MW sessions on cross-institution searching. As awesome as it would be: a) not every museum can afford to have a coding staff and b) when we do it is hard enough with all the competing priorities to put our collections on our own websites and c) the professional community can’t manage to agree on standards. I’d love to have all the time and coders in the world, but we don’t and the priorities are going to differ from institution to institution. I’m sure this won’t be popular – but I was frustrated. What Seb and CodeGorilla say makes sense, but given my a.b.c….

    Wouldn’t it be great if we all concentrated on getting our collections onto our various individual sites in a clear and indexable manner, then approached Google to create GoogleART? Just like GoogleScholar only for the collections that happen to be on the web already. I bet if we banded together and approached them, it might put enough weight behind us to have a little say in the metrics side (it will never be all that Seb wants :) but is that tradeoff worth the added exposure Google might bring? Why can’t we work with them to make their searching more accurate for our specific data sets, complete with some kind of admin interface?

  9. seb chan said

    Google Art – bad idea.

    there’s been talk of this before and the general feeling, when people start looking really closely at the fine print, is “do we really want to cede our assets to Google?”.

    cross-search is *not* difficult at a technical level – the real issues are structural and organisational and Google isn’t going to help with those. the core issues are Copyright (including moral rights) and ROI. these vary significantly from country to country and many governments (outside the USA) are able to be lobbied to support mass digitisation.

    But . . . on to Google.

    the easiest thing Google could do to make cross-search work right away is just to open up their Google CSE with an API (which you used to actually be able to do) and get search results out as XML . . . . . the problem is that that breaks their business model which is to display adverts alongside the search results. the only way you can get this is if you buy a Google Search Appliance (or Google Mini) as Mike E points out.

    now there is no reason why for NON-PROFITS they couldn’t offer an XML result feed . . . . because they let non-profits get away with not showing adverts in any case!

    Google is a quick answer but not necessarily the best long term one.

  10. seb chan said

    Oh and of course if we can do this ourselves we can actually add what museums do best – context and experience.

  11. hi Seb – Not sure I agree with you. And I think the comment “do we really want to cede our assets to Google?” is misleading. Have we ceded our Web content to Google?

    I agree with you that the main questions are structural and organisational. But once we’ve addressed those we should find that our assets are available for ANYONE to index – and that will include Google. The question is then, which tool will most users use? The specialist may use a specialist search tool, but the mass market will use the popular search engine – which is currently Google.

  12. Hi Seb – my comment was to comment no. 9. I’d agree with your last comment – in the mixed environment I’m suggesting, museums can add value along the lines you suggest. But I think it would be a mistake to hide resources from commercial services if those commercial services will provide value to end users.

  13. Shelley said

    I really agree with Brian here. “cede our assets” – what? allow Google to do what they do already but under an umbrella that might make searching more efficient? The whole XML thing is tough for smaller museums – they don’t necessarily have the staff to make that happen, but they might already have assets on the web that could be indexed within something like a coop search.

    I’m not saying that Google is the ONLY answer, but it’s a thought that might put our collections where the people are – allowing them to find things on their own terms, rather than making them come directly to us or some co-op we create. That’s better for everyone. There’s really no reason at all why we can’t do both -

  14. seb chan said

    Shelley . . . isn’t that just SEO then? Google can already get to our collections in a basic form if we’ve –

    a) catalogued
    b) digitised
    c) use a decent collection management system
    d) have the motivation to put the collection online

    Problem is most small museums have troubles with A and B. Especially those that are volunteer run.

    Google Art implies something much grander.

  15. “Just SEO”? Organisations pay a lot to have their Web sites made discoverable. Search engine optimisation is of relevance to our communities – and having static URIs will help. So let’s continue in that direction. Agreed?

    And our catalogues of catalogues and our richly structured metadata should also be discoverable. Perhaps that’s where Google Art could fit in.

    I believe that open content should be like open source – non-discriminatory. If Google, Yahoo or anyone else can build services based on our content, then that can encourage innovation, development and creativity.

  16. Shelley said

    I think right now, Google searching isn’t that great for collections material. Search for picasso and you get a whole lot of things sort of all over the map. I would hope GoogleArt would be more like this but allow slightly better options for searching – for instance, I want to see work by picasso, not records where picasso is listed in descriptive text. I want to see work from collections in my geographic area, etc. So, I guess I’m not expecting “grand” just decent? Hey, I got this idea from you!

  17. Seb Chan said

    Brian, Shelley

    We *can* already build a museum collection custom search with existing Google – Frankie, Mike, myself are at the very least those who’ve tried it.

    That will deliver (you are right I did blog about this ages ago) an average result for those institutions who –

    a) already have digitised collections
    b) present them on their websites

    For the rest of them who haven’t the problems are institutional and organisational.

    If SEO is a problem then someone can pay me to travel around for a few years . . . heh heh.

    We’ve probably come around in circles!

    I’m not disagreeing that a Google CSE approach is sensible – yes, we should do it – and we can do it already (without asking Google).

    Indeed – how about I grant you all contributor access to this CSE I set up ages ago? (just go and request permissions!)

    http://tinyurl.com/5cysao

    What I disagree with is any notion that Google holds anything more than what we already hold or make available ourselves. If GoogleArt was to be like GoogleBooks then I have many reservations.

    Seb

  18. Nate Solas said

    Wow, I drop off this thread for a few days and it gets really interesting!

    I’m intrigued by the notion of GoogleART, for sure. Perhaps, coupled with Google’s new attempts to discover and index the deeper web, they can bring more collections to the forefront without much (any?) extra work by the institutions.

    However, to do what Shelley’s describing in #16 – know the difference in the word Picasso as a “creator” vs. just a mention in the text, or geographic information – is going to rely on the very things we’re arguing about in the first place: that is, metadata and semantic search vs. plain Google keyword search.

    Museums surely have this metadata and could “tell Google”, but that means we’re going to have to provide it in some hopefully-standardized machine-readable format. Alternately, Google could try to build in some clever parsing code to try to extract meaning from our collection pages, but that seems like an incredible challenge across institutions.

    So I’m interested and want to explore this some more, but my initial reaction is that we aren’t going to do better than a CSE unless we can provide metadata with meaning — and that’s going to take developers and time just like the cross-site collection searches we were arguing about at MW2008.
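    To make the creator-versus-mention distinction concrete, here is a minimal sketch. The `Record` fields and the second sample record are invented for illustration, not taken from any real collection schema; the point is only that a fielded search needs the metadata to say which text is the creator.

```python
from dataclasses import dataclass


@dataclass
class Record:
    """Hypothetical collection record with a dedicated creator field."""
    title: str
    creator: str
    description: str


# Illustrative records: the second one merely mentions Picasso in its text.
RECORDS = [
    Record("Guernica", "Pablo Picasso", "Oil on canvas, 1937."),
    Record("Portrait of a Collector", "Unknown artist",
           "The sitter was an early patron of Picasso."),
]


def keyword_search(records, term):
    """Plain keyword match over all text, as a general search box would do."""
    term = term.lower()
    return [
        r for r in records
        if term in f"{r.title} {r.creator} {r.description}".lower()
    ]


def creator_search(records, term):
    """Fielded match: only records whose creator matches the term."""
    term = term.lower()
    return [r for r in records if term in r.creator.lower()]


if __name__ == "__main__":
    print([r.title for r in keyword_search(RECORDS, "picasso")])
    print([r.title for r in creator_search(RECORDS, "picasso")])
```

    The keyword search returns both records, while the fielded search returns only the work actually made by Picasso, and that second behaviour is only possible once the creator is exposed as structured data.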

    ( I need to write a post like Seb’s OpenSearch – It isn’t all that hard, only I’d call it CDWAlite – It isn’t all that hard! )

    - Nate

  19. seb chan said

    Nate

    (slightly off topic – but it proves a point that the challenges run much deeper than search)

    Use the CSE and do a search for Ada Web (which is in your collection).

    http://tinyurl.com/5cysao

    If you changed the page titles for your collection records to actually start with the object name we’d already be much further down the track of getting better results . . . . AND improving SEO at the Walker Art Center.

    Seb

  20. Nate Solas said

    @Seb – I know, I know… :) Luckily I can blow it off by saying those pages pre-date my time at the Walker, but that doesn’t mean it shouldn’t be fixed. And the frameset! Ahh!

    The good news is that’s finally on the horizon of projects I can tackle. Soon, soon.

    (although, you’ve just triggered a thought – there may be a quick fix for the title issue!)

  21. Shelley said

    seb on #17 – agreed – not googlebooks, they don’t house anything they just point. nate on #18 – agreed, but i think one of the benefits might be that participants could a) do that alteration (microformats?) if they had time/staff or b) still be included if they didn’t. One of the things I’d like to see an engineer at Google help with is provide both the general search and the more advanced search that would read the microformats (or other) – best of both worlds.

    While I think the co-op searching we’ve already done is good, it doesn’t help us with the advanced search that would make this really useful. In addition, those co-ops get lost in the other co-op searches – it would be nice to be featured so the community using google knows this is a resource. Our co-op searches right now are tough because we are still expecting the community to come to us in a way.

  22. [...] there have been a lot of discussions on the merits of creating custom search engines vs using Google. I personally don’t have any [...]

  23. Nate Solas said

    @Seb #19 – just wanted to follow up and say not only did I find a quick fix to the title issue, but Google’s been nice enough to reindex those pages and things are looking much better for our placement in results… Yay for quick and easy fixes!

  24. Mia said

    I think the benefit is in providing structured data that can be understood semantically (Picasso is a person, Guernica is the title of a painting as well as a place but is more likely to mean the painting if the other search term is not ‘Spain’) and searched intelligently rather than a silo of museum search results.

    I suspect the average search engine user is ‘museum agnostic’ – they don’t care where the object is held, and they’re probably not even thinking of it as a ‘museum object’ – it’s just ‘that picture with the goldy bits that everyone has a print of in college’ or they’ve read ‘Ode on a Grecian Urn’ and want to know what a Grecian urn looks like. Our idea of ‘collection data’ is a lot more specific than their idea of ‘information’, and we should get our data out to general users rather than hoping they’ll come visit our silo.

    Re: GoogleArt – where would that leave history or science museums?

  25. Mia said

    It looks like some text was converted into a smiley! That was meant to be “is not ‘Spain’ ) “

  26. Hi Mia – I’ve also noticed WordPress unexpectedly turning punctuation into smileys. And I can’t see the interface for editing your post – which was easily found before WordPress updated their admin interface.

  27. Shelley said

    Hi Mia, GoogleArt is just a catchy name, we can always come up with something better to include science and history peeps :) Really, GoogleArt could just fold right into GoogleScholar as it stands now. I just noticed, Seb posted this which kind of begs to revisit this question. Why does our own internal searching have to be the end all be all? Why do our own fed search ideas have to be the only thing we do? Sure, SEO will help us with search results generally and should be done – no question. But I think partnering with Google in some way would give our resources much more visibility as a whole than any fed search we built on our own. I have a feeling that it’s a combination of things (internal search, SEO, fed searches of various kinds, possible GoogleArt type thing, other creative ways to get collection data out into the world) that would bring the most results and that’s better for the data all around.
