The arrow of time

Ivan Voras' blog

Google Instant search is not magic

In the last few days I have read several blogs, even technical blogs which should know better, treating Google's Instant Search like it's something otherworldly. Without actually analyzing how it's done, I think it's simply one of those things whose time has come. In other words, here's how I'd do it:

I've done extensive work with caching in web applications and basically, I think the solution is in the intersection of the following trivial facts:

  • RAM is cheap, especially if you buy in bulk. 1U servers can carry 192 GB today. [*]
  • Distributed algorithms like Map-Reduce are fairly well understood. Data partitioning kicks ass for these things.
  • There is actually a finite number of words with the "hardest hits" - those which are entered first. By "finite" I'm thinking of things like "the number of words in various languages" (for example: 500,000 for the English language according to Wikipedia). First-page results for all of these can be cached verbatim, even taking into account slight per-user customizations (at least by binning users into large groups). The other cases (i.e. phrases, uncommon words) can be handled normally.
  • Broadband is really common with the group of people who will be impressed the most by this feature.
  • Bandwidth can be massively better utilized by using HTTP compression - and by storing the compressed results in the cache rather than compressing on the fly (it would actually be faster to decompress on the fly for those few clients which don't support compression).
  • To enhance all of the above, distributed data centers can be used (e.g. per-continent or per-country, which Google already uses).
  • It is only necessary to be as good as human perception requires, not good in absolute terms. The "fades" in page loading offer precious milliseconds for the servers to respond in!
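
To make the caching points above a bit more concrete, here is a minimal sketch of the idea in Python. It is only an illustration of the general technique, not a description of what Google actually does: the class name `InstantCache`, the `shard_for` helper, the `user_bin` labels and the shard count are all made up for this example. It stores first-page results pre-compressed, partitions them across a few in-memory "servers" by hashing the key, and decompresses only for the rare client that can't accept gzip.

```python
import gzip
import hashlib
import json

NUM_SHARDS = 4  # hypothetical number of cache servers


def shard_for(key: str) -> int:
    """Pick a shard by hashing the key (simple data partitioning)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % NUM_SHARDS


class InstantCache:
    """Toy in-memory cache of pre-compressed first-page results."""

    def __init__(self):
        # One dict per shard stands in for one cache server's RAM.
        self.shards = [dict() for _ in range(NUM_SHARDS)]

    @staticmethod
    def _key(query: str, user_bin: str) -> str:
        # Users are binned into large groups instead of cached individually.
        return f"{user_bin}:{query.strip().lower()}"

    def put(self, query: str, user_bin: str, results: list) -> None:
        key = self._key(query, user_bin)
        body = json.dumps(results).encode("utf-8")
        # Compress once at store time, not on every request.
        self.shards[shard_for(key)][key] = gzip.compress(body)

    def get(self, query: str, user_bin: str, accepts_gzip: bool = True):
        key = self._key(query, user_bin)
        blob = self.shards[shard_for(key)].get(key)
        if blob is None:
            return None  # cache miss: handle via the normal search path
        # Most clients get the stored bytes as-is; the rest pay for decompression.
        return blob if accepts_gzip else gzip.decompress(blob)
```

A single-word query like "weather" from a user in the (hypothetical) "en-US" bin would be answered straight from RAM with bytes that are already compressed, while phrases and uncommon words return `None` here and go through the regular search path.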

All this doesn't make it any less of an impressive achievement. Salute!

 

 

[*] Offtopic: did you know that the newest-generation Atom CPUs have memory bandwidth comparable to that of last-generation Core 2 CPUs? For some perverse reason I'm fond of the idea of massively parallel applications running on a large number of power-saving servers...

#1 Re: Google Instant search is not magic

Added on 2010-09-10T07:55 by Yahoo was first!

See http://uniquehazards.tumblr.com/post/1088334156/google-instant-is-an-example-of-how-yahoo-could-have for a description of the same type of search, 5 years ago :)


The technology behind all this is likely to use a lot of the techniques you describe, but in the end, with such large volumes of data, it also requires a lot of those servers :) Especially if you combine this with the wish to keep fresh/"real-time" indexes.

#2 Re: Google Instant search is not magic

Added on 2010-09-10T19:09 by Mr*Gibson

I haven't been able to try it (Google Instant Search) yet, but from what I have seen it doesn't actually add a lot for me. Sure, it looks cool and could be of some benefit.

But I would rather have a much enhanced "Advanced Search". Today it's close to useless for refining your search.


Comments are subject to moderation and will be deleted if deemed inappropriate. All content is © Ivan Voras. Comments are owned by their authors... who agree to basically surrender all rights by publishing them here :)