lucene-dev mailing list archives

From: Andrzej Bialecki <...@getopt.org>
Subject: Re: Lucene-based Distributed Index Leveraging Hadoop
Date: Thu, 07 Feb 2008 00:22:23 GMT
(trimming excessive cc-s)

Ning Li wrote:
> No. I'm curious too. :)
> 
> On Feb 6, 2008 11:44 AM, J. Delgado <jdelgado@lendingclub.com> wrote:
> 
>> I assume that Google also has distributed index over their
>> GFS/MapReduce implementation. Any idea how they achieve this?

I'm pretty sure that MapReduce/GFS/BigTable is used only for creating 
the index (as well as for crawling, data mining, web graph analysis, 
static scoring, etc.). The overhead of MR jobs is just too high for 
answering queries at search time.

Their impressive search response times are most likely the result of 
extensive caching of pre-computed partial hit lists for frequent terms 
and phrases - at least that's what I suspect after reading this paper 
(not by Google folks, but very enlightening): 
http://citeseer.ist.psu.edu/724464.html
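To make the caching idea concrete, here is a minimal sketch, assuming a single-node LRU cache that maps a normalized term or phrase to its top-k document ids. This is only an illustration of the general technique, not Google's (or the paper's) actual design; the class name PartialHitListCache, the evaluator callback and the parameters maxEntries and k are all hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: an LRU cache of precomputed partial hit lists
// (top-k document ids) keyed by a normalized term or phrase.
class PartialHitListCache {
    private final int maxEntries; // assumed cache capacity
    private final int k;          // assumed length of each partial hit list
    private final Map<String, List<Integer>> cache;

    PartialHitListCache(int maxEntries, int k) {
        this.maxEntries = maxEntries;
        this.k = k;
        // accessOrder=true keeps entries ordered from least to most recently
        // accessed, so removeEldestEntry evicts the least recently used one.
        this.cache = new LinkedHashMap<String, List<Integer>>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, List<Integer>> eldest) {
                return size() > PartialHitListCache.this.maxEntries;
            }
        };
    }

    // Returns the cached top-k hits for a query, running the (hypothetical)
    // full evaluator only on a cache miss and keeping just the first k hits.
    List<Integer> topHits(String query, Function<String, List<Integer>> evaluator) {
        List<Integer> hits = cache.get(query);
        if (hits == null) {
            List<Integer> full = evaluator.apply(query);
            hits = List.copyOf(full.subList(0, Math.min(k, full.size())));
            cache.put(query, hits);
        }
        return hits;
    }

    // containsKey does not change the access order of the LRU map.
    boolean isCached(String query) {
        return cache.containsKey(query);
    }
}
```

The point of the sketch is that frequent terms and phrases are answered from memory after their first evaluation, while only the truncated ("partial") hit list is stored; a LinkedHashMap in access order gives LRU eviction with constant-time lookups.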

-- 
Best regards,
Andrzej Bialecki     <><
  ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

