lucene-dev mailing list archives

From "J. Delgado" <jdelg...@lendingclub.com>
Subject Re: Lucene-based Distributed Index Leveraging Hadoop
Date Thu, 07 Feb 2008 00:43:06 GMT
I'm pretty sure that what you describe is the case, especially considering
that PageRank (what drives their search results) is a per-document value
that is probably recomputed only after some long time interval. I did see a
MapReduce algorithm for computing PageRank as well. However, I do think they
must be distributing the query load across many, many machines.
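
To make the batch-recomputation point concrete, here is a minimal sketch of one
PageRank iteration written as a map step and a reduce step. It is not the
algorithm I saw and certainly not Google's implementation; the function names,
the toy graph, and the 0.85 damping factor are all assumptions for
illustration, and the "shuffle" is just an in-memory dictionary.

```python
from collections import defaultdict

DAMPING = 0.85  # conventional damping factor; an assumption, not from this thread


def map_phase(page, rank, outlinks):
    """Emit (target, contribution) pairs for one page."""
    # Re-emit the page itself so pages with no inlinks survive the reduce.
    yield page, 0.0
    # Split this page's rank evenly among the pages it links to.
    # Pages with no outlinks emit nothing here, so their rank mass is
    # dropped; a real job would have to handle such dangling pages.
    for target in outlinks:
        yield target, rank / len(outlinks)


def reduce_phase(contributions, num_pages):
    """Combine all contributions sent to one page into its new rank."""
    return (1 - DAMPING) / num_pages + DAMPING * sum(contributions)


def pagerank_iteration(graph, ranks):
    """Run one map/shuffle/reduce round over an in-memory toy graph."""
    shuffled = defaultdict(list)  # stands in for the MapReduce shuffle
    for page, outlinks in graph.items():
        for target, contribution in map_phase(page, ranks[page], outlinks):
            shuffled[target].append(contribution)
    n = len(graph)
    return {page: reduce_phase(vals, n) for page, vals in shuffled.items()}


# Hypothetical three-page web graph: page -> pages it links to.
graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = {page: 1 / len(graph) for page in graph}
for _ in range(20):
    ranks = pagerank_iteration(graph, ranks)
```

Each iteration is a full pass over the link graph, which fits an offline batch
job and is consistent with the score being refreshed only occasionally rather
than at query time.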

I also think that limiting results to a flat top 10 and then paging is an
optimization for performance. That is probably yet another reason why Google
has not implemented faceted browsing or real-time clustering around their
result set.
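
As a rough illustration of why a flat top 10 is cheap when the index is spread
over many machines, the sketch below has each shard return only its own best k
hits and then merges those short lists. The shard contents, scores, and
function names are made up for the example; this is a guess at the general
pattern, not a description of Google's system.

```python
import heapq


def shard_top_k(hits, k):
    """Return the k best (score, doc_id) pairs from one shard, best first."""
    return heapq.nlargest(k, hits)


def merge_top_k(per_shard_results, k):
    """Merge the shards' local top-k lists into a global top-k."""
    merged = (hit for shard in per_shard_results for hit in shard)
    return heapq.nlargest(k, merged)


# Hypothetical hits on three index shards, as (score, doc_id) pairs.
shards = [
    [(0.91, "d1"), (0.40, "d2"), (0.75, "d3")],
    [(0.88, "d4"), (0.10, "d5")],
    [(0.95, "d6"), (0.60, "d7"), (0.30, "d8")],
]
k = 3
top = merge_top_k([shard_top_k(hits, k) for hits in shards], k)
```

The merging node only ever sees k hits per shard, however many documents
matched. Facet counts or clustering would instead need information about the
whole matching set, which is much more data to move and aggregate per query.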

J.D.

On Feb 6, 2008 4:22 PM, Andrzej Bialecki <ab@getopt.org> wrote:

> (trimming excessive cc-s)
>
> Ning Li wrote:
> > No. I'm curious too. :)
> >
> > On Feb 6, 2008 11:44 AM, J. Delgado <jdelgado@lendingclub.com> wrote:
> >
> >> I assume that Google also has distributed index over their
> >> GFS/MapReduce implementation. Any idea how they achieve this?
>
> I'm pretty sure that MapReduce/GFS/BigTable is used only for creating
> the index (as well as crawling, data mining, web graph analysis, static
> scoring etc). The overhead of MR jobs is just too high.
>
> Their impressive search response times are most likely the result of
> extensive caching of pre-computed partial hit lists for frequent terms
> and phrases - at least that's what I suspect after reading this paper
> (not by Google folks, but very enlightening):
> http://citeseer.ist.psu.edu/724464.html
>
> --
> Best regards,
> Andrzej Bialecki     <><
>  ___. ___ ___ ___ _ _   __________________________________
> [__ || __|__/|__||\/|  Information Retrieval, Semantic Web
> ___|||__||  \|  ||  |  Embedded Unix, System Integration
> http://www.sigram.com  Contact: info at sigram dot com
