lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Lance Norskog <>
Subject Re: A question bout google search index?
Date Sun, 13 Jun 2010 22:51:35 GMT

On Thu, Jun 10, 2010 at 1:51 AM, Yuval Feinstein <> wrote:
> Most of the implementation of Google's search index is kept secret by Google.
> Based on publicly available information, the indexes are quite different -
> Google uses its BigTable and MapReduce technologies to efficiently distribute the index.
> There are similar efforts in the Lucene ecosystem - Solr Cloud is an advanced one,
> Which is currently in development.
> As Google's scoring algorithm uses hundreds of signals, I guess they store data pertinent
to these signals in the index.
> Lucene's index holds relatively few pieces of information about every document (posting
lists, term vectors,
> Sometimes norms and payloads).
> I believe there are other differences as well,
> But one could only guess what they are...
> Cheers,
> Yuval
> -----Original Message-----
> From: luocanrao []
> Sent: Wednesday, June 09, 2010 5:18 PM
> To:
> Subject: A question bout google search index?
> A news bout google search index. Index system of Lucene can also support
> realtime search,
> Is there some difference between them?
> With Caffeine, we analyze the web in small portions and update our search
> index on a continuous basis, globally. As we find new pages, or new
> information on existing pages, we can add these straight to the index. That
> means you can find fresher information than ever before-no matter when or
> where it was published.
> Caffeine lets us index web pages on an enormous scale. In fact, every second
> Caffeine processes hundreds of thousands of pages in parallel. If this were
> a pile of paper it would grow three miles taller every second. Caffeine
> takes up nearly 100 million gigabytes of storage in one database and adds
> new information at a rate of hundreds of thousands of gigabytes per day. You
> would need 625,000 of the largest iPods to store that much information; if
> these were stacked end-to-end they would go for more than 40 miles
> ---------------------------------------------------------------------
> To unsubscribe, e-mail:
> For additional commands, e-mail:

Lance Norskog

To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message