lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Lance Norskog <goks...@gmail.com>
Subject Re: A question bout google search index?
Date Sun, 13 Jun 2010 22:51:35 GMT
http://research.google.com/pubs/DistributedSystemsandParallelComputing.html

On Thu, Jun 10, 2010 at 1:51 AM, Yuval Feinstein <yuvalf@answers.com> wrote:
> Most of the implementation of Google's search index is kept secret by Google.
> Based on publicly available information, the indexes are quite different -
> Google uses its BigTable and MapReduce technologies to efficiently distribute the index.
> There are similar efforts in the Lucene ecosystem - Solr Cloud is an advanced one,
> Which is currently in development.
> As Google's scoring algorithm uses hundreds of signals, I guess they store data pertinent
to these signals in the index.
> Lucene's index holds relatively few pieces of information about every document (posting
lists, term vectors,
> Sometimes norms and payloads).
> I believe there are other differences as well,
> But one could only guess what they are...
> Cheers,
> Yuval
>
>
> -----Original Message-----
> From: luocanrao [mailto:luocan19826164@sohu.com]
> Sent: Wednesday, June 09, 2010 5:18 PM
> To: java-user@lucene.apache.org
> Subject: A question bout google search index?
>
> A news bout google search index. Index system of Lucene can also support
> realtime search,
>
> Is there some difference between them?
>
>
>
> With Caffeine, we analyze the web in small portions and update our search
> index on a continuous basis, globally. As we find new pages, or new
> information on existing pages, we can add these straight to the index. That
> means you can find fresher information than ever before-no matter when or
> where it was published.
>
>
>
> Caffeine lets us index web pages on an enormous scale. In fact, every second
> Caffeine processes hundreds of thousands of pages in parallel. If this were
> a pile of paper it would grow three miles taller every second. Caffeine
> takes up nearly 100 million gigabytes of storage in one database and adds
> new information at a rate of hundreds of thousands of gigabytes per day. You
> would need 625,000 of the largest iPods to store that much information; if
> these were stacked end-to-end they would go for more than 40 miles
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>



-- 
Lance Norskog
goksron@gmail.com

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message