lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Doug Cutting <cutt...@apache.org>
Subject Re: maximum index size
Date Thu, 09 Sep 2004 03:50:06 GMT
Chris Fraschetti wrote:
> I've seen throughout the list mentions of millions of documents.. 8
> million, 20 million, etc etc.. but can lucene potentially handle
> billions of documents and still efficiently search through them?

Lucene can currently handle up to 2^31 documents in a single index.  To 
a large degree this is limited by Java ints and arrays (which are 
accessed by ints).  There are also a few places where the file format 
limits things to 2^32.

On typical PC hardware, 2-3 word searches of an index with 10M 
documents, each with around 10k of text, require around 1 second, 
including index i/o time.  Performance is more-or-less linear, so that a 
100M document index might require nearly 10 seconds per search.  Thus, 
as indexes grow folks tend to distribute searches in parallel to many 
smaller indexes.  That's what Nutch and Google 
(http://www.computer.org/micro/mi2003/m2022.pdf) do.

Doug


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Mime
View raw message