lucene-java-user mailing list archives

From Doug Cutting <cutt...@apache.org>
Subject Re: over 300 GB to index: feasibility and performance issue
Date Mon, 26 Jul 2004 18:20:48 GMT
Vincent Le Maout wrote:
> I have to index a huge, huge amount of data: about 10 million documents
> making up about 300 GB. Is there any technical limitation in Lucene that
> could prevent me from processing such an amount (apart, of course, from
> the external limits imposed by the hardware: RAM, disks, the system,
> whatever)?

Lucene is in theory able to support up to about two billion documents in 
a single index (document numbers are Java ints).  Folks have successfully 
built indexes with several hundred million documents, so 10 million 
should not be a problem.
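
As a rough illustration only, here is a minimal bulk-indexing sketch 
against the Lucene 1.4 API. The index path, field names, and tuning 
values are made-up placeholders, not recommendations:

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.index.IndexWriter;

    public class BulkIndexer {
      public static void main(String[] args) throws Exception {
        // "/data/index" is a placeholder path; 'true' creates a new index.
        IndexWriter writer =
            new IndexWriter("/data/index", new StandardAnalyzer(), true);

        // Optional bulk-load tuning (illustrative values): buffering more
        // documents in RAM before segment merges speeds up indexing, at
        // the cost of memory and open file handles.
        writer.mergeFactor = 50;
        writer.minMergeDocs = 1000;

        // A real run would loop over the 10M source documents here.
        for (int i = 0; i < 10; i++) {
          Document doc = new Document();
          doc.add(Field.Keyword("id", "doc-" + i));       // stored, untokenized
          doc.add(Field.Text("contents", "sample text")); // tokenized, indexed
          writer.addDocument(doc);
        }

        writer.optimize(); // merge segments for faster searching
        writer.close();
      }
    }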

> If possible, does anyone have an idea of the amount of resources
> needed: RAM, CPU time, size of indexes, access time on such a collection?
> If not, is it possible to extrapolate an estimate from previous
> benchmarks?

For simple 2-3 term queries over average-sized documents (~10k of text), 
you should get decent performance (on the order of one second per query) 
on a 10M document index.  An index typically requires around 35% of the 
plain-text size.
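
At that ratio, the 300 GB collection would work out to roughly 
0.35 x 300 GB = ~105 GB of index.  For completeness, a minimal search 
sketch against the Lucene 1.4 API (the index path, query terms, and 
field names are made-up placeholders):

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.queryParser.QueryParser;
    import org.apache.lucene.search.Hits;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.Query;

    public class SearchDemo {
      public static void main(String[] args) throws Exception {
        IndexSearcher searcher = new IndexSearcher("/data/index");

        // A simple multi-term query against the "contents" field.
        Query query = QueryParser.parse(
            "storage performance index", "contents", new StandardAnalyzer());

        Hits hits = searcher.search(query);
        System.out.println(hits.length() + " matching documents");
        for (int i = 0; i < Math.min(10, hits.length()); i++) {
          System.out.println(hits.doc(i).get("id") + " score=" + hits.score(i));
        }
        searcher.close();
      }
    }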

Doug


