lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Doug Cutting <cutt...@lucene.com>
Subject Re: Biggest index size/document in Lucene
Date Tue, 04 Nov 2003 16:17:56 GMT
There was a bug (recently fixed) when creating indexes with over a 
couple hundred million documents.  So you should use 1.3 RC2, which has 
a fix for this bug.

The biggest indexes I've personally created have around 30M documents. 
I maintain these as a set of separately updated indexes, then merge them 
together into a single big index for deployment.  I find this easier 
than trying to maintain a single massive index.

My guess is that your search times won't be too fast, probably on the 
order of a few seconds (more than one, less than ten).  It will be disk 
bound.  You could improve performance by distributing search over 
multiple machines, each searching a smaller index, a subset of the 
entire data.

Doug

Victor Hadianto wrote:
> Hi all,
> 
> I'm interested to know have big of Lucene index/documents that you have
> experienced with? We are trying to index in the mark of 300 million text
> documents. Each document will be quite small around 10kb ish.
> 
> Any insight about the scalability of Lucene with this many documents?
> Creating the index and searching?
> 
> thanks,
> 
> /victor
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-user-help@jakarta.apache.org
> 


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Mime
View raw message