lucene-java-user mailing list archives

From Doug Cutting
Subject Re: Biggest index size/document in Lucene
Date Tue, 04 Nov 2003 16:17:56 GMT
There was a bug (recently fixed) when creating indexes with over a 
couple hundred million documents.  So you should use 1.3 RC2, which has 
a fix for this bug.

The biggest indexes I've personally created have around 30M documents. 
I maintain these as a set of separately updated indexes, then merge them 
together into a single big index for deployment.  I find this easier 
than trying to maintain a single massive index.
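
The merge step described above can be pictured with a toy inverted index. This is only a sketch under stated assumptions: `ToyIndexMerge`, `SubIndex`, and `merge` are hypothetical names, and the in-memory maps stand in for real index files. It is not Lucene's merge code. It shows one idea: when separately built indexes are combined, each one's document ids are shifted by the number of documents that precede it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy illustration only: not Lucene's on-disk format or merge implementation.
class ToyIndexMerge {

    // A toy sub-index: a document count plus term -> ascending local doc ids.
    static final class SubIndex {
        final int docCount;
        final Map<String, List<Integer>> postings;

        SubIndex(int docCount, Map<String, List<Integer>> postings) {
            this.docCount = docCount;
            this.postings = postings;
        }
    }

    // Merge sub-indexes in order, shifting each one's doc ids by the number
    // of documents held by the sub-indexes that came before it.
    static SubIndex merge(List<SubIndex> parts) {
        Map<String, List<Integer>> merged = new TreeMap<>();
        int base = 0;
        for (SubIndex part : parts) {
            for (Map.Entry<String, List<Integer>> e : part.postings.entrySet()) {
                List<Integer> out =
                        merged.computeIfAbsent(e.getKey(), k -> new ArrayList<>());
                for (int localId : e.getValue()) {
                    out.add(base + localId);
                }
            }
            base += part.docCount;
        }
        return new SubIndex(base, merged);
    }

    public static void main(String[] args) {
        SubIndex a = new SubIndex(3,
                Map.of("lucene", List.of(0, 2), "index", List.of(1)));
        SubIndex b = new SubIndex(2,
                Map.of("lucene", List.of(1), "merge", List.of(0, 1)));
        SubIndex all = merge(List.of(a, b));
        System.out.println(all.docCount + " docs, postings: " + all.postings);
    }
}
```

Because each sub-index is self-contained until the merge, it can be rebuilt or updated on its own schedule, which is the convenience the message describes.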

My guess is that your search times won't be too fast, probably on the 
order of a few seconds (more than one, less than ten).  It will be disk 
bound.  You could improve performance by distributing search over 
multiple machines, each searching a smaller index, a subset of the 
entire data.
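
The distributed setup suggested above amounts to a scatter-gather query: send the query to every sub-index in parallel, then keep the best hits overall. The following is a minimal sketch, not a Lucene API. `ShardedSearchSketch`, `Shard`, `Hit`, and `cannedShard` are hypothetical names, the shards are in-memory stand-ins for remote machines, and it assumes scores from different shards are directly comparable, which in a real system depends on how term statistics are shared.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Sketch of scatter-gather search; shard internals are assumed, not Lucene's.
class ShardedSearchSketch {

    static final class Hit {
        final String shard;
        final int docId;
        final double score;

        Hit(String shard, int docId, double score) {
            this.shard = shard;
            this.docId = docId;
            this.score = score;
        }
    }

    // One searchable subset of the data; returns at most k hits, best first.
    interface Shard {
        List<Hit> search(String query, int k);
    }

    // Hypothetical stand-in for a per-machine searcher: serves canned hits
    // that are assumed to be sorted by descending score already.
    static Shard cannedShard(List<Hit> hits) {
        return (query, k) -> hits.subList(0, Math.min(k, hits.size()));
    }

    // Query every shard concurrently, then merge the partial results by score.
    static List<Hit> search(List<Shard> shards, String query, int k) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, shards.size()));
        try {
            List<Future<List<Hit>>> pending = new ArrayList<>();
            for (Shard shard : shards) {
                Callable<List<Hit>> task = () -> shard.search(query, k);
                pending.add(pool.submit(task));
            }
            List<Hit> all = new ArrayList<>();
            for (Future<List<Hit>> f : pending) {
                all.addAll(f.get());
            }
            all.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());
            return new ArrayList<>(all.subList(0, Math.min(k, all.size())));
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("shard search failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        Shard s1 = cannedShard(List.of(new Hit("s1", 7, 0.9), new Hit("s1", 3, 0.4)));
        Shard s2 = cannedShard(List.of(new Hit("s2", 1, 0.7), new Hit("s2", 5, 0.6)));
        for (Hit h : search(List.of(s1, s2), "lucene", 2)) {
            System.out.println(h.shard + " doc " + h.docId + " score " + h.score);
        }
    }
}
```

Each shard only needs to return its own top k hits, so the coordinator merges at most k hits per shard regardless of how large the underlying indexes are.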


Victor Hadianto wrote:
> Hi all,
> I'm interested to know how big a Lucene index (in documents) you have
> worked with. We are trying to index on the order of 300 million text
> documents. Each document will be quite small, around 10 KB.
> Any insight into how Lucene scales with this many documents, both when
> creating the index and when searching?
> thanks,
> /victor