lucene-java-user mailing list archives

From "Armbrust, Daniel C." <>
Subject RE: Lucene Benchmarks and Information
Date Fri, 20 Dec 2002 19:09:48 GMT
Not sure what the file handle limit is on XP, but it seemed to be fairly generous.
I wasn't opening any other files or index readers while building the index.  However, I didn't
realize the connection between the merge factor and the number of files held open.  In some
ad hoc tests on the documents I was indexing, I got the best speed out of the indexer (not
counting optimization time) with a merge factor of 500, which is what I used.  I'll
have to try cranking down the merge factor and see how many documents I can index without calling optimize.

Thanks for the info!


-----Original Message-----
From: Doug Cutting [] 
Sent: Friday, December 20, 2002 12:45 PM
To: Lucene Users List
Subject: Re: Lucene Benchmarks and Information

Armbrust, Daniel C. wrote:
> While I was trying to build this index, the biggest limitation of
> Lucene that I ran into was optimization.  Optimization kills the
> indexer's performance once you get between 3 and 5 million documents in an
> index.  On my Windows XP box, I had to reoptimize every 100,000
> documents to keep from running out of file handles.

What is the file handle limit on XP?

When batch indexing, optimizing before the end slows things down, and 
should not be required.

Are you otherwise opening index readers in the same process?  Index 
readers use a lot more file handles than the index writer, since they 
must keep all files in all segments open.  For large indexes it's best 
to do your indexing in a separate process which never opens an IndexReader.

The maximum number of files a reader will keep open is:

   mergeFactor * log_base_mergeFactor(N) * files_per_segment

With mergeFactor=10 (the default), 1 million documents, and 10 files 
per segment, a reader on a never-optimized index should at most require 
600 open files, and typically half that.

A writer will open:

   (1 + mergeFactor) * files_per_segment

With mergeFactor=10 (the default) and 10 files per segment, a writer on a 
never-optimized index would require 110 open files, regardless of the 
number of documents.
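The two formulas above can be checked with a quick calculation.  This is a
minimal sketch, not Lucene code; the function names are my own, and the
merge-level count is rounded to the nearest integer as an approximation of
log_base_mergeFactor(N):

```python
import math

def reader_max_open_files(merge_factor, num_docs, files_per_segment):
    # Upper bound on handles an IndexReader may hold:
    #   mergeFactor * log_base_mergeFactor(N) * files_per_segment
    merge_levels = round(math.log(num_docs, merge_factor))  # ~log_mergeFactor(N)
    return merge_factor * merge_levels * files_per_segment

def writer_open_files(merge_factor, files_per_segment):
    # Handles an IndexWriter needs while merging:
    #   (1 + mergeFactor) * files_per_segment
    return (1 + merge_factor) * files_per_segment

print(reader_max_open_files(10, 1_000_000, 10))  # 600
print(writer_open_files(10, 10))                 # 110
print(writer_open_files(50, 10))                 # 510
```

Note that the mergeFactor=50 writer figure (510) lines up with the roughly
500 handles observed in the 3M-document build described next.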

I just built a 3M document index on Linux in five hours, with no 
intermediate optimizations.  I set the mergeFactor to 50.  This required 
around 500 file handles, well beneath the 1024 limit.


To unsubscribe, e-mail:   <>
For additional commands, e-mail: <>

