lucene-java-user mailing list archives

From Doug Cutting <cutt...@lucene.com>
Subject Re: Lucene Benchmarks and Information
Date Fri, 20 Dec 2002 18:45:03 GMT
Armbrust, Daniel C. wrote:
> While I was trying to build this index, the biggest limitation of
> Lucene that I ran into was optimization.  Optimization kills the
> indexer's performance when you get between 3-5 million documents in an
> index.  On my Windows XP box, I had to reoptimize every 100,000
> documents to keep from running out of file handles.

What is the file handle limit on XP?

When batch indexing, optimizing before the end slows things down, and 
should not be required.

Are you otherwise opening index readers in the same process?  Index 
readers use a lot more file handles than the index writer, since they 
must keep all files in all segments open.  For large indexes it's best 
to do your indexing in a separate process which never opens an IndexReader.

The maximum number of files a reader will keep open, where N is the
number of documents in the index, is:

   mergeFactor * log_base_mergeFactor(N) * files_per_segment

With mergeFactor=10 (the default) and 1 million documents, and 10 files 
per segment, a reader on a never-optimized index should at most require 
600 open files, and typically half that.
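
As a rough check of the reader bound above, the arithmetic can be sketched in Java. The class and method names are hypothetical and not part of Lucene. The sketch assumes the logarithm is rounded up to a whole number of merge levels, which makes it a slightly conservative upper bound.

```java
// Hypothetical helper, not part of Lucene. It evaluates the reader bound
//   mergeFactor * log_base_mergeFactor(N) * files_per_segment
final class ReaderFileHandles {

    // Smallest k with mergeFactor^k >= numDocs, i.e. the logarithm rounded up.
    // Integer arithmetic avoids floating-point rounding errors from Math.log.
    static int mergeLevels(int mergeFactor, long numDocs) {
        if (mergeFactor < 2 || numDocs < 1) {
            throw new IllegalArgumentException("need mergeFactor >= 2 and numDocs >= 1");
        }
        int levels = 0;
        long capacity = 1;
        while (capacity < numDocs) {
            // If the next multiplication would overflow, one more level is enough.
            if (capacity > Long.MAX_VALUE / mergeFactor) {
                return levels + 1;
            }
            capacity *= mergeFactor;
            levels++;
        }
        return levels;
    }

    // Upper bound on the files a reader keeps open on a never-optimized index.
    static long maxReaderOpenFiles(int mergeFactor, long numDocs, int filesPerSegment) {
        return (long) mergeFactor * mergeLevels(mergeFactor, numDocs) * filesPerSegment;
    }

    public static void main(String[] args) {
        // mergeFactor=10, 1 million documents, 10 files per segment
        System.out.println(maxReaderOpenFiles(10, 1_000_000L, 10)); // prints 600
    }
}
```

Raising mergeFactor raises this bound quickly, which is one reason to keep readers out of the indexing process.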

A writer will open:

   (1 + mergeFactor) * files_per_segment

With mergeFactor=10 (the default) and 10 files per segment, a writer on 
a never-optimized index would require 110 open files, regardless of the 
number of documents.
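
The writer figure can be checked the same way. This is a minimal sketch with hypothetical names, not Lucene code. The comment about what the two terms stand for is one plausible reading of the formula, not something stated in the message.

```java
// Hypothetical helper, not part of Lucene. It evaluates the writer figure
//   (1 + mergeFactor) * files_per_segment
final class WriterFileHandles {

    // One plausible reading of the formula (an assumption): during a merge the
    // writer reads up to mergeFactor segments while writing one new segment.
    static int writerOpenFiles(int mergeFactor, int filesPerSegment) {
        if (mergeFactor < 2 || filesPerSegment < 1) {
            throw new IllegalArgumentException("need mergeFactor >= 2 and filesPerSegment >= 1");
        }
        return (1 + mergeFactor) * filesPerSegment;
    }

    public static void main(String[] args) {
        // mergeFactor=10, 10 files per segment
        System.out.println(writerOpenFiles(10, 10)); // prints 110
    }
}
```

Note that the document count does not appear in the writer formula, so the estimate stays flat as the index grows.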

I just built a 3M document index on Linux in five hours, with no 
intermediate optimizations.  I set the mergeFactor to 50.  This required 
around 500 file handles, well beneath the 1024 limit.
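
The figure of around 500 handles agrees with the writer formula if one assumes about 10 files per segment, which the message does not state for this run: (1 + 50) * 10 = 510. The sketch below, with hypothetical names, inverts the formula to find the largest mergeFactor that fits under a given handle limit. It ignores handles the process needs for anything else, so treat the result as an upper bound rather than a recommendation.

```java
// Hypothetical helper, not part of Lucene. It relates the writer formula
//   (1 + mergeFactor) * files_per_segment
// to a per-process file handle limit.
final class MergeFactorBudget {

    static int writerOpenFiles(int mergeFactor, int filesPerSegment) {
        return (1 + mergeFactor) * filesPerSegment;
    }

    // Largest mergeFactor with (1 + mergeFactor) * filesPerSegment <= handleLimit.
    // Other open files (jars, logs, sockets) are not accounted for.
    static int maxMergeFactor(int handleLimit, int filesPerSegment) {
        if (filesPerSegment < 1) {
            throw new IllegalArgumentException("need filesPerSegment >= 1");
        }
        return handleLimit / filesPerSegment - 1;
    }

    public static void main(String[] args) {
        // Assumed 10 files per segment for both calls.
        System.out.println(writerOpenFiles(50, 10));  // prints 510
        System.out.println(maxMergeFactor(1024, 10)); // prints 101
    }
}
```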

Doug



