lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Doug Cutting <>
Subject Re: Lucene optimization with one large index and numerous small indexes.
Date Mon, 29 Mar 2004 22:16:05 GMT
Kevin A. Burton wrote:
>> One way to force larger read-aheads might be to pump up Lucene's input 
>> buffer size.  As an experiment, try increasing InputStream.BUFFER_SIZE 
>> to 1024*1024 or larger.  You'll want to do this just for the merge 
>> process and not for searching and indexing.  That should help you 
>> spend more time doing transfers with less wasted on seeks.  If that 
>> helps, then perhaps we ought to make this settable via system property 
>> or somesuch.
> Good suggestion... seems about 10% -> 15% faster in a few strawman 
> benchmarks I ran.  

How long is it taking to merge your 5GB index?  Do you have any stats 
about disk utilization during merge (seeks/second, bytes 
transferred/second)?  Did you try buffer sizes even larger than 1MB? 
Are you writing to a different disk, as suggested?

> Note that right now this var is final and not public... so that will 
> probably need to change.

Perhaps.  I'm reticent to make it too easy to change this.  People tend 
to randomly tweak every available knob and then report bugs, or, if it 
doesn't crash, start recommending that everyone else tweak the knob as 
they do.  There are lots of tradeoffs with buffer size, cases that folks 
might not think of (like that a wildcard query creates a buffer for 
every term that matches), etc.

> Does it make sense to also increase the 
> OutputStream.BUFFER_SIZE?  This would seem to make sense since an 
> optimize is a large number of reads and writes.

It might help a little if you're merging to the same disk as you're 
reading from, but probably not a lot.  If you're merging to a different 
disk then it shouldn't make much difference at all.


To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message