lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Esmond Pitt" <>
Subject Re: Lucene optimization with one large index and numerous small indexes.
Date Tue, 30 Mar 2004 01:21:04 GMT
Don't want to start a buffer size war, but these have always seemed too
small to me. I'd recommend upping both InputStream and OutputStream buffer
sizes to at least 4k, as this is the cluster size on most disks these days,
and also a common VM page size. Reading and writing in smaller quantities
than these is definitely suboptimal.

Esmond Pitt

----- Original Message ----- 
From: "Doug Cutting" <>
To: "Lucene Users List" <>
Sent: Tuesday, March 30, 2004 8:16 AM
Subject: Re: Lucene optimization with one large index and numerous small

> Kevin A. Burton wrote:
> >> One way to force larger read-aheads might be to pump up Lucene's input
> >> buffer size.  As an experiment, try increasing InputStream.BUFFER_SIZE
> >> to 1024*1024 or larger.  You'll want to do this just for the merge
> >> process and not for searching and indexing.  That should help you
> >> spend more time doing transfers with less wasted on seeks.  If that
> >> helps, then perhaps we ought to make this settable via system property
> >> or somesuch.
> >>
> > Good suggestion... seems about 10% -> 15% faster in a few strawman
> > benchmarks I ran.
> How long is it taking to merge your 5GB index?  Do you have any stats
> about disk utilization during merge (seeks/second, bytes
> transferred/second)?  Did you try buffer sizes even larger than 1MB?
> Are you writing to a different disk, as suggested?
> > Note that right now this var is final and not public... so that will
> > probably need to change.
> Perhaps.  I'm reticent to make it too easy to change this.  People tend
> to randomly tweak every available knob and then report bugs, or, if it
> doesn't crash, start recommending that everyone else tweak the knob as
> they do.  There are lots of tradeoffs with buffer size, cases that folks
> might not think of (like that a wildcard query creates a buffer for
> every term that matches), etc.
> > Does it make sense to also increase the
> > OutputStream.BUFFER_SIZE?  This would seem to make sense since an
> > optimize is a large number of reads and writes.
> It might help a little if you're merging to the same disk as you're
> reading from, but probably not a lot.  If you're merging to a different
> disk then it shouldn't make much difference at all.
> Doug
> ---------------------------------------------------------------------
> To unsubscribe, e-mail:
> For additional commands, e-mail:

To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message