lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Kevin A. Burton" <>
Subject Re: Lucene optimization with one large index and numerous small indexes.
Date Mon, 29 Mar 2004 23:51:48 GMT
Doug Cutting wrote:

> How long is it taking to merge your 5GB index?  Do you have any stats 
> about disk utilization during merge (seeks/second, bytes 
> transferred/second)?  Did you try buffer sizes even larger than 1MB? 
> Are you writing to a different disk, as suggested?

I'll do some more testing tonight and get back to you

>> Note that right now this var is final and not public... so that will 
>> probably need to change.
> Perhaps.  I'm reticent to make it too easy to change this.  People 
> tend to randomly tweak every available knob and then report bugs, or, 
> if it doesn't crash, start recommending that everyone else tweak the 
> knob as they do.  There are lots of tradeoffs with buffer size, cases 
> that folks might not think of (like that a wildcard query creates a 
> buffer for every term that matches), etc.

Or you can do what I do and recompile ;) 

>> Does it make sense to also increase the OutputStream.BUFFER_SIZE?  
>> This would seem to make sense since an optimize is a large number of 
>> reads and writes.
> It might help a little if you're merging to the same disk as you're 
> reading from, but probably not a lot.  If you're merging to a 
> different disk then it shouldn't make much difference at all.
Right now we are merging to the same disk...  I'll perform some real 
benchmarks with this var too.  Long term we're going to migrate to using 
to SCSI disks per machine and then doing parallel queries across them 
with optimized indexes.

Also with modern disk controllers and filesystems I'm not sure how much 
difference this should make.  Both Reiser and XFS do a lot of internal 
buffering as does our disk controller.  I guess I'll find out...



Please reply using PGP.    
    NewsMonster -
Kevin A. Burton, Location - San Francisco, CA, Cell - 415.595.9965
       AIM/YIM - sfburtonator,  Web -
GPG fingerprint: 5FB2 F3E2 760E 70A8 6174 D393 E84D 8D04 99F1 4412
  IRC - #infoanarchy | #p2p-hackers | #newsmonster

View raw message