lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Kevin A. Burton" <>
Subject Lucene optimization with one large index and numerous small indexes.
Date Mon, 29 Mar 2004 05:49:59 GMT
We're using lucene with one large target index which right now is 5G.  
Every night we take sub-indexes which are about 500M and merging them 
into this main index.  This merge (done via 
IndexWriter.addIndexes(Directory[]) is taking way too much time.

Looking at the stats for the box we're essentially blocked on reads.  
The disk is blocked on read IO and CPU is at 5%.  If I'm right I think 
this could be minimized by continually picking the two smaller indexes, 
merging them, then picking the next two smallest, merging them, and then 
keep doing this until we're down to one index.

Does this sound about right?


Please reply using PGP.    
    NewsMonster -
Kevin A. Burton, Location - San Francisco, CA, Cell - 415.595.9965
       AIM/YIM - sfburtonator,  Web -
GPG fingerprint: 5FB2 F3E2 760E 70A8 6174 D393 E84D 8D04 99F1 4412
  IRC - #infoanarchy | #p2p-hackers | #newsmonster

View raw message