lucene-java-user mailing list archives

From Chris Collins <>
Subject Re: Performance with multi index
Date Thu, 16 Jun 2005 15:59:56 GMT
I can attest to the value of increasing minMergeDocs; it directly affects how much IO gets
performed during indexing.
Splitting the work into multiple indices (if you are willing to pay the price in complexity) may well
increase your throughput, assuming you are not already using all of the resources the system
offers. Say, for example, you have two indexing threads and one writer per thread.
You can benefit in a few ways here. Firstly, indexing is a mixture of CPU-bound and IO-bound work
(the effect is certainly easier to observe when you increase minMergeDocs). If you have an SMP or
hyper-threaded box, you can potentially run two "hardware threads" concurrently.
Furthermore, you will have more opportunity to overlap IO.
A quick profiling run may also give you clues about where your code is inefficient.
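
A minimal sketch of the "one writer per thread" layout described above, assuming the documents can simply be partitioned round-robin across threads. PartitionWriter is a hypothetical in-memory stand-in so the example runs on its own; in real code each worker would instead own a Lucene IndexWriter on its own directory, and the resulting indices could afterwards be searched together with MultiSearcher or combined with IndexWriter.addIndexes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class PartitionedIndexing {

    /** Hypothetical stand-in for a per-thread index writer; NOT a Lucene class. */
    static final class PartitionWriter {
        private final List<String> docs = new ArrayList<>();

        void addDocument(String doc) { docs.add(doc); }

        int docCount() { return docs.size(); }
    }

    /** Indexes docs with one writer per thread; no writer is ever shared between threads. */
    static List<PartitionWriter> indexInParallel(List<String> docs, int numThreads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(numThreads);
        try {
            List<Future<PartitionWriter>> futures = new ArrayList<>();
            for (int t = 0; t < numThreads; t++) {
                final int partition = t;
                futures.add(pool.submit(() -> {
                    // Each task creates and owns its writer (in Lucene: its own directory).
                    PartitionWriter writer = new PartitionWriter();
                    // Round-robin partitioning: task t takes docs t, t+n, t+2n, ...
                    for (int i = partition; i < docs.size(); i += numThreads) {
                        writer.addDocument(docs.get(i));
                    }
                    return writer;
                }));
            }
            List<PartitionWriter> writers = new ArrayList<>();
            for (Future<PartitionWriter> f : futures) {
                writers.add(f.get()); // waits for each partition to finish
            }
            return writers;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> docs = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            docs.add("doc-" + i);
        }
        List<PartitionWriter> parts = indexInParallel(docs, 2);
        for (int p = 0; p < parts.size(); p++) {
            System.out.println("partition " + p + ": " + parts.get(p).docCount() + " docs");
        }
    }
}
```

Because the workers never contend for a single writer, CPU-bound analysis on one thread can overlap with IO on another; the trade-off is managing, and later searching or merging, several indices.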

Volodymyr Bychkoviak <> wrote:

JM Tinghir wrote:

>>Could you qualify a bit more about what is slow? 
>Well, it just took 145 minutes to index 2670 files (450 MB) in one
>index (29 MB).
>It only took 33 minutes when I did it into ~10 indexes (global size of 32 MB).
I think it took so much time because the index is merged too often.
Try increasing IndexWriter.mergeFactor (default 10), but beware of a "Too many open files"
exception if you set it too high, and try increasing IndexWriter.minMergeDocs (default 10),
which consumes more RAM but works faster.

By playing a bit with these parameters you can speed up your indexing process.
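
To illustrate why raising minMergeDocs cuts IO, here is a rough Java model. It assumes a simplified logarithmic merge policy loosely in the spirit of Lucene 1.x: buffer minMergeDocs documents in RAM, flush them as a segment, and merge every mergeFactor same-sized segments into one larger segment. It is not Lucene's actual merge code, and it counts document copies written to disk only as a crude proxy for IO.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Simplified model of logarithmic segment merging (NOT Lucene's real code).
 * Counts how many document copies are written to disk while indexing.
 */
public class MergeIoModel {

    /** Returns the number of document writes needed to index numDocs documents. */
    public static long docWrites(int numDocs, int mergeFactor, int minMergeDocs) {
        // segmentsPerLevel.get(k) = number of segments holding minMergeDocs * mergeFactor^k docs
        List<Integer> segmentsPerLevel = new ArrayList<>();
        long writes = 0;
        int buffered = 0;
        for (int i = 0; i < numDocs; i++) {
            buffered++;
            if (buffered == minMergeDocs) {
                // Flush the RAM buffer to disk as a new level-0 segment.
                writes += buffered;
                buffered = 0;
                addSegment(segmentsPerLevel, 0);
                // Cascade: mergeFactor segments at level k are rewritten as one at level k+1.
                long segSize = minMergeDocs;
                for (int level = 0; level < segmentsPerLevel.size(); level++) {
                    if (segmentsPerLevel.get(level) < mergeFactor) {
                        break;
                    }
                    segmentsPerLevel.set(level, 0);
                    writes += segSize * mergeFactor; // every merged doc is copied again
                    addSegment(segmentsPerLevel, level + 1);
                    segSize *= mergeFactor;
                }
            }
        }
        writes += buffered; // final flush of any leftover buffered docs
        return writes;
    }

    private static void addSegment(List<Integer> levels, int level) {
        while (levels.size() <= level) {
            levels.add(0);
        }
        levels.set(level, levels.get(level) + 1);
    }

    public static void main(String[] args) {
        for (int minMergeDocs : new int[] {10, 100, 1000}) {
            System.out.println("minMergeDocs=" + minMergeDocs + " -> "
                    + docWrites(1000, 10, minMergeDocs) + " document writes");
        }
    }
}
```

With 1,000 documents and mergeFactor 10, this model writes 3,000 document copies at minMergeDocs = 10, 2,000 at 100, and 1,000 at 1,000: fewer merge levels means each document is rewritten fewer times, at the cost of more RAM for the buffer. Raising mergeFactor to 100 (with minMergeDocs = 10) also gives 2,000, but leaves more segments, and therefore more files, on disk between merges.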

>>Perhaps you need to optimize the index? 
>Perhaps, never tried it...
>To unsubscribe, e-mail:
>For additional commands, e-mail:

Volodymyr Bychkoviak
