lucene-java-user mailing list archives

From "Michael McCandless" <>
Subject Re: MergeFactor and MaxBufferedDocs value should ...?
Date Fri, 23 Mar 2007 10:11:32 GMT

"SK R" <> wrote:
>     If I set MergeFactor = 100 and MaxBufferedDocs = 250, then the
> first 100 segments will be merged in RAMDir when 100 docs have
> arrived. After the 350th doc has been added to the writer, RAMDir has
> 2 merged segment files + 50 separate segment files not merged
> together, and these are flushed to FSDir.
>     If wrong, please correct me.
>     My doubt is whether we should set MergeFactor & MaxBufferedDocs in
> a proportional ratio (i.e. MaxBufferedDocs = n*MergeFactor where
> n = 1, 2, ...) to reduce indexing time and get greater performance,
> or is there no need to worry about their relation?

Actually, maxBufferedDocs is how many docs are held in RAM before
flushing to a single segment.  So with 250, after adding the 250th doc
the writer will write the first segment; after adding the 500th doc,
it writes the second segment, etc.

Then, mergeFactor says how many segments can be written before a merge
takes place.  A mergeFactor of 10 means that after writing 10 such
segments from above, they will be merged into a single segment with
2500 docs.  After another 2500 docs you'll have 2 such segments.  Then
once you've added your 25000th doc, all ten of the 2500 doc segments
will be merged into a single 25000 doc segment, etc.
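
To make the arithmetic above concrete, here is a minimal sketch that
simulates the flush and merge pattern just described.  It does not use
Lucene at all: the class and method names (SegmentSim, simulate,
mergeEqualSized) are made up for illustration, and the real merge
logic in IndexWriter has more details than this simplified model.
It also ignores any docs still buffered in RAM at the end.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of the behaviour described above.
// It is NOT Lucene code; it only tracks segment sizes in docs.
class SegmentSim {

    // Returns the sizes (in docs) of the written segments after
    // totalDocs documents have been added.
    static List<Integer> simulate(int maxBufferedDocs, int mergeFactor, int totalDocs) {
        List<Integer> segments = new ArrayList<>();
        int buffered = 0;
        for (int doc = 1; doc <= totalDocs; doc++) {
            buffered++;
            if (buffered == maxBufferedDocs) {
                // Buffer is full: flush it as one new segment.
                segments.add(buffered);
                buffered = 0;
                mergeEqualSized(segments, mergeFactor);
            }
        }
        return segments;
    }

    // While the newest mergeFactor segments all have the same size,
    // replace them with one segment holding all of their docs.
    static void mergeEqualSized(List<Integer> segments, int mergeFactor) {
        while (segments.size() >= mergeFactor) {
            int n = segments.size();
            int size = segments.get(n - 1);
            for (int i = n - mergeFactor; i < n; i++) {
                if (segments.get(i) != size) {
                    return; // newest segments differ in size: no merge yet
                }
            }
            for (int i = 0; i < mergeFactor; i++) {
                segments.remove(segments.size() - 1);
            }
            segments.add(size * mergeFactor);
        }
    }

    public static void main(String[] args) {
        for (int docs : new int[] {250, 2500, 5000, 25000}) {
            System.out.println(docs + " docs -> " + simulate(250, 10, docs));
        }
    }
}
```

With maxBufferedDocs = 250 and mergeFactor = 10 this prints [250],
[2500], [2500, 2500] and [25000], matching the numbers above.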

To maximize indexing performance you really want maxBufferedDocs to be
as large as you can handle (the bigger you make it, the more RAM is
required by the writer).

I believe (not certain) larger values of mergeFactor will also improve
performance since it defers merging as long as possible.  However, the
larger you make this, the more segments are allowed to exist in your
index, and at some point you will hit file handle limits with your

I don't think these two parameters need to be proportional to one
another.  I don't think that will affect performance.
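
To get a rough feel for how mergeFactor affects the number of segments
that can pile up, here is a small sketch under the same simplified
model as the explanation above (flush every maxBufferedDocs docs,
merge whenever mergeFactor equal-sized segments exist).  Under that
assumption the segment count is the digit sum of the number of flushes
written in base mergeFactor.  The class and method names are
hypothetical, and real Lucene behaviour may differ in the details.

```java
// Hypothetical helper, not part of Lucene: counts segments under the
// simplified flush/merge model described in this message.
class SegmentCount {

    // Number of segments after totalDocs docs, ignoring any docs
    // that are still buffered in RAM and not yet flushed.
    static int segmentCount(int maxBufferedDocs, int mergeFactor, int totalDocs) {
        int flushes = totalDocs / maxBufferedDocs;
        int count = 0;
        // Each base-mergeFactor digit is the number of segments
        // currently sitting at that merge level.
        while (flushes > 0) {
            count += flushes % mergeFactor;
            flushes /= mergeFactor;
        }
        return count;
    }

    public static void main(String[] args) {
        int totalDocs = 249750; // 999 flushes of 250 docs each
        for (int mergeFactor : new int[] {10, 100, 1000}) {
            System.out.println("mergeFactor=" + mergeFactor + " -> "
                    + segmentCount(250, mergeFactor, totalDocs) + " segments");
        }
    }
}
```

For 249750 docs with maxBufferedDocs = 250 this gives 27 segments at
mergeFactor 10, 108 at mergeFactor 100 and 999 at mergeFactor 1000, so
raising mergeFactor defers merging at the price of many more segments
being open at once.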

Another performance boost is to turn off the compound file format, but
this has the severe cost of requiring far more file handles during
searching.

