lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Michael McCandless" <luc...@mikemccandless.com>
Subject Re: MergeFactor and MaxBufferedDocs value should ...?
Date Fri, 23 Mar 2007 14:28:20 GMT

"Erick Erickson" <erickerickson@gmail.com> wrote:
> I haven't used it yet, but I've seen several references to
> IndexWriter.ramSizeInBytes() and using it to control when the writer
> flushes the RAM. This seems like a more deterministic way of
> making things efficient than trying various combinations of
> maxBufferedDocs , MergeFactor, etc, all of which are guesses
> at best.

I agree this is the most efficient way to flush.  The one caveat is
this Jira issue:

  http://issues.apache.org/jira/browse/LUCENE-845

which can cause over-merging if you make maxBufferedDocs too large.

I think the rule of thumb to avoid this issue is 1) set
maxBufferedDocs to be no more than 10X the "typical" number of docs
you will flush, and then 2) flush by RAM usage.

So for example if when you flush by RAM you typically flush "around"
200-300 docs, then setting maxBufferedDocs to eg 1000 is good since
it's far above 200-300 (so it won't trigger a flush when you didn't
want it to) but it's also well below 10X your range of docs (so it
won't tickle the above bug).

Mike

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message