lucene-java-user mailing list archives

From "Erick Erickson" <erickerick...@gmail.com>
Subject Re: MergeFactor and MaxBufferedDocs value should ...?
Date Fri, 23 Mar 2007 14:04:19 GMT
I haven't used it yet, but I've seen several references to
IndexWriter.ramSizeInBytes() and using it to control when the writer
flushes the RAM. This seems like a more deterministic way of
making things efficient than trying various combinations of
maxBufferedDocs, mergeFactor, etc., all of which are guesses
at best.
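
For anyone who wants to try it, here is a rough sketch of that flush-by-RAM
pattern. It assumes the 2.x-era IndexWriter API (ramSizeInBytes() plus a
public flush()); the 32 MB threshold, index path and field setup are just
placeholder choices to tune for your own data:

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.FSDirectory;

public class FlushByRam {
    public static void main(String[] args) throws Exception {
        // Open a writer on disk; buffered docs still accumulate in RAM first.
        IndexWriter writer = new IndexWriter(
                FSDirectory.getDirectory("/tmp/flush-by-ram-index"),
                new StandardAnalyzer(), true);

        long ramLimit = 32 * 1024 * 1024;  // ~32 MB buffer; tune for your heap

        for (int i = 0; i < 100000; i++) {
            Document doc = new Document();
            doc.add(new Field("body", "text of document " + i,
                              Field.Store.NO, Field.Index.TOKENIZED));
            writer.addDocument(doc);

            // Flush by measured RAM use instead of by a fixed doc count.
            if (writer.ramSizeInBytes() > ramLimit) {
                writer.flush();  // writes the buffered docs out as a new segment
            }
        }
        writer.close();
    }
}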

I'd be really curious if it works for you...

Erick

On 3/23/07, SK R <rsk.sen@gmail.com> wrote:
>
> Please clarify the following.
>
>      1. When will the segments in RAMDirectory be moved (flushed) into
> FSDirectory?
>
>      2. Segment creation by maxBufferedDocs happens in RAMDir. Where does
> the merge triggered by mergeFactor happen: in RAMDir or in FSDir?
>
> Thanks in Advance
> RSK
>
>
> On 3/23/07, Michael McCandless <lucene@mikemccandless.com> wrote:
> >
> >
> > "SK R" <rsk.sen@gmail.com> wrote:
> > >     If I set MergeFactor = 100 and MaxBufferedDocs = 250, then the
> > > first 100 segments will be merged in RAMDir when 100 docs have
> > > arrived. At the end of the 350th doc added to the writer, RAMDir has
> > > 2 merged segment files + 50 separate segment files not merged
> > > together, and these are flushed to FSDir.
> > >
> > >     If this is wrong, please correct me.
> > >
> > >     My doubt is whether we should set MergeFactor & MaxBufferedDocs
> > > in a proportional ratio, i.e. MaxBufferedDocs = n*MergeFactor where
> > > n = 1, 2, ..., to reduce indexing time and get greater performance,
> > > or whether there is no need to worry about their relation.
> >
> > Actually, maxBufferedDocs is how many docs are held in RAM before
> > flushing to a single segment.  So with 250, after adding the 250th doc
> > the writer will write the first segment; after adding the 500th doc,
> > it writes the second segment, etc.
> >
> > Then, mergeFactor says how many segments can be written before a merge
> > takes place.  A mergeFactor of 10 means after writing 10 such
> > segments from above, they will be merged into a single segment with
> > 2500 docs.  After another 2500 docs you'll have 2 such segments.  Then
> > once you've added your 25000th doc, all of the 2500-doc segments will
> > be merged into a single 25000-doc segment, etc.
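
To make that schedule concrete, here is a small configuration sketch. The
setter names are from the 2.x-era IndexWriter, it reuses the writer set up in
the sketch earlier in this thread, and the values simply mirror the example
above:

writer.setMaxBufferedDocs(250);  // flush a new segment after every 250 buffered docs
writer.setMergeFactor(10);       // merge once 10 segments of the same level exist

// With these settings: docs 1-250 become segment #1, 251-500 segment #2, ...
// After the 2500th doc the ten 250-doc segments are merged into one 2500-doc
// segment; after the 25000th doc the ten 2500-doc segments are merged again,
// and so on up the levels.
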
> >
> > To maximize indexing performance you really want maxBufferedDocs to be
> > as large as you can handle (the bigger you make it, the more RAM is
> > required by the writer).
> >
> > I believe (though I'm not certain) that larger values of mergeFactor will
> > also improve performance, since they defer merging as long as possible.
> > However, the
> > larger you make this, the more segments are allowed to exist in your
> > index, and at some point you will hit file handle limits with your
> > searchers.
> >
> > I don't think these two parameters need to be proportional to one
> > another.  I don't think that will affect performance.
> >
> > Another performance boost is to turn off the compound file format, but
> > this has a severe cost of requiring far more file handles during searching.
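
For completeness, that switch looks like this with the same 2.x-era API,
again reusing the writer from the earlier sketch; whether the extra file
handles are acceptable depends on how many segments your searchers keep open:

writer.setUseCompoundFile(false);  // keep each segment's files separate instead of
                                   // packing them into one .cfs: faster indexing,
                                   // but many more open files at search time
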
> >
> > Mike
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > For additional commands, e-mail: java-user-help@lucene.apache.org
> >
> >
>
