lucene-java-user mailing list archives

From "SK R" <rsk....@gmail.com>
Subject Re: MergeFactor and MaxBufferedDocs value should ...?
Date Fri, 23 Mar 2007 13:05:30 GMT
Please clarify the following.

     1. When are the segments in RAMDirectory moved (flushed) into
FSDirectory?

     2. Segment creation driven by maxBufferedDocs occurs in RAMDir. Where does
merging driven by MergeFactor happen: in RAMDir or in FSDir?

Thanks in advance
RSK


On 3/23/07, Michael McCandless <lucene@mikemccandless.com> wrote:
>
>
> "SK R" <rsk.sen@gmail.com> wrote:
> >     If I set MergeFactor = 100 and MaxBufferedDocs = 250, then the first 100
> > segments will be merged in RAMDir when 100 docs have arrived. After the 350th
> > doc is added to the writer, RAMDir has 2 merged segment files + 50 separate
> > segment files that are not merged together, and these are flushed to FSDir.
> >
> >     If I am wrong, please correct me.
> >
> >     My question is whether we should set MergeFactor and MaxBufferedDocs in
> > a proportional ratio (i.e., MaxBufferedDocs = n*MergeFactor where n = 1, 2,
> > ...) to reduce indexing time and get better performance, or is there no
> > need to worry about their relationship?
>
> Actually, maxBufferedDocs is how many docs are held in RAM before
> flushing to a single segment.  So with 250, after adding the 250th doc
> the writer will write the first segment; after adding the 500th doc,
> it writes the second segment, etc.
>
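
The flushing rule in the reply above can be modeled in a few lines of plain Java. This is a minimal sketch, not Lucene's implementation: the class and method names are hypothetical (in Lucene 2.x the real setting is made with IndexWriter.setMaxBufferedDocs(int)), and the model only records after which document a new segment is written.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of the flushing rule (illustration only, not Lucene code):
// documents are buffered in RAM, and each time the buffer holds
// maxBufferedDocs documents, one new segment is written and the buffer empties.
class FlushSchedule {

    // Returns the 1-based document numbers after which a segment is flushed,
    // for totalDocs documents added one at a time.
    static List<Integer> flushPoints(int maxBufferedDocs, int totalDocs) {
        List<Integer> points = new ArrayList<>();
        int buffered = 0;
        for (int doc = 1; doc <= totalDocs; doc++) {
            buffered++;
            if (buffered == maxBufferedDocs) {
                points.add(doc);  // buffer is full: write one segment
                buffered = 0;
            }
        }
        return points;
    }

    public static void main(String[] args) {
        // maxBufferedDocs = 250, as in the example above
        System.out.println(flushPoints(250, 1000));  // [250, 500, 750, 1000]
    }
}
```

Docs still buffered when the writer is closed are flushed as a final, smaller segment; the sketch leaves them out.
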
> Then, mergeFactor says how many segments can be written before a merge
> takes place.  A mergeFactor of 10 means that after 10 such segments
> from above have been written, they will be merged into a single segment
> with 2500 docs.  After another 2500 docs you'll have 2 such segments.
> Then, once you've added your 25000th doc, all of the 2500-doc segments
> will be merged into a single 25000-doc segment, etc.
>
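
A similarly hedged sketch of the merge cascade follows. It assumes an idealized policy in which a merge always combines exactly mergeFactor segments of equal size, which reproduces the 2500-doc and 25000-doc numbers above; it is not Lucene's merge code, and the names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Idealized model of the merge cascade (an assumption for illustration; the
// real merge policy has more rules). Each flush adds a segment of
// maxBufferedDocs docs; whenever mergeFactor segments of the same size sit at
// the end of the list, they are merged into one segment mergeFactor times
// larger, and the check repeats so merges can cascade up through the levels.
class MergeCascade {

    static List<Integer> segmentSizes(int maxBufferedDocs, int mergeFactor, int totalDocs) {
        List<Integer> segments = new ArrayList<>();  // oldest (largest) first
        int flushes = totalDocs / maxBufferedDocs;   // still-buffered remainder ignored
        for (int i = 0; i < flushes; i++) {
            segments.add(maxBufferedDocs);  // flush a new smallest segment
            int size = maxBufferedDocs;
            // While the newest mergeFactor segments all have this size, merge them.
            while (countTail(segments, size) >= mergeFactor) {
                for (int k = 0; k < mergeFactor; k++) {
                    segments.remove(segments.size() - 1);
                }
                size *= mergeFactor;
                segments.add(size);
            }
        }
        return segments;
    }

    // Counts how many segments at the end of the list have exactly this size.
    private static int countTail(List<Integer> segments, int size) {
        int n = 0;
        for (int i = segments.size() - 1; i >= 0 && segments.get(i) == size; i--) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        System.out.println(segmentSizes(250, 10, 2500));   // [2500]
        System.out.println(segmentSizes(250, 10, 5000));   // [2500, 2500]
        System.out.println(segmentSizes(250, 10, 25000));  // [25000]
        // The numbers from the original question (mergeFactor = 100).
        System.out.println(segmentSizes(250, 100, 350));   // [250]
    }
}
```

The last call uses the numbers from the original question. Under this model, after 350 docs only one 250-doc segment has been written and no merge has happened yet; the remaining 100 docs are still in the RAM buffer.
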
> To maximize indexing performance you really want maxBufferedDocs to be
> as large as you can handle (the bigger you make it, the more RAM is
> required by the writer).
>
> I believe (not certain) larger values of mergeFactor will also improve
> performance since it defers merging as long as possible.  However, the
> larger you make this, the more segments are allowed to exist in your
> index, and at some point you will hit file handle limits with your
> searchers.
>
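
Under the same idealized model, the segment count has a closed form that makes the trade-off in the paragraph above concrete: write the number of flushes in base mergeFactor, and each digit is the number of segments left at that size, so the segment count is the digit sum. This is a sketch under that assumption; the method name and the 1,000,000-doc index are hypothetical.

```java
// Idealized model (assumption): every merge combines exactly mergeFactor
// equal-sized segments, so flushes behave like a base-mergeFactor counter.
// Digit d at position k means d segments of maxBufferedDocs * mergeFactor^k docs.
class SegmentCount {

    static int segmentCount(long flushes, int mergeFactor) {
        int count = 0;
        while (flushes > 0) {
            count += (int) (flushes % mergeFactor);  // segments left at this level
            flushes /= mergeFactor;                  // move up one merge level
        }
        return count;
    }

    public static void main(String[] args) {
        long flushes = 1_000_000L / 250;  // hypothetical 1M docs, maxBufferedDocs = 250
        System.out.println(segmentCount(flushes, 10));   // 4
        System.out.println(segmentCount(flushes, 100));  // 40
    }
}
```

Raising mergeFactor from 10 to 100 here leaves ten times as many segments (40 instead of 4) for searchers to open, in exchange for less merging during indexing: in this model each doc is rewritten by merges once instead of three times.
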
> I don't think these two parameters need to be proportional to one
> another, and I don't think making them proportional would affect performance.
>
> Another performance boost is to turn off the compound file format, but this
> has the severe cost of requiring far more file handles during searching.
>
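
A back-of-the-envelope estimate shows why the compound file format matters for file handles. Everything here is an assumption for illustration: the method is hypothetical, each open reader is assumed to hold every file of every segment, a compound segment is counted as one file, and filesPerSegment for a non-compound segment is a made-up parameter, since the real number depends on the Lucene version and the fields indexed. In Lucene 2.x the format itself is toggled with IndexWriter.setUseCompoundFile(boolean).

```java
// Rough, hypothetical estimate of open file handles needed for searching.
// Assumptions (not Lucene API): each open reader keeps every segment file
// open, a compound segment is a single file, and a non-compound segment has
// filesPerSegment separate files.
class FileHandleEstimate {

    static long openFiles(int segments, boolean compound, int filesPerSegment, int openReaders) {
        int perSegment = compound ? 1 : filesPerSegment;
        return (long) segments * perSegment * openReaders;
    }

    public static void main(String[] args) {
        int segments = 40;        // e.g. a large index with a high mergeFactor
        int filesPerSegment = 8;  // hypothetical value for a non-compound segment
        int openReaders = 2;      // hypothetical: two searchers open on the index
        System.out.println(openFiles(segments, true, filesPerSegment, openReaders));   // 80
        System.out.println(openFiles(segments, false, filesPerSegment, openReaders));  // 640
    }
}
```

The estimate scales with the product of segment count, files per segment, and open readers, which is why a high mergeFactor and a non-compound format together can exhaust file handle limits much sooner than either alone.
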
> Mike
>
