lucene-dev mailing list archives

From "Michael McCandless" <>
Subject Re: [jira] Created: (LUCENE-854) Create merge policy that doesn't periodically inadvertently optimize
Date Wed, 02 May 2007 19:59:08 GMT

"Ning Li" <> wrote:
> On 3/31/07, Michael McCandless (JIRA) <> wrote:
> > Create merge policy that doesn't periodically inadvertently optimize
> > --------------------------------------------------------------------
> > So we could make a small change to the policy by only merging the
> > first mergeFactor segments once we hit 2X the merge factor.  With
> > mergeFactor=10, when we have created the 20th level 0 (just flushed)
> > segment, we merge the first 10 into a level 1 segment.  Then on
> > creating another 10 level 0 segments, we merge the second set of 10
> > level 0 segments into a level 1 segment, etc.
> Hi Mike,
> When a 20th level 0 segment triggers a 20th level 1 segment which
> triggers a 20th level 2 segment... we are still optimizing, aren't we?
> Am I missing something here?

The merge would "cascade" in this case, but it would not optimize (you
will have more than one segment in the end).

Each time you cascade you only merge the first 10 at each level, so
after cascading you would have 1 level 3 segment, 10 level 2 segments,
10 level 1 segments and 10 level 0 segments.
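To make the cascade concrete, here is a small Java simulation of the
"merge only the first mergeFactor segments once a level reaches 2X
mergeFactor" policy. It is a sketch under simplifying assumptions: it
only counts segments per level (no real index, no sizes), and the
class and method names are hypothetical, not Lucene APIs. With
mergeFactor=10, the first level 3 segment appears on the 2110th flush,
and the state at that point matches the counts above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical simulation of the cascading policy; not a Lucene class.
public class CascadingMergeSim {
    // levels.get(i) = number of segments currently at level i
    private final List<Integer> levels = new ArrayList<>();
    private final int mergeFactor;

    public CascadingMergeSim(int mergeFactor) {
        this.mergeFactor = mergeFactor;
    }

    /** Simulates flushing one new level 0 segment, then any cascading merges. */
    public void flush() {
        addSegment(0);
    }

    private void addSegment(int level) {
        while (levels.size() <= level) {
            levels.add(0);
        }
        levels.set(level, levels.get(level) + 1);
        // Only when a level reaches 2 * mergeFactor segments do we merge,
        // and then only the first mergeFactor of them, so mergeFactor
        // segments always remain at this level (no inadvertent optimize).
        if (levels.get(level) == 2 * mergeFactor) {
            levels.set(level, levels.get(level) - mergeFactor);
            addSegment(level + 1); // the new segment may trigger the next level
        }
    }

    /** Segment counts per level, level 0 first. */
    public List<Integer> counts() {
        return new ArrayList<>(levels);
    }

    public static void main(String[] args) {
        CascadingMergeSim sim = new CascadingMergeSim(10);
        for (int i = 0; i < 2110; i++) {
            sim.flush();
        }
        System.out.println(sim.counts()); // prints [10, 10, 10, 1]
    }
}
```

Because each merge leaves mergeFactor segments behind at every level it
touches, even a full cascade ends with 31 segments rather than one.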

I'm actually using this merge policy in my patch for LUCENE-843 when
merging the flushed "partial" segments.  This is only used when
IndexWriter is opened with autoCommit=false and you add lots and lots
of documents (so RAM flushes many times).

But, I like the proposed merge policy at the end of LUCENE-845 even
better for Lucene's normal merges.

It would merge based on size (not # docs), would be free to merge
adjacent segments (not just rightmost segments), and would merge N
(configurable) at a time.  The part that's still unclear is how it
chooses when to "trigger" a merge and how specifically it picks which
N segments to merge (maybe: the series of N adjacent segments that are
"most similar" in size, but favoring smaller segments over larger ones).
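One possible reading of that "maybe" selection rule, sketched in Java
under loud assumptions: segment sizes are given in bytes, oldest
first; every window of N adjacent segments is a candidate (not just
the rightmost ones); and the score (largest/smallest size in the
window, times the log of the window's total size) is a hypothetical
heuristic, not anything decided in LUCENE-845. When to trigger a merge
is left out, since that part is still open.

```java
import java.util.List;

// Hypothetical picker for the size-based policy; not a Lucene class.
public class AdjacentWindowPicker {
    /**
     * Returns the start index of the window of n adjacent segments with the
     * lowest score, or -1 if there are fewer than n segments.
     * Hypothetical score: (largest / smallest size in the window), which is
     * 1.0 when all sizes are equal, multiplied by log(total size) so that,
     * among equally balanced windows, the one with smaller segments wins.
     */
    public static int pickWindow(List<Long> sizes, int n) {
        int best = -1;
        double bestScore = Double.MAX_VALUE;
        for (int start = 0; start + n <= sizes.size(); start++) {
            long min = Long.MAX_VALUE, max = 0, total = 0;
            for (int i = start; i < start + n; i++) {
                long s = sizes.get(i);
                min = Math.min(min, s);
                max = Math.max(max, s);
                total += s;
            }
            // Guard against zero-byte segments when computing the size skew.
            double skew = (double) max / Math.max(min, 1L);
            double score = skew * Math.log(total + 1.0);
            if (score < bestScore) {
                bestScore = score;
                best = start;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<Long> sizes = List.of(1000L, 1000L, 1000L, 100L, 100L, 100L, 5L, 400L);
        System.out.println(pickWindow(sizes, 3)); // prints 3
    }
}
```

In the usage example both [1000, 1000, 1000] and [100, 100, 100] are
perfectly balanced, and the smaller window (starting at index 3) wins.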

  Michael McCandless
