lucene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Michael McCandless (JIRA)" <>
Subject [jira] Updated: (LUCENE-854) Create merge policy that doesn't periodically inadvertently optimize
Date Fri, 01 Jun 2007 16:37:16 GMT


Michael McCandless updated LUCENE-854:

    Fix Version/s:     (was: 2.2)

> Create merge policy that doesn't periodically inadvertently optimize
> --------------------------------------------------------------------
>                 Key: LUCENE-854
>                 URL:
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Index
>    Affects Versions: 2.2
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>            Priority: Minor
> The current merge policy, at every maxBufferedDocs *
> power-of-mergeFactor docs added, will do a fully cascaded merge, which
> is the same as an optimize.
> I think this is not good because at that "optimization poin", the
> particular addDocument call is [surprisingly] very expensive.  While,
> amortized over all addDocument calls, the cost is low, the cost is
> paid "up front" and in a very "bunched up" manner.
> I think of this as "pay it forward": you are paying the full cost of
> an optimize right now on the expectation / hope that you will be
> adding a great many more docs.  But, if you don't add that many more
> docs, then, the amortized cost for your index is in fact far higher
> than it should have been.  Better to "pay as you go" instead.
> So we could make a small change to the policy by only merging the
> first mergeFactor segments once we hit 2X the merge factor.  With
> mergeFactor=10, when we have created the 20th level 0 (just flushed)
> segment, we merge the first 10 into a level 1 segment.  Then on
> creating another 10 level 0 segments, we merge the second set of 10
> level 0 segments into a level 1 segment, etc.
> With this new merge policy, an index that's a bit bigger than a
> current "optimization point" would then have a lower amortized cost
> per document.  Plus the merge cost is less "bunched up" and less "pay
> it forward": instead you pay for what you are actually using.
> We can start by creating this merge policy (probably, combined with
> with the "by size not by doc count" segment level computation from
> LUCENE-845) and then later decide whether we should make it the
> default merge policy.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.

To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message