lucene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Michael McCandless (JIRA)" <j...@apache.org>
Subject [jira] Updated: (LUCENE-854) Create merge policy that doesn't periodically inadvertently optimize
Date Fri, 11 Feb 2011 15:39:57 GMT

     [ https://issues.apache.org/jira/browse/LUCENE-854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Michael McCandless updated LUCENE-854:
--------------------------------------

    Fix Version/s: 4.0
                   3.2

> Create merge policy that doesn't periodically inadvertently optimize
> --------------------------------------------------------------------
>
>                 Key: LUCENE-854
>                 URL: https://issues.apache.org/jira/browse/LUCENE-854
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Index
>    Affects Versions: 2.2
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>            Priority: Minor
>             Fix For: 3.2, 4.0
>
>         Attachments: LUCENE-854.patch
>
>
> The current merge policy, at every maxBufferedDocs *
> power-of-mergeFactor docs added, will do a fully cascaded merge, which
> is the same as an optimize.
> I think this is not good because at that "optimization poin", the
> particular addDocument call is [surprisingly] very expensive.  While,
> amortized over all addDocument calls, the cost is low, the cost is
> paid "up front" and in a very "bunched up" manner.
> I think of this as "pay it forward": you are paying the full cost of
> an optimize right now on the expectation / hope that you will be
> adding a great many more docs.  But, if you don't add that many more
> docs, then, the amortized cost for your index is in fact far higher
> than it should have been.  Better to "pay as you go" instead.
> So we could make a small change to the policy by only merging the
> first mergeFactor segments once we hit 2X the merge factor.  With
> mergeFactor=10, when we have created the 20th level 0 (just flushed)
> segment, we merge the first 10 into a level 1 segment.  Then on
> creating another 10 level 0 segments, we merge the second set of 10
> level 0 segments into a level 1 segment, etc.
> With this new merge policy, an index that's a bit bigger than a
> current "optimization point" would then have a lower amortized cost
> per document.  Plus the merge cost is less "bunched up" and less "pay
> it forward": instead you pay for what you are actually using.
> We can start by creating this merge policy (probably, combined with
> with the "by size not by doc count" segment level computation from
> LUCENE-845) and then later decide whether we should make it the
> default merge policy.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


Mime
View raw message