lucene-dev mailing list archives

From "Michael McCandless (JIRA)" <>
Subject [jira] Commented: (LUCENE-2701) Factor maxMergeSize into findMergesForOptimize in LogMergePolicy
Date Fri, 15 Oct 2010 12:46:32 GMT


Michael McCandless commented on LUCENE-2701:

Patch looks good!

Maybe rename OneMerge.totalSize -> totalSizeInBytes?  Hmm, does anyone
actually call this new method?

Maybe note somewhere that optimize (when there's a maxMergeDocs/MB
constraint) is now able to merge fewer than mergeFactor segments at a
time.

This code is a bit confusing:

       if (last - start - 1 > 1) {
         // there is more than 1 segment to the right of this one.
         spec.add(new OneMerge(infos.range(start + 1, last), useCompoundFile));
       } else if (start != last - 1 && !isOptimized( + 1))) {
         spec.add(new OneMerge(infos.range(start + 1, last), useCompoundFile));
       }

Both if clauses are doing the same thing, right?  (I.e. merging the chunk
of segs to the right.)  Maybe put a comment explaining the 2nd one?  (I
think it's for the case where there's 1 segment to our right but it's
not optimized, e.g. its CFS setting differs?)  Or maybe consolidate them
into a single if.
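If the two branches were consolidated, the result might look like the sketch below. This is a hypothetical, self-contained model, not the patch's code. Segments are plain array indexes, `shouldMergeRight` is an invented name, and a `boolean[] optimized` array stands in for the patch's per-segment isOptimized check.

```java
public class ConsolidatedCondition {

    // Hypothetical stand-in for the two branches quoted above.
    // `start` is the index of a segment that is too large to merge, and
    // [start + 1, last) is the run of segments to its right.
    // optimized[i] is an assumed flag: true if segment i needs no rewrite.
    static boolean shouldMergeRight(int start, int last, boolean[] optimized) {
        // First branch: more than one segment to the right.
        // Second branch: exactly one segment to the right, not yet optimized.
        return last - start - 1 > 1
            || (start != last - 1 && !optimized[start + 1]);
    }

    public static void main(String[] args) {
        boolean[] optimized = {true, true, false, true};
        System.out.println(shouldMergeRight(0, 4, optimized)); // true: three segments to the right
        System.out.println(shouldMergeRight(1, 3, optimized)); // true: one segment, not optimized
        System.out.println(shouldMergeRight(2, 4, optimized)); // false: one segment, already optimized
        System.out.println(shouldMergeRight(3, 4, optimized)); // false: nothing to the right
    }
}
```

Combining the branches with `||` keeps the short-circuit. When there is nothing to the right, `optimized[start + 1]` is never read.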

> Factor maxMergeSize into findMergesForOptimize in LogMergePolicy
> ----------------------------------------------------------------
>                 Key: LUCENE-2701
>                 URL:
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>            Reporter: Shai Erera
>            Assignee: Shai Erera
>             Fix For: 3.1, 4.0
>         Attachments: LUCENE-2701.patch, LUCENE-2701.patch
> LogMergePolicy allows you to specify a maxMergeSize in MB, which is taken into consideration
> in regular merges, yet ignored by findMergesForOptimize. I think it'd be good if we took that
> into consideration even when optimizing. This will allow the caller to specify two constraints:
> maxNumSegments and maxMergeMB. Obviously both cannot always be satisfied, so we will
> guarantee that if any segment is above the threshold, the threshold constraint takes
> precedence, and you may end up with fewer than maxNumSegments segments after optimize
> (if maxNumSegments is not 1). Otherwise, maxNumSegments is taken into consideration.
> As part of this change, I plan to change some methods and members from private to protected.
> I realized that anyone who wishes to implement their own LMP extension currently needs
> to either put it under o.a.l.index or copy some code over to their impl.
> I'll attach a patch shortly.
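The sketch below shows how the two constraints could interact in a size-limited optimize plan. It is an illustration under stated assumptions, not the patch. Segment sizes (in MB) and per-segment "optimized" flags are passed as plain arrays, and the class and method names are hypothetical. It models only the case where the size threshold takes precedence, so maxNumSegments is not a parameter. The planner walks segments from right to left and never merges a segment above maxMergeMB. It merges the eligible segments between oversized ones in chunks of at most mergeFactor.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SizeLimitedOptimizeSketch {

    // Hypothetical planner for an optimize with a size limit. Segments larger
    // than maxMergeMB are never merged; the eligible segments between them are
    // merged in chunks of at most mergeFactor segments. Each returned int[] is
    // a half-open range {start, end} of segment indexes.
    static List<int[]> plan(double[] sizesMB, boolean[] optimized,
                            double maxMergeMB, int mergeFactor) {
        List<int[]> merges = new ArrayList<>();
        int last = sizesMB.length; // exclusive end of the current run
        int start = last - 1;
        while (start >= 0) {
            if (sizesMB[start] > maxMergeMB) {
                // Oversized segment: leave it alone, but merge the run to its
                // right if it has 2+ segments or one non-optimized segment.
                if (last - start - 1 > 1 || (start != last - 1 && !optimized[start + 1])) {
                    merges.add(new int[] {start + 1, last});
                }
                last = start;
            } else if (last - start == mergeFactor) {
                // Collected a full chunk of mergeFactor eligible segments.
                merges.add(new int[] {start, last});
                last = start;
            }
            start--;
        }
        // Leftover run [0, last) at the head of the index, same rule as above.
        if (last > 1 || (last == 1 && !optimized[0])) {
            merges.add(new int[] {0, last});
        }
        return merges;
    }

    public static void main(String[] args) {
        double[] sizesMB = {500, 10, 20, 800, 5, 5, 5, 5};
        boolean[] optimized = {true, true, true, true, false, true, true, true};
        for (int[] merge : plan(sizesMB, optimized, 100, 3)) {
            System.out.println(Arrays.toString(merge));
        }
        // Prints [5, 8], then [4, 5], then [1, 3]
    }
}
```

With maxMergeMB = 100 and mergeFactor = 3, the 500 MB and 800 MB segments are never touched. Segments 5 to 7 form one full merge, and the lone non-optimized segment 4 is rewritten on its own. Segments 1 and 2 form a two-segment merge, i.e. fewer than mergeFactor segments. In this sketch the resulting index has five segments, regardless of maxNumSegments.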

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.

