lucene-dev mailing list archives

From "Michael McCandless" <>
Subject RE: [jira] Commented: (LUCENE-847) Factor merge policy out of IndexWriter
Date Sun, 25 Mar 2007 18:52:06 GMT

"Steven Parkes" <> wrote:
> I've been wondering about taking minMergeDocs out of LMP
> (LogarithmicMergePolicy): if IW is doing maxBufferedDocs, can we get by
> with
> 	ceil(log(docs))
> rather than
> 	ceil(log(ceil(docs/minMergeDocs)))
> (That's not exactly right, but it's close). The simplicity appeals to
> me, but ...

I think we could do that?
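
To make the comparison concrete, here is a rough sketch of the two level formulas quoted above. The class and method names are hypothetical, and taking the log base to be the merge factor is an assumption (the formulas above leave the base unstated). An integer loop stands in for ceil(log(...)) to avoid floating-point rounding.

```java
// Hypothetical helper for illustration only; not part of Lucene.
final class LevelSketch {

    // ceil(log_base(docs)): number of times we must multiply by base
    // before the capacity reaches docs. The base is assumed to be the
    // merge factor.
    static int levelByDocs(long docs, int base) {
        if (base < 2) {
            throw new IllegalArgumentException("base must be >= 2");
        }
        int level = 0;
        for (long capacity = 1; capacity < docs; capacity *= base) {
            level++;
        }
        return level;
    }

    // ceil(log_base(ceil(docs / minMergeDocs))): the same computation,
    // but counted in units of minMergeDocs rather than single documents.
    static int levelByFlushUnits(long docs, int minMergeDocs, int base) {
        if (minMergeDocs < 1) {
            throw new IllegalArgumentException("minMergeDocs must be >= 1");
        }
        long units = (docs + minMergeDocs - 1) / minMergeDocs; // integer ceil
        return levelByDocs(units, base);
    }
}
```

With base 10 and minMergeDocs 10, a 1000-doc segment is level 3 under the first formula and level 2 under the second, so in that case the two differ by a fixed offset.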

Though if we change the default to be "by #bytes used by each segment"
(for the new default "by size" merge policy) then we can disregard
#docs in a segment during merging entirely?  (And then, leave the "by
#docs" legacy merge policy as is?).

>     If we remove these from the MergePolicy interface then maybe we
>     don't need MergePolicyBase?  (Just to make things simpler).
> Just a DRY class. I have no strong feeling about this. In fact, I went
> back and forth on it. It's served as a placeholder while I experimented.

Got it.  I was thinking once we removed these params from the base
then there was even less "repeating" to worry about.

>   * I was a little spooked by this change to TestAddIndexesNoOptimize:
>       -    assertEquals(2, writer.getSegmentCount());
>       +    assertEquals(3, writer.getSegmentCount());
>     I think with just the refactoring, there should not need to be any
>     changes to unit tests right?
> I don't know if this got into what I wrote either in e-mail or in the
> start of the comments. I guess I've done two steps in one here: the
> factoring isn't just renaming methods and classes. I did create a
> MergePolicy interface that is a slight simplification of how the
> merge policy is currently implemented.

Ahhh, sorry, I missed that this was not a pure refactoring.  I think
you did mention this.  OK now that I understand the issue better, I
agree, let's keep the merge policy interface simple.  I think the
merge policy should not need to know the "history" of how the segments
came to be in this index (addIndexes, flush, etc); instead, it should
look at them now and decide 1) whether to merge, and 2) which specific
segments to merge.
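
As a rough sketch of what such a "stateless" interface could look like: the policy receives only the current segment list and returns the segments to merge, with an empty list meaning "no merge". All names here (SegmentSketch, MergePolicySketch, SameLevelMergePolicy) are hypothetical and are not the interface from the LUCENE-847 patch. The example policy, which merges the first run of mergeFactor adjacent segments at the same level, is likewise only an assumption chosen to keep the sketch short.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for per-segment metadata; not a Lucene class.
final class SegmentSketch {
    final String name;
    final int docCount;

    SegmentSketch(String name, int docCount) {
        this.name = name;
        this.docCount = docCount;
    }
}

// Hypothetical interface: the policy sees only the segments as they are
// now, with no record of how they were created (flush, addIndexes, ...).
interface MergePolicySketch {
    // Returns the segments to merge, or an empty list if no merge is needed.
    List<SegmentSketch> findMerge(List<SegmentSketch> segments);
}

// Example policy (an assumption for illustration): merge the first run of
// mergeFactor adjacent segments that all fall in the same level.
final class SameLevelMergePolicy implements MergePolicySketch {
    private final int mergeFactor;

    SameLevelMergePolicy(int mergeFactor) {
        if (mergeFactor < 2) {
            throw new IllegalArgumentException("mergeFactor must be >= 2");
        }
        this.mergeFactor = mergeFactor;
    }

    // ceil(log_mergeFactor(docCount)), computed with integers.
    private int level(int docCount) {
        int level = 0;
        for (long capacity = 1; capacity < docCount; capacity *= mergeFactor) {
            level++;
        }
        return level;
    }

    @Override
    public List<SegmentSketch> findMerge(List<SegmentSketch> segments) {
        for (int start = 0; start + mergeFactor <= segments.size(); start++) {
            int first = level(segments.get(start).docCount);
            boolean sameLevel = true;
            for (int i = start + 1; i < start + mergeFactor; i++) {
                if (level(segments.get(i).docCount) != first) {
                    sameLevel = false;
                    break;
                }
            }
            if (sameLevel) {
                return new ArrayList<>(segments.subList(start, start + mergeFactor));
            }
        }
        return Collections.emptyList();
    }
}
```

The single return value answers both questions at once: whether to merge (non-empty or not) and which specific segments to merge.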

>   * It's interesting that you've pulled "useCompoundFile" into the
>     LegacyMergePolicy.  I'm torn on whether it belongs in MergePolicy
>     at all, since this is really a file format issue?
> Well, the idea was here that you might want to use non-compound files
> for big segments (since you have few of them) and compound for smaller
> segments. It basically reflects the idea that to some extent, the merge
> policy is factoring the number of file descriptors required into its
> decision.

Ahh that's a good idea!  I guess we could look at compound file as a
form of merging: you've merged many files into a single file in order
to save on file-descriptors.  OK I think that (moving the decision of
whether to use CFS for a given segment, including a newly flushed
segment, into the merge policy) makes sense.
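
A minimal sketch of that idea, assuming the policy decides on segment size alone: small segments get the compound format (one file each), large ones stay non-compound. The names, the byte threshold and the filesPerPlainSegment parameter are all hypothetical and not Lucene API; the real number of files per non-compound segment depends on the index format and fields.

```java
// Hypothetical helper for illustration only; not part of Lucene.
final class CompoundFileSketch {

    // Use the compound format for segments at or below the size threshold.
    static boolean useCompoundFile(long segmentBytes, long maxCompoundBytes) {
        return segmentBytes <= maxCompoundBytes;
    }

    // Rough count of files needed to hold all segments: one file per
    // compound segment, filesPerPlainSegment (an assumed constant) per
    // non-compound segment.
    static int estimateOpenFiles(long[] segmentSizes, long maxCompoundBytes,
                                 int filesPerPlainSegment) {
        int files = 0;
        for (long size : segmentSizes) {
            files += useCompoundFile(size, maxCompoundBytes) ? 1 : filesPerPlainSegment;
        }
        return files;
    }
}
```

For example, with segments of 10, 20 and 5000 bytes, a threshold of 100 bytes and 8 files per non-compound segment, the estimate is 1 + 1 + 8 = 10 files.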

