lucene-dev mailing list archives

From "Michael McCandless (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENE-3569) Consolidate IndexWriter's optimize, maybeMerge and expungeDeletes under one merge(MP) method
Date Tue, 15 Nov 2011 11:54:51 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-3569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13150405#comment-13150405 ]

Michael McCandless commented on LUCENE-3569:
--------------------------------------------

For natural merges I think the existing MergePolicy makes sense: it's
embedded into IW (IWC) and is invoked whenever there is a change to
the segments (eg, new segment flushed).

But for forced merges (either forceMerge or expungeDeletes)... I don't
think we need a new MergePolicy-like class?  Can't this "outside
logic" simply invoke registerMerge() directly on the incoming IW?

So e.g. in contrib/misc (say), we'd add a new IndexUtils class (or
something); it has a static method "expungeDeletes" that takes an IW
instance.  When the app calls that method, it inspects the IW's
segments, chooses its merges, and registers them.

Just like a MergePolicy, the method would have to check which merges
are already running/registered (IW.getMergingSegments) and "work
around" them.  E.g., if there are 7 segments with deletions, you check
and see that 4 of them are already merging / scheduled for merge, so
you know you only have to merge the other 3.
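
As a rough illustration of that "work around" step, here is a minimal
sketch. It is not Lucene code: the class and method names are
hypothetical, segments are modeled as plain strings, and it does not
call IW.getMergingSegments or registerMerge. It only shows the
selection logic for the 7-segments / 4-already-merging example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for the utility floated above. Segments are plain
// names here; a real version would work on the IndexWriter's segment infos.
final class ExpungeDeletesSketch {

    // Returns the segments that still need a forced merge: those that carry
    // deletions and are not already merging or scheduled for merge.
    static List<String> pickSegmentsToMerge(List<String> segmentsWithDeletions,
                                            Set<String> alreadyMerging) {
        List<String> toMerge = new ArrayList<>();
        for (String segment : segmentsWithDeletions) {
            if (!alreadyMerging.contains(segment)) {
                toMerge.add(segment);
            }
        }
        return toMerge;
    }

    public static void main(String[] args) {
        // 7 segments with deletions, 4 of them already merging (assumed data).
        List<String> withDeletions =
            Arrays.asList("_0", "_1", "_2", "_3", "_4", "_5", "_6");
        Set<String> merging =
            new HashSet<>(Arrays.asList("_0", "_2", "_3", "_5"));

        // Prints [_1, _4, _6]: the 3 segments left for the caller to merge.
        System.out.println(pickSegmentsToMerge(withDeletions, merging));
    }
}
```

In a real utility, the remaining segments would then be grouped into
merges and registered with the writer; how that registration is exposed
to outside code is exactly the open question in this issue.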

                
> Consolidate IndexWriter's optimize, maybeMerge and expungeDeletes under one merge(MP) method
> --------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-3569
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3569
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: core/index
>            Reporter: Shai Erera
>
> Today, IndexWriter exposes 3 methods for 'cleaning up' / 'compacting' / 'optimizing' your index:
> * optimize() -- merges as many segments as possible (down to 1 segment), and is discouraged in many cases because of its performance implications.
> * maybeMerge() -- runs 'subtle' merges. Attempts to balance the index by not leaving too many segments, yet not merging large segments if unneeded.
> * expungeDeletes() -- cleans up deleted documents from segments and on the go merges them.
> * a default MP that can be set on IndexWriterConfig, for ongoing merges IW performs (i.e. as a result of flushing a new segment).
> These methods are confusing on several levels:
> * Their names are misleading, see LUCENE-3454.
> * Why does expungeDeletes need to merge segments?
> * Ultimately, they do whatever the MergePolicy decides should be done. I.e., one could write an MP that always merges all segments, and therefore calling maybeMerge would not be so subtle anymore. On the other hand, one could write an MP that never merges large segments (we in fact have several of those), and therefore calling optimize(1) would not end up with one segment.
> So the proposal is to replace all these methods with a single one, merge(MergePolicy) (more on the names later). MergePolicy will have only one method, findSegmentsForMerge, and the caller will be responsible for configuring it in order to perform the needed merges. We will provide ready-to-use MPs:
> * LightMergePolicy -- for setting on IWC and doing the ongoing merges IW executes. This one will pick segments respecting various parameters such as mergeFactor, segmentSizes etc.
> * HeavyMergePolicy -- for doing the optimize()-style merges.
> * ExpungeDeletesMergePolicy -- for expunging deletes (my proposal is to drop segment merging from it, by default).
> Now about the names:
> * I think that it will be good, API-backcompat wise and in general, if we name that method doMaintenance (as expungeDeletes does not have to merge anything).
> * Instead of MergePolicy we call it MaintenancePolicy, and similarly its single method findSegmentsForMaintenance, or getMaintenanceSpecification.
> * I called the MPs Light and Heavy just for the text; I think better names should be found, but nothing comes to mind right now.
> It will allow us to use this on 3.x, by deprecating MP and all related methods.
