lucene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Ning Li (JIRA)" <>
Subject [jira] Commented: (LUCENE-847) Factor merge policy out of IndexWriter
Date Wed, 29 Aug 2007 16:22:32 GMT


Ning Li commented on LUCENE-847:

I include comments for both LUCENE-847 and LUCENE-870 here since they are closely related.

I like the stateless approach used for refactoring merge policy. But modeling concurrent merge
(ConcurrentMergePolicyWrapper) as a MergePolicy seems to be inconsistent with the MergePolicy
  1 As you pointed out, "the merge policy is no longer responsible for running the merges
itself". MergePolicy.maybeMerge simply returns a merge specification. But ConcurrentMergePolicyWrapper.maybeMerge
actually starts concurrent merge threads thus doing the merges.
  2 Related to 1, cascading is done in IndexWriter in non-concurrent case. But in concurrent
case, cascading is also done in merge threads which are started by ConcurrentMergePolicyWrapper.maybeMerge.

MergePolicy.maybeMerge should continue to simply return a merge specification. (BTW, should
we rename this maybeMerge to, say, findCandidateMerges?) Can we carve the merge process out
of IndexWriter into a Merger? IndexWriter still provides the building blocks - merge(OneMerge),
mergeInit(OneMerge), etc. Merger uses these building blocks. A ConcurrentMerger extends Merger
but starts concurrent merge threads as ConcurrentMergePolicyWrapper does.

Other comments:
1 UpdateDocument's and deleteDocument's bufferDeleteTerm are synchronized on different variables
in this patch. However, the semantics of updateDocument changed since LUCENE-843. Before LUCENE-843,
updateDocument, which is a delete and an insert, guaranteed the delete and the insert are
committed together (thus an update). Now it's possible that they are committed in different
transactions. If we consider DocumentsWriter as the RAM staging area for IndexWriter, then
deletes are also buffered in RAM staging area and we can restore our previous semantics, right?

2 OneMerge.segments seems to rely on its segment infos' reference to segment infos of IndexWriter.segmentInfos.
The use in commitMerge, which calls ensureContiguousMerge, is an example. However, segmentInfos
can be a cloned copy because of exceptions, thus the reference broken.

3 Calling optimize of an IndexWriter with the current ConcurrentMergePolicyWrapper may cause
deadlock: the one merge spec returned by MergePolicy.optimize may be in conflict with a concurrent
merge (the same merge spec will be returned without changes to segmentInfos), but a concurrent
merge cannot finish because optimize is holding the lock.

4 Finally, a couple of minor things:
  1 LogMergePolicy.useCompoundFile(SegmentInfos infos, SegmentInfo info) and useCompoundDocStore(SegmentInfos
infos): why the parameters?
  2 Do we need doMergeClose in IndexWriter? Can we simply close a MergePolicy if not null?

> Factor merge policy out of IndexWriter
> --------------------------------------
>                 Key: LUCENE-847
>                 URL:
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>            Reporter: Steven Parkes
>            Assignee: Steven Parkes
>             Fix For: 2.3
>         Attachments: concurrentMerge.patch, LUCENE-847.patch.txt, LUCENE-847.patch.txt,
LUCENE-847.take3.patch, LUCENE-847.take4.patch, LUCENE-847.txt
> If we factor the merge policy out of IndexWriter, we can make it pluggable, making it
possible for apps to choose a custom merge policy and for easier experimenting with merge
policy variants.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.

To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message