lucene-dev mailing list archives

From "Michael McCandless (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-847) Factor merge policy out of IndexWriter
Date Wed, 08 Aug 2007 17:11:00 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12518508 ]

Michael McCandless commented on LUCENE-847:
-------------------------------------------

> It just occurred to me that there is a neat way to handle deletes
> that are flushed during a concurrent merge. For example, MergePolicy
> decides to merge segments B and C, with B's delete file 0001 and C's
> 100. When the concurrent merge finishes, B's delete file becomes
> 0011 and C's 110. We do a simple computation on the delete bit
> vectors and check in the merged segment with delete file 00110

Excellent!  This lets you efficiently merge in the additional deletes
(if any) that were flushed against each of the merged segments after
the merge had begun.  Furthermore, I think this is all contained
within IndexWriter, right?

I.e., when we go to "replace/check in" the newly merged segment, this
"merge newly flushed deletes" step would execute at that time.  And, I
think, we would block flushes while this is happening, but
addDocument/deleteDocument/updateDocument would still be allowed?

It should in fact be quite fast to run, since the delete BitVectors
are all in RAM.
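
To make the quoted example concrete, here is a minimal sketch of that
computation.  It is not code from any patch on this issue: the class
and method names are hypothetical, java.util.BitSet stands in for
Lucene's BitVector, and it assumes the merge compacts away exactly the
docs that were deleted when the merge began.

```java
import java.util.BitSet;

// Hypothetical helper, not part of Lucene: carries deletes flushed during a
// merge over to the merged segment. java.util.BitSet stands in for BitVector.
class MergedDeletes {

    /** Parses "0011" so that bit i is set when character i is '1' (doc i deleted). */
    static BitSet parse(String bits) {
        BitSet set = new BitSet(bits.length());
        for (int i = 0; i < bits.length(); i++) {
            if (bits.charAt(i) == '1') {
                set.set(i);
            }
        }
        return set;
    }

    /** Formats the first `length` bits in the same left-to-right notation. */
    static String format(BitSet set, int length) {
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            sb.append(set.get(i) ? '1' : '0');
        }
        return sb.toString();
    }

    /**
     * docCounts[s]    : number of docs in source segment s
     * atMergeStart[s] : deletes the merge saw (those docs were compacted away)
     * atCheckin[s]    : deletes at check-in time (assumed a superset of atMergeStart[s])
     * Returns the delete bits for the merged segment.
     */
    static BitSet carryOver(int[] docCounts, BitSet[] atMergeStart, BitSet[] atCheckin) {
        BitSet merged = new BitSet();
        int newDocId = 0;
        for (int s = 0; s < docCounts.length; s++) {
            for (int doc = 0; doc < docCounts[s]; doc++) {
                if (atMergeStart[s].get(doc)) {
                    continue; // dropped by the merge, so it has no slot in the merged segment
                }
                if (atCheckin[s].get(doc)) {
                    merged.set(newDocId); // deleted while the merge was running
                }
                newDocId++;
            }
        }
        return merged;
    }
}
```

With B going from 0001 to 0011 and C going from 100 to 110, carryOver
yields 00110 for the five-doc merged segment, matching the quoted
example.  The work is a single pass over in-memory bits.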

> I'm thinking about the impact of adding "deleteDocument(int doc)" on
> LUCENE-847, especially on concurrent merge. The semantics of
> "deleteDocument(int doc)" is that the document to delete is
> specified by the document id on the index at the time of the
> call. When a merge is finished and the result is being checked into
> IndexWriter's SegmentInfos, document ids may change. Therefore, it
> may be necessary to flush buffered delete doc ids (thus buffered
> docs and delete terms as well) before a merge result is checked in.
>
> The flush is not necessary if there is no buffered delete doc ids. I
> don't think it should be the reason not to support
> "deleteDocument(int doc)" in IndexWriter. But its impact on
> concurrent merge is a concern.

Couldn't we also just update the docIDs of pending deletes, and not
flush?  I.e., we know the mapping of old -> new docIDs caused by the
merge, so we can run through all pending deleted docIDs and remap them?
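
A rough sketch of such a remap follows, under assumptions that are
mine rather than anything settled on this issue: the merged segments
cover one contiguous docID range, the merged segment replaces them in
place, and the merge drops exactly the docs deleted when it began.
DocIdRemapper and its methods are hypothetical names, not IndexWriter
API.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Hypothetical helper, not part of Lucene: remaps buffered delete docIDs
// across a merge of one contiguous range of segments.
class DocIdRemapper {
    private final int rangeStart; // first index-wide docID covered by the merged segments
    private final int[] newIds;   // new docID per doc in the range, or -1 if compacted away
    private final int removed;    // number of docs the merge dropped from the range

    /** deletedInRange bit i is set when doc (rangeStart + i) was deleted at merge start. */
    DocIdRemapper(int rangeStart, int rangeDocCount, BitSet deletedInRange) {
        this.rangeStart = rangeStart;
        this.newIds = new int[rangeDocCount];
        int next = rangeStart;
        for (int i = 0; i < rangeDocCount; i++) {
            newIds[i] = deletedInRange.get(i) ? -1 : next++;
        }
        this.removed = rangeDocCount - (next - rangeStart);
    }

    /** Returns the post-merge docID, or -1 if the doc no longer exists. */
    int remap(int oldId) {
        if (oldId < rangeStart) {
            return oldId; // before the merged range: unchanged
        }
        int offset = oldId - rangeStart;
        if (offset < newIds.length) {
            return newIds[offset]; // inside the merged range
        }
        return oldId - removed; // after the merged range: shifted down
    }

    /** Remaps every pending delete, dropping ids whose docs were already compacted away. */
    List<Integer> remapAll(List<Integer> pendingDeletes) {
        List<Integer> result = new ArrayList<>();
        for (int oldId : pendingDeletes) {
            int newId = remap(oldId);
            if (newId >= 0) {
                result.add(newId);
            }
        }
        return result;
    }
}
```

For instance, if a two-doc segment precedes B and C from the earlier
example, the merged range starts at docID 2 and spans 7 docs with
range-relative bits 0001100 deleted.  Old docIDs 5 and 6 then vanish,
7 and 8 become 5 and 6, and everything after the range shifts down
by 2.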


> Factor merge policy out of IndexWriter
> --------------------------------------
>
>                 Key: LUCENE-847
>                 URL: https://issues.apache.org/jira/browse/LUCENE-847
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>            Reporter: Steven Parkes
>            Assignee: Steven Parkes
>         Attachments: concurrentMerge.patch, LUCENE-847.patch.txt, LUCENE-847.txt
>
>
> If we factor the merge policy out of IndexWriter, we can make it pluggable, making it
> possible for apps to choose a custom merge policy and for easier experimenting with merge
> policy variants.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

