lucene-dev mailing list archives

From "Michael McCandless (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-2655) Get deletes working in the realtime branch
Date Wed, 20 Oct 2010 09:04:23 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-2655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12922894#action_12922894
] 

Michael McCandless commented on LUCENE-2655:
--------------------------------------------

bq. You're saying record a list of segments that existed at the time of flushing a DWPT's
deletes?

Actually, I think it's simpler!  I think the DWPT just records the index of the last segment
in the index, as of when it is created (or re-inited after it's been flushed).

On flush of a given DWPT, its buffered deletes are recorded against the newly flushed segment,
and still carry over the lastSegmentIndex.  This way, when we finally do resolve these deletes
to docIDs, we apply a delete if either 1) the doc's segment is <= lastSegmentIndex, or 2) the
doc is in the flushed segment and its docID is <= the docid-upto.  I think this'd mean we can
keep the docid-upto as local docIDs, which is nice (no globalizing/shifting-on-merge/flush needed).
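
As a rough sketch of the two-case rule above (Java; the class, method, and parameter names are
hypothetical and not taken from Lucene's code, and the inclusive <= comparisons simply follow
the wording of this comment):

```java
/** Hypothetical illustration of the resolution rule; not Lucene's actual API. */
final class DeleteRule {
    /**
     * @param segmentIndex     index of the segment holding the candidate doc
     * @param localDocID       docID local to that segment
     * @param lastSegmentIndex last segment in the index when the DWPT was created or re-inited
     * @param poolSegmentIndex index of the segment the DWPT flushed to (it owns the delete pool)
     * @param docIDUpto        local docid-upto recorded when the delete was buffered
     */
    static boolean applies(int segmentIndex, int localDocID,
                           int lastSegmentIndex, int poolSegmentIndex, int docIDUpto) {
        // Case 1: the segment already existed when the DWPT started, so the delete
        // applies to every doc in it.
        if (segmentIndex <= lastSegmentIndex) {
            return true;
        }
        // Case 2: the doc lives in the DWPT's own flushed segment and was indexed
        // no later than the delete (compared on local docIDs).
        return segmentIndex == poolSegmentIndex && localDocID <= docIDUpto;
    }
}
```

Because case 2 compares only local docIDs within the pool's own segment, nothing in this check
needs a global docID.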

So, segments flushed in the current IW session will carry this private pool of pending deletes.
But pre-existing segments in the index don't need their own pool.  Instead, when it's time
to resolve the buffered deletes against them (because they are about to be merged), they must
walk all of the per-segment pools, resolving the deletes from a pool if the pre-existing
segment's index is <= the lastSegmentIndex of that pool.  We should take care to efficiently
handle dup'd terms, ie where the same del term is present in multiple pools.  The most recent
one "wins", and we should do only one delete (per segment) for that term.
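
A minimal sketch of that walk for a pre-existing segment (Java; `Pool`, `delTerms`, and
`termsToDelete` are hypothetical names, and terms are modeled as plain strings).  It assumes
that for a pre-existing segment a qualifying term deletes every matching doc, so de-duplicating
the terms in a set is enough to get one delete per term:

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical illustration; not Lucene's actual data structures. */
final class PoolWalk {
    /** One per-segment pool: its lastSegmentIndex plus the buffered delete terms. */
    static final class Pool {
        final int lastSegmentIndex;
        final List<String> delTerms;

        Pool(int lastSegmentIndex, List<String> delTerms) {
            this.lastSegmentIndex = lastSegmentIndex;
            this.delTerms = delTerms;
        }
    }

    /** Collects each delete term at most once for the segment at segmentIndex. */
    static Set<String> termsToDelete(int segmentIndex, List<Pool> pools) {
        Set<String> terms = new LinkedHashSet<>();
        for (Pool pool : pools) {
            // A pool only covers segments that existed when its DWPT started.
            if (segmentIndex <= pool.lastSegmentIndex) {
                terms.addAll(pool.delTerms); // the set drops terms already seen
            }
        }
        return terms;
    }
}
```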

These per-segment delete pools must be updated on merge.  E.g., if the segment that a pool's
lastSegmentIndex points to gets merged, that's fine, but then on merge commit we must move that
lastSegmentIndex "backwards" to the last segment before the merge, because any deletes necessary
within the merged segments will have been resolved already.
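
The "move backwards" step might look like the following (Java; all names are hypothetical).
This sketch covers only the case described above: pools whose lastSegmentIndex lies outside
the merged range are returned unchanged, and any renumbering of segments after the merge is
not modeled.

```java
/** Hypothetical illustration; not Lucene's actual API. */
final class MergeAdjust {
    /**
     * @param lastSegmentIndex the pool's current lastSegmentIndex
     * @param mergeStart       index of the first merged segment (inclusive)
     * @param mergeEnd         index of the last merged segment (inclusive)
     * @return the lastSegmentIndex to use after the merge commits
     */
    static int adjustLastSegmentIndex(int lastSegmentIndex, int mergeStart, int mergeEnd) {
        boolean insideMerge = lastSegmentIndex >= mergeStart && lastSegmentIndex <= mergeEnd;
        // Deletes inside the merged segments are already resolved, so the pool now
        // only needs to reach the segment just before the merge.  The result is -1
        // when the merge starts at segment 0, meaning no earlier segment remains.
        return insideMerge ? mergeStart - 1 : lastSegmentIndex;
    }
}
```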

When segments with del pools are merged, we obviously apply the deletes to the segments being
merged, but, then, we have to coalesce those pools and move them into a single pool on the
segment just before the merge.  We could actually use a Set at this point since there is no
more docid-upto for this pool (ie, it applies to all docs on that segment and in segments
prior to it).
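
Coalescing could be sketched like this (Java; hypothetical names, with each per-segment pool
modeled as a map from delete term to its local docid-upto).  Once the merged segments' own
deletes are applied, the docid-upto values are no longer needed, so only the terms survive:

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical illustration; not Lucene's actual data structures. */
final class PoolCoalesce {
    /**
     * Unions the delete terms of the merged segments' pools into one set, to be
     * attached to the segment just before the merge.
     */
    static Set<String> coalesce(List<Map<String, Integer>> mergedPools) {
        Set<String> terms = new HashSet<>();
        for (Map<String, Integer> pool : mergedPools) {
            // Drop the docid-upto values; the term now applies to all docs in the
            // target segment and in the segments prior to it.
            terms.addAll(pool.keySet());
        }
        return terms;
    }
}
```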

So I think this is much simpler than I first thought!

bq. Lets get that data structure mapped out to start on LUCENE-2680?

+1

> Get deletes working in the realtime branch
> ------------------------------------------
>
>                 Key: LUCENE-2655
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2655
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>    Affects Versions: Realtime Branch
>            Reporter: Jason Rutherglen
>             Fix For: Realtime Branch
>
>         Attachments: LUCENE-2655.patch
>
>
> Deletes don't work anymore, a patch here will fix this.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.