lucene-dev mailing list archives
From "Michael McCandless (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-2680) Improve how IndexWriter flushes deletes against existing segments
Date Sat, 27 Nov 2010 19:54:38 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-2680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12964441#action_12964441 ]

Michael McCandless commented on LUCENE-2680:
--------------------------------------------

bq. In the apply merge deletes case, won't we want to add deletesInRAM in the getForwardDeletes method?

No, we can't add those deletes until the current buffered segment is successfully flushed.

Eg, say the segment hits a disk full on flush, and DocsWriter aborts (discards all buffered
docs/deletions from that segment).  If we included these deletesInRAM when applying deletes
then suddenly the app will see that some deletes were applied yet the added documents were
not.  So on disk full during flush, calls to .updateDocument may wind up deleting the old
doc but not adding the new one.

So we need to keep them segregated for proper error case semantics.
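
Roughly what I mean, as a toy sketch (the class and method names here are made up for illustration, not the actual DocumentsWriter API):

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: deletes buffered against the in-RAM segment stay
// segregated until that segment flushes successfully.  On an aborted flush
// (e.g. disk full) they are discarded together with the buffered docs, so an
// updateDocument never deletes the old doc without adding the new one.
class BufferedSegmentState {
    final List<String> deletesInRAM = new ArrayList<>();   // tied to the in-RAM segment
    final List<String> pushedDeletes = new ArrayList<>();  // safe to apply to existing segments

    void updateDocument(String deleteTerm) {
        deletesInRAM.add(deleteTerm);  // buffered; not yet visible to getForwardDeletes
    }

    // Only a successful flush promotes the buffered deletes.
    void onFlushSuccess() {
        pushedDeletes.addAll(deletesInRAM);
        deletesInRAM.clear();
    }

    // An aborted flush drops the buffered docs *and* their deletes together.
    void onFlushAbort() {
        deletesInRAM.clear();
    }

    List<String> getForwardDeletes() {
        return pushedDeletes;  // never includes deletesInRAM
    }
}
```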

{quote}
Though for the failing unit test it does not matter, we need to figure
out a solution for the pending doc ID deletions, eg, they can't simply be
transferred around; they probably need to be applied as soon as possible.
Otherwise they require remapping.
{quote}

Hmm, why must we remap?  Can't we carry these buffered deleteByDocIDs along with the segment?
The docIDs would be the segment's docIDs (ie, no base added), so no shifting is needed?

These deleted docIDs would only apply to the current segment, ie they would not be included
in getForwardDeletes?
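
Ie, something like this (again an illustrative sketch with invented names, not the real classes):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only: buffered deleteByDocID entries travel with the
// single segment they were recorded against.  The ints are that segment's own
// docIDs -- no docBase is ever added -- so carrying the buffer along with the
// segment requires no remapping when other segments merge away around it.
class SegmentPrivateDeletes {
    final List<Integer> docIDs = new ArrayList<>();  // segment-local docIDs

    void deleteByDocID(int segmentLocalDocID) {
        docIDs.add(segmentLocalDocID);  // stored as-is, no base shift
    }

    // Applied only to this segment's own reader; these are excluded from the
    // deletes forwarded to other, already-flushed segments.
    boolean[] applyTo(int maxDoc) {
        boolean[] liveDocs = new boolean[maxDoc];
        Arrays.fill(liveDocs, true);
        for (int id : docIDs) liveDocs[id] = false;
        return liveDocs;
    }
}
```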

> Improve how IndexWriter flushes deletes against existing segments
> -----------------------------------------------------------------
>
>                 Key: LUCENE-2680
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2680
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>             Fix For: 4.0
>
>         Attachments: LUCENE-2680.patch, LUCENE-2680.patch, LUCENE-2680.patch, LUCENE-2680.patch,
> LUCENE-2680.patch, LUCENE-2680.patch, LUCENE-2680.patch, LUCENE-2680.patch, LUCENE-2680.patch,
> LUCENE-2680.patch, LUCENE-2680.patch
>
>
> IndexWriter buffers up all deletes (by Term and Query) and only
> applies them if 1) commit or NRT getReader() is called, or 2) a merge
> is about to kick off.
> We do this because, for a large index, it's very costly to open a
> SegmentReader for every segment in the index.  So we defer as long as
> we can.  We do it just before merge so that the merge can eliminate
> the deleted docs.
> But, most merges are small, yet in a big index we apply deletes to all
> of the segments, which is really very wasteful.
> Instead, we should only apply the buffered deletes to the segments
> that are about to be merged, and keep the buffer around for the
> remaining segments.
> I think it's not so hard to do; we'd have to have generations of
> pending deletions, because the newly merged segment doesn't need the
> same buffered deletions applied again.  So every time a merge kicks
> off, we pinch off the current set of buffered deletions, open a new
> set (the next generation), and record which segment was created as of
> which generation.
> This should be a very sizable gain for large indices that mix in
> deletes, though less so in flex, since opening the terms index is much
> faster.
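
The generation bookkeeping described in the issue could look something like this (an illustrative sketch only; the names are invented, not actual Lucene code):

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: each merge kickoff pinches off the current
// buffered deletes as one frozen "packet" stamped with a generation.  A
// segment records the generation it was created at, and only packets from
// that generation onward still apply to it -- the merged segment already had
// the earlier generations' deletes folded in during the merge.
class DeleteGenerations {
    static class Packet {
        final long gen;
        final List<String> deleteTerms;
        Packet(long gen, List<String> terms) { this.gen = gen; this.deleteTerms = terms; }
    }

    long nextGen = 0;
    final List<String> current = new ArrayList<>();  // still-open generation
    final List<Packet> packets = new ArrayList<>();  // frozen generations

    void bufferDelete(String term) { current.add(term); }

    // Called when a merge kicks off: freeze the current buffer and open the
    // next generation; segments created now are born at the returned gen.
    long pinchOff() {
        packets.add(new Packet(nextGen, new ArrayList<>(current)));
        current.clear();
        return ++nextGen;
    }

    // Buffered deletes that still apply to a segment created at segmentGen.
    List<String> pendingFor(long segmentGen) {
        List<String> out = new ArrayList<>();
        for (Packet p : packets)
            if (p.gen >= segmentGen) out.addAll(p.deleteTerms);
        return out;
    }
}
```

So an old segment sees every frozen packet, while a segment produced by the merge skips the packets it was already born after.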

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


