lucene-dev mailing list archives

From "Michael McCandless (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-2680) Improve how IndexWriter flushes deletes against existing segments
Date Mon, 29 Nov 2010 10:30:38 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-2680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12964680#action_12964680 ]

Michael McCandless commented on LUCENE-2680:
--------------------------------------------

{quote}
I guess you think the sync on doc writer is the cause of the
TestStressIndexing2 unit test failure?
{quote}

I'm not sure what's causing the failure, but I think getting the overall approach roughly right
is the first goal; then we can see what's failing.

{quote}
bq. I think we should move this delete handling out of DW

I agree, I originally took this approach however unit tests were failing
when segment infos was passed directly into the apply deletes method(s).
This'll be the 2nd time however apparently the 3rd time's the charm.
{quote}

Not only should SegmentInfos move out of DW as a member, but all of the applyDeletes
logic should move out too.  I.e., it should be IW that pulls readers from the pool, walks the
merged delete terms/queries/per-segment docIDs, and actually does the deletion.
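
As a rough illustration of that split, here is a minimal sketch in plain Java. It is not Lucene code: ToyIndexWriter, ToySegmentReader and all of their methods are hypothetical stand-ins. Each "doc" is a single string, a delete term matches a doc exactly, and a delete query is just a Predicate. The only point is that the writer owns the reader pool and is the one that walks the buffered terms, queries and per-segment docIDs.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical stand-in for a pooled segment reader; the array index is the docID.
final class ToySegmentReader {
    final String[] docs;
    final BitSet deleted = new BitSet();

    ToySegmentReader(String... docs) { this.docs = docs; }

    int numDeleted() { return deleted.cardinality(); }
}

// Hypothetical stand-in for IW: it buffers the deletes and applies them itself.
final class ToyIndexWriter {
    private final Map<String, ToySegmentReader> readerPool = new LinkedHashMap<>();
    private final Set<String> deleteTerms = new HashSet<>();
    private final List<Predicate<String>> deleteQueries = new ArrayList<>();
    private final Map<String, List<Integer>> deleteDocIDs = new HashMap<>();

    synchronized void addSegment(String name, ToySegmentReader reader) {
        readerPool.put(name, reader);
    }

    synchronized void bufferDeleteTerm(String term) { deleteTerms.add(term); }

    synchronized void bufferDeleteQuery(Predicate<String> query) { deleteQueries.add(query); }

    synchronized void bufferDeleteDocID(String segment, int docID) {
        deleteDocIDs.computeIfAbsent(segment, k -> new ArrayList<>()).add(docID);
    }

    // Walks every pooled reader and marks matching docs as deleted.
    // Returns the number of docs newly deleted, then clears the buffers.
    synchronized int applyDeletes() {
        int newlyDeleted = 0;
        for (Map.Entry<String, ToySegmentReader> e : readerPool.entrySet()) {
            ToySegmentReader reader = e.getValue();
            int before = reader.numDeleted();
            for (int docID = 0; docID < reader.docs.length; docID++) {
                String doc = reader.docs[docID];
                boolean hit = deleteTerms.contains(doc);
                for (Predicate<String> query : deleteQueries) {
                    hit |= query.test(doc);
                }
                if (hit) {
                    reader.deleted.set(docID);
                }
            }
            for (int docID : deleteDocIDs.getOrDefault(e.getKey(), Collections.emptyList())) {
                reader.deleted.set(docID);
            }
            newlyDeleted += reader.numDeleted() - before;
        }
        deleteTerms.clear();
        deleteQueries.clear();
        deleteDocIDs.clear();
        return newlyDeleted;
    }
}
```

In this toy version the doc writer never sees the segment list at all; everything that touches existing segments goes through the writer's own synchronized methods.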

bq. In moving deletes out of DW, how should we handle the bufferDeleteTerms sync on DW and
the containing waitReady?

I think all the bufferDeleteX methods would move into IW, along with timeToFlushDeletes. The RAM
accounting can be done fully inside IW.

The waitReady(null) is there so that DW.pauseAllThreads also pauses any threads doing deletions.
But in moving these methods to IW, we'd make them sync on IW (they are currently sync'd on DW),
which takes care of pausing these threads.
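
To make the locking argument concrete, here is a small sketch, again in plain Java rather than Lucene code. SyncedDeleteBuffer is a hypothetical class, and the per-term byte estimate is a made-up number used only for illustration. It shows the two ideas from this comment: the RAM accounting lives next to the buffered deletes, and because the buffer method is synchronized on the writer object, any code that holds that monitor automatically stalls the deleting threads.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for the delete-buffering part of IW.
final class SyncedDeleteBuffer {
    private final long ramBudgetBytes;
    private final Set<String> deleteTerms = new HashSet<>();
    private long bytesUsed;

    SyncedDeleteBuffer(long ramBudgetBytes) {
        this.ramBudgetBytes = ramBudgetBytes;
    }

    // Buffers a delete term and returns true when the caller should flush deletes.
    synchronized boolean bufferDeleteTerm(String term) {
        if (deleteTerms.add(term)) {
            // Made-up estimate: 2 bytes per char plus a fixed 48 bytes of overhead.
            bytesUsed += 2L * term.length() + 48;
        }
        return timeToFlushDeletes();
    }

    synchronized boolean timeToFlushDeletes() {
        return bytesUsed >= ramBudgetBytes;
    }

    synchronized int pendingTerms() {
        return deleteTerms.size();
    }
}
```

A thread that enters `synchronized (buffer) { ... }` plays the role of the pausing code here: while it is inside that block, a concurrent call to bufferDeleteTerm cannot proceed, so no separate waitReady-style check is needed in this sketch.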

> Improve how IndexWriter flushes deletes against existing segments
> -----------------------------------------------------------------
>
>                 Key: LUCENE-2680
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2680
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>             Fix For: 4.0
>
>         Attachments: LUCENE-2680.patch, LUCENE-2680.patch, LUCENE-2680.patch, LUCENE-2680.patch,
> LUCENE-2680.patch, LUCENE-2680.patch, LUCENE-2680.patch, LUCENE-2680.patch, LUCENE-2680.patch,
> LUCENE-2680.patch, LUCENE-2680.patch, LUCENE-2680.patch
>
>
> IndexWriter buffers up all deletes (by Term and Query) and only
> applies them if 1) commit or NRT getReader() is called, or 2) a merge
> is about to kickoff.
> We do this because, for a large index, it's very costly to open a
> SegmentReader for every segment in the index.  So we defer as long as
> we can.  We do it just before merge so that the merge can eliminate
> the deleted docs.
> But, most merges are small, yet in a big index we apply deletes to all
> of the segments, which is really very wasteful.
> Instead, we should only apply the buffered deletes to the segments
> that are about to be merged, and keep the buffer around for the
> remaining segments.
> I think it's not so hard to do; we'd have to have generations of
> pending deletions, because the newly merged segment doesn't need the
> same buffered deletions applied again.  So every time a merge kicks
> off, we pinch off the current set of buffered deletions, open a new
> set (the next generation), and record which segment was created as of
> which generation.
> This should be a very sizable gain for large indices that mix
> deletes, though, less so in flex since opening the terms index is much
> faster.
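
The generation idea in the issue description can be sketched roughly as follows. This is plain Java with hypothetical names (GenerationalDeletes, GenSegment), not the patch attached to this issue. It makes simplifying assumptions: every doc is a single term, and both flushing and merging a segment pinch off the current packet of buffered deletes. A segment records the generation it was created at, so only packets from that generation onward still need to be applied to it.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical segment: its live docs (one term each) and its creation generation.
final class GenSegment {
    final String name;
    final Set<String> liveTerms;
    final long gen; // packets older than this never need to be applied here

    GenSegment(String name, Set<String> liveTerms, long gen) {
        this.name = name;
        this.liveTerms = liveTerms;
        this.gen = gen;
    }
}

final class GenerationalDeletes {
    private long currentGen = 0;
    private final TreeMap<Long, Set<String>> packets = new TreeMap<>();
    private final List<GenSegment> segments = new ArrayList<>();

    GenerationalDeletes() {
        packets.put(currentGen, new HashSet<>());
    }

    synchronized void bufferDeleteTerm(String term) {
        packets.get(currentGen).add(term);
    }

    // Closes the current packet and opens the next generation.
    private long pinchOff() {
        currentGen++;
        packets.put(currentGen, new HashSet<>());
        return currentGen;
    }

    synchronized GenSegment flushSegment(String name, Set<String> terms) {
        GenSegment segment = new GenSegment(name, new HashSet<>(terms), pinchOff());
        segments.add(segment);
        return segment;
    }

    // Applies pending packets only to the segments being merged.
    synchronized GenSegment merge(String name, List<GenSegment> toMerge) {
        Set<String> merged = new HashSet<>();
        for (GenSegment segment : toMerge) {
            Set<String> live = new HashSet<>(segment.liveTerms);
            for (Set<String> packet : packets.tailMap(segment.gen, true).values()) {
                live.removeAll(packet);
            }
            merged.addAll(live);
        }
        segments.removeAll(toMerge);
        GenSegment result = new GenSegment(name, merged, pinchOff());
        segments.add(result);
        prune();
        return result;
    }

    // Drops packets that no remaining segment can still need.
    private void prune() {
        long minGen = currentGen;
        for (GenSegment segment : segments) {
            minGen = Math.min(minGen, segment.gen);
        }
        packets.headMap(minGen, false).clear();
    }

    synchronized int pendingPackets() {
        return packets.size();
    }
}
```

In this sketch a merge of two small segments never touches a large, unmerged segment; its pending deletes simply stay in the packets until that segment is itself merged.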

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



