lucene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Michael McCandless (JIRA)" <j...@apache.org>
Subject [jira] Updated: (LUCENE-2680) Improve how IndexWriter flushes deletes against existing segments
Date Wed, 08 Dec 2010 19:45:02 GMT

     [ https://issues.apache.org/jira/browse/LUCENE-2680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Michael McCandless updated LUCENE-2680:
---------------------------------------

    Attachment: LUCENE-2680.patch


OK I started from the last patch and iterated quite a bit.

The per-segment deletes are now working!  All tests pass.  It turned
out to be a little more hairy than we thought because you also must
account for deletes that are pushed backwards onto a segment being
merged (eg due to a flush, or a merge just ahead) while that merge is
still running.

I swapped the two classes -- SegmentDeletes tracks deletes for one
segment, while BufferedDeletes tracks deletes for all segments.

I also wound up doing a fair amount of cleanup/refactoring on how
DW/IW interact, I think a good step towards DWPT.  EG IW's flush is
now much smaller (could *almost* become unsync'd) since I moved
flushing doc stores, building CFS, etc. down into DW.

I added a new FlushControl class to manage (external to DW) triggering
of flushing due to RAM, add doc count, buffered del count.  This way
DWPT can share the single FlushControl instance in IW.


> Improve how IndexWriter flushes deletes against existing segments
> -----------------------------------------------------------------
>
>                 Key: LUCENE-2680
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2680
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>             Fix For: 4.0
>
>         Attachments: LUCENE-2680.patch, LUCENE-2680.patch, LUCENE-2680.patch, LUCENE-2680.patch,
LUCENE-2680.patch, LUCENE-2680.patch, LUCENE-2680.patch, LUCENE-2680.patch, LUCENE-2680.patch,
LUCENE-2680.patch, LUCENE-2680.patch, LUCENE-2680.patch, LUCENE-2680.patch, LUCENE-2680.patch,
LUCENE-2680.patch
>
>
> IndexWriter buffers up all deletes (by Term and Query) and only
> applies them if 1) commit or NRT getReader() is called, or 2) a merge
> is about to kickoff.
> We do this because, for a large index, it's very costly to open a
> SegmentReader for every segment in the index.  So we defer as long as
> we can.  We do it just before merge so that the merge can eliminate
> the deleted docs.
> But, most merges are small, yet in a big index we apply deletes to all
> of the segments, which is really very wasteful.
> Instead, we should only apply the buffered deletes to the segments
> that are about to be merged, and keep the buffer around for the
> remaining segments.
> I think it's not so hard to do; we'd have to have generations of
> pending deletions, because the newly merged segment doesn't need the
> same buffered deletions applied again.  So every time a merge kicks
> off, we pinch off the current set of buffered deletions, open a new
> set (the next generation), and record which segment was created as of
> which generation.
> This should be a very sizable gain for large indices that mix
> deletes, though, less so in flex since opening the terms index is much
> faster.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


Mime
View raw message