lucene-dev mailing list archives

From "Jason Rutherglen (JIRA)" <j...@apache.org>
Subject [jira] Updated: (LUCENE-2680) Improve how IndexWriter flushes deletes against existing segments
Date Mon, 01 Nov 2010 17:58:28 GMT

     [ https://issues.apache.org/jira/browse/LUCENE-2680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Rutherglen updated LUCENE-2680:
-------------------------------------

    Attachment: LUCENE-2680.patch

The general approach is to reuse BufferedDeletes, but to place them into a map keyed by
segment info for those segments generated after lastSegmentIndex, as discussed here
https://issues.apache.org/jira/browse/LUCENE-2655?focusedCommentId=12922894&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12922894
and below.

* lastSegmentIndex is added to IW

* DW segmentDeletes is a map of segment info -> buffered deletes.  In the apply deletes
method, buffered deletes are pulled for a given segment info if they exist; otherwise they're
taken from deletesFlushedLastSeg.

* I'm not entirely sure what pushDeletes should do now, probably the same thing as currently,
only the name should change slightly in that it's pushing deletes only for the RAM buffer
docs.

* There need to be tests to ensure the docid-upto logic is working correctly.

* I'm not sure what to do with DW hasDeletes (its usage is commented out).

* Do there need to be separate deletes for the RAM buffer vis-à-vis the (0 - lastSegmentIndex)
deletes?

* The memory accounting will now get interesting, as we'll need to track the RAM usage of
terms/queries across multiple maps.

* In commitMerge, DW verifySegmentDeletes removes the unused info -> deletes mappings.

* testDeletes deletes a doc in segment 1, then merges segments 1 and 2.  We then test to ensure
the deletes were in fact applied only to segments 1 and 2.

* testInitLastSegmentIndex ensures that on IW init, the lastSegmentIndex is in fact set.
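
To make the lookup described in the segmentDeletes bullet concrete, here is a rough sketch of a segment-keyed map with a fallback to a shared buffer. It is not the code in the attached patch: the class SegmentDeletesSketch and the methods bufferDelete, bufferSharedDelete, deletesFor and retainOnly are hypothetical names, segments are plain strings, and a delete is just a term string. Only the field names segmentDeletes and deletesFlushedLastSeg come from the notes above.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Illustrative stand-in only; these are not Lucene's actual classes. */
class SegmentDeletesSketch {
    // Deletes buffered for segments created after lastSegmentIndex, keyed by segment name.
    private final Map<String, Set<String>> segmentDeletes = new HashMap<>();
    // Shared buffer used for any segment that has no entry of its own.
    private final Set<String> deletesFlushedLastSeg = new HashSet<>();

    /** Buffers a delete term against one specific segment (hypothetical helper). */
    void bufferDelete(String segment, String term) {
        segmentDeletes.computeIfAbsent(segment, k -> new HashSet<>()).add(term);
    }

    /** Buffers a delete term in the shared buffer (hypothetical helper). */
    void bufferSharedDelete(String term) {
        deletesFlushedLastSeg.add(term);
    }

    /** Returns the per-segment deletes if present, otherwise the shared buffer. */
    Set<String> deletesFor(String segment) {
        Set<String> perSegment = segmentDeletes.get(segment);
        return perSegment != null ? perSegment : deletesFlushedLastSeg;
    }

    /** Drops map entries for segments that are no longer live, e.g. after a merge commits. */
    void retainOnly(Set<String> liveSegments) {
        segmentDeletes.keySet().retainAll(liveSegments);
    }

    public static void main(String[] args) {
        SegmentDeletesSketch deletes = new SegmentDeletesSketch();
        deletes.bufferSharedDelete("id:1");
        deletes.bufferDelete("_5", "id:2");
        System.out.println(deletes.deletesFor("_5")); // per-segment entry
        System.out.println(deletes.deletesFor("_2")); // falls back to the shared buffer
        deletes.retainOnly(new HashSet<>(Arrays.asList("_6")));
        System.out.println(deletes.deletesFor("_5")); // entry removed, so shared buffer again
    }
}
```

The retainOnly step plays the role that verifySegmentDeletes has in commitMerge, under the assumption that the caller passes the set of segments still present in the index.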


> Improve how IndexWriter flushes deletes against existing segments
> -----------------------------------------------------------------
>
>                 Key: LUCENE-2680
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2680
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>             Fix For: 4.0
>
>         Attachments: LUCENE-2680.patch
>
>
> IndexWriter buffers up all deletes (by Term and Query) and only
> applies them if 1) commit or NRT getReader() is called, or 2) a merge
> is about to kick off.
> We do this because, for a large index, it's very costly to open a
> SegmentReader for every segment in the index.  So we defer as long as
> we can.  We do it just before merge so that the merge can eliminate
> the deleted docs.
> But, most merges are small, yet in a big index we apply deletes to all
> of the segments, which is really very wasteful.
> Instead, we should only apply the buffered deletes to the segments
> that are about to be merged, and keep the buffer around for the
> remaining segments.
> I think it's not so hard to do; we'd have to have generations of
> pending deletions, because the newly merged segment doesn't need the
> same buffered deletions applied again.  So every time a merge kicks
> off, we pinch off the current set of buffered deletions, open a new
> set (the next generation), and record which segment was created as of
> which generation.
> This should be a very sizable gain for large indices that mix
> deletes, though less so in flex, since opening the terms index is much
> faster.
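
The generation scheme in the quoted description can be sketched roughly as follows. This is only an illustration under simplifying assumptions, not IndexWriter's implementation: DeleteGenerationsSketch and all of its methods are hypothetical, segments and delete terms are plain strings, docid-upto handling is ignored, and the caller is assumed to have already applied the pending deletes to the merge inputs before calling merge.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Illustrative stand-in only; these are not Lucene's actual classes. */
class DeleteGenerationsSketch {
    // One set of buffered delete terms per generation; the last entry is the open generation.
    private final List<Set<String>> generations = new ArrayList<>();
    // The generation that was current when each live segment was created.
    private final Map<String, Integer> segmentGeneration = new HashMap<>();

    DeleteGenerationsSketch() {
        generations.add(new HashSet<>());
    }

    int currentGeneration() {
        return generations.size() - 1;
    }

    /** Records a segment as created in the current generation. */
    void addSegment(String segment) {
        segmentGeneration.put(segment, currentGeneration());
    }

    /** Buffers a delete term in the open generation. */
    void bufferDelete(String term) {
        generations.get(currentGeneration()).add(term);
    }

    /** Deletes still owed to a known segment: every generation from its creation onward. */
    Set<String> pendingDeletesFor(String segment) {
        Set<String> pending = new HashSet<>();
        for (int g = segmentGeneration.get(segment); g < generations.size(); g++) {
            pending.addAll(generations.get(g));
        }
        return pending;
    }

    /** Pinches off the current generation and records the merged segment in the next one. */
    void merge(List<String> inputs, String merged) {
        generations.add(new HashSet<>()); // open the next generation
        for (String input : inputs) {
            segmentGeneration.remove(input); // inputs are replaced by the merged segment
        }
        segmentGeneration.put(merged, currentGeneration());
    }

    public static void main(String[] args) {
        DeleteGenerationsSketch deletes = new DeleteGenerationsSketch();
        deletes.addSegment("_0");
        deletes.addSegment("_1");
        deletes.addSegment("_2");
        deletes.bufferDelete("id:7");
        deletes.merge(Arrays.asList("_0", "_1"), "_3");
        deletes.bufferDelete("id:9");
        System.out.println(deletes.pendingDeletesFor("_3")); // only deletes buffered after the merge
        System.out.println(deletes.pendingDeletesFor("_2")); // deletes from both generations
    }
}
```

The point of the sketch is that the untouched segment keeps its claim on the older generation, while the merged segment starts from the new one and never sees the same deletes twice.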

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

