lucene-dev mailing list archives

From "Michael McCandless (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENE-2357) Reduce transient RAM usage while merging by using packed ints array for docID re-mapping
Date Mon, 28 May 2012 17:35:23 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-2357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13284490#comment-13284490 ]

Michael McCandless commented on LUCENE-2357:
--------------------------------------------

bq. I implemented equals only for testing purposes (see TestSegmentMerger.java) and then hashCode
for consistency. I can move the equals code to the test case if you prefer.

Ahh, OK.  Yeah, I think it's better to move it to the test case.  Thanks.

bq. There used to be an assert in SegmentMerger but it was removed in r1148938

Ahh you're right!  Hmm, but, we actually know the accurate delCount higher up; let me tweak
the patch a bit to pass this down, so we don't have to re-count it separately.

bq. so I assumed the numDeletedDocs() was unreliable and the del count should be computed
from liveDocs. I am not familiar enough with the merge process to know whether some invariants
are broken or not. Should I open a bug?

As far as I know, it's only unreliable in this context (SegmentReader passed to SegmentMerger
for merging); this is because we allow newly marked deleted docs to happen concurrently up
until the moment we need to pass the SR instance to the merger (search for "// Must sync to
ensure BufferedDeletesStream" in IndexWriter.java) ... but it would be nice to fix that, so
I think open a new issue (it won't block this one)?  We should be able to make a new SR instance,
sharing the same core as the current one but using the correct delCount...
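
For illustration, here is a minimal sketch of the "compute the del count from liveDocs" approach discussed above. It is not the code in the patch: java.util.BitSet stands in for Lucene's live-docs bits, and the class and method names (DelCounts, countDeletes) are hypothetical. A null liveDocs is treated as "no deletions".

```java
import java.util.BitSet;

/** Hypothetical helper, not part of Lucene: derives the delete count from live docs. */
final class DelCounts {
    private DelCounts() {}

    /**
     * Counts deleted docs by scanning the live-docs bits.
     * A set bit means the doc is live; null means the segment has no deletions.
     */
    static int countDeletes(BitSet liveDocs, int maxDoc) {
        if (liveDocs == null) {
            return 0;
        }
        int dels = 0;
        for (int doc = 0; doc < maxDoc; doc++) {
            if (!liveDocs.get(doc)) {
                dels++;
            }
        }
        return dels;
    }
}
```

The scan costs O(maxDoc) per segment, which is why passing down an already-known delCount avoids re-counting.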
                
> Reduce transient RAM usage while merging by using packed ints array for docID re-mapping
> ----------------------------------------------------------------------------------------
>
>                 Key: LUCENE-2357
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2357
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: core/index
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>            Priority: Minor
>             Fix For: 4.1
>
>         Attachments: LUCENE-2357.patch
>
>
> We allocate this int[] to remap docIDs due to compaction of deleted ones.
> This uses a lot of RAM for large segment merges, and can fail to allocate due to fragmentation
> on 32-bit JREs.
> Now that we have packed ints, a simple fix would be to use a packed int array... and
> maybe instead of storing the absolute docID in the mapping, we could store the number of deleted docs seen
> so far (so the remap would do a lookup then a subtract).  This may add some CPU cost to merging
> but should bring down transient RAM usage quite a bit.
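
The idea in the description can be sketched roughly as follows. This is an illustrative sketch only, not the attached patch: the class DelCountDocMap is hypothetical, java.util.BitSet stands in for the live-docs bits, and the hand-rolled long[] packing is a simplified stand-in for Lucene's packed ints. Each entry stores the number of deleted docs seen before that docID, using only as many bits as the total del count requires, and the remap is a lookup followed by a subtract.

```java
import java.util.BitSet;

/** Hypothetical sketch: docID remapping via a packed array of "deletes seen so far". */
final class DelCountDocMap {
    private final BitSet liveDocs;   // set bit = live doc (stand-in for Lucene's live docs)
    private final int bitsPerValue;  // just enough bits to hold the total del count
    private final long mask;
    private final long[] blocks;     // values packed back to back, may span two longs

    DelCountDocMap(BitSet liveDocs, int maxDoc) {
        this.liveDocs = liveDocs;
        // First pass: the total del count bounds every stored value.
        int delCount = 0;
        for (int doc = 0; doc < maxDoc; doc++) {
            if (!liveDocs.get(doc)) delCount++;
        }
        bitsPerValue = Math.max(1, 32 - Integer.numberOfLeadingZeros(delCount));
        mask = (1L << bitsPerValue) - 1;
        blocks = new long[(int) (((long) maxDoc * bitsPerValue + 63) >>> 6)];
        // Second pass: record how many deletes precede each docID.
        int dels = 0;
        for (int doc = 0; doc < maxDoc; doc++) {
            set(doc, dels);
            if (!liveDocs.get(doc)) dels++;
        }
    }

    private void set(int index, long value) {
        long bitPos = (long) index * bitsPerValue;
        int block = (int) (bitPos >>> 6);
        int shift = (int) (bitPos & 63);
        blocks[block] = (blocks[block] & ~(mask << shift)) | (value << shift);
        int spill = shift + bitsPerValue - 64;  // bits that overflow into the next long
        if (spill > 0) {
            int done = bitsPerValue - spill;    // bits already written to this long
            blocks[block + 1] = (blocks[block + 1] & ~(mask >>> done)) | (value >>> done);
        }
    }

    private long get(int index) {
        long bitPos = (long) index * bitsPerValue;
        int block = (int) (bitPos >>> 6);
        int shift = (int) (bitPos & 63);
        long value = blocks[block] >>> shift;
        int spill = shift + bitsPerValue - 64;
        if (spill > 0) {
            value |= blocks[block + 1] << (bitsPerValue - spill);
        }
        return value & mask;
    }

    /** Returns the compacted docID, or -1 if the doc is deleted. */
    int remap(int oldDoc) {
        return liveDocs.get(oldDoc) ? oldDoc - (int) get(oldDoc) : -1;
    }
}
```

With few deletions each entry needs only a handful of bits instead of 32, which is where the transient RAM saving would come from; the extra shift and mask work on each lookup is the CPU cost the description mentions.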

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

