lucene-dev mailing list archives

From "Michael McCandless (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENE-2357) Reduce transient RAM usage while merging by using packed ints array for docID re-mapping
Date Tue, 03 Apr 2012 14:16:24 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-2357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13245343#comment-13245343 ]

Michael McCandless commented on LUCENE-2357:
--------------------------------------------

Hi Iulius,

The basic idea is to replace the fixed int[] that we have today (the docMaps array in
oal.index.MergeState) with a PackedInts store (see oal.util.packed.PackedInts.getMutable).
This should be fairly simple, since a PackedInts store is conceptually just like an int[].

I think that (a rote swap) would be phase one.
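
To make the "conceptually just like an int[]" point concrete, here is a rough, self-contained
sketch of a packed doc map. It does not use Lucene's PackedInts; the PackedDocMap class, its
build/newDocID methods, and the "store newDocID + 1, with 0 meaning deleted" encoding are all
hypothetical and chosen only for illustration.

```java
/** Hypothetical stand-in for a packed ints store; NOT Lucene's PackedInts. */
final class PackedDocMap {
    private final long[] blocks;     // values packed back to back, bitsPerValue bits each
    private final int bitsPerValue;
    private final long mask;

    PackedDocMap(int valueCount, int bitsPerValue) {
        if (bitsPerValue < 1 || bitsPerValue > 32) {
            throw new IllegalArgumentException("bitsPerValue must be in 1..32");
        }
        this.bitsPerValue = bitsPerValue;
        this.mask = (1L << bitsPerValue) - 1;
        long totalBits = (long) valueCount * bitsPerValue;
        this.blocks = new long[(int) ((totalBits + 63) >>> 6)];
    }

    /** Number of bits needed to represent maxValue (at least 1). */
    static int bitsRequired(long maxValue) {
        return Math.max(1, 64 - Long.numberOfLeadingZeros(maxValue));
    }

    void set(int index, long value) {
        if (value < 0 || value > mask) {
            throw new IllegalArgumentException("value does not fit in " + bitsPerValue + " bits");
        }
        long bitPos = (long) index * bitsPerValue;
        int block = (int) (bitPos >>> 6);
        int shift = (int) (bitPos & 63);
        blocks[block] = (blocks[block] & ~(mask << shift)) | (value << shift);
        int spill = shift + bitsPerValue - 64;   // bits that overflow into the next long
        if (spill > 0) {
            long highMask = (1L << spill) - 1;
            blocks[block + 1] = (blocks[block + 1] & ~highMask) | (value >>> (bitsPerValue - spill));
        }
    }

    long get(int index) {
        long bitPos = (long) index * bitsPerValue;
        int block = (int) (bitPos >>> 6);
        int shift = (int) (bitPos & 63);
        long value = blocks[block] >>> shift;
        int spill = shift + bitsPerValue - 64;
        if (spill > 0) {
            value |= blocks[block + 1] << (bitsPerValue - spill);
        }
        return value & mask;
    }

    /** Assumed encoding: store newDocID + 1 for live docs, 0 for deleted docs. */
    static PackedDocMap build(boolean[] deleted) {
        int liveCount = 0;
        for (boolean d : deleted) {
            if (!d) liveCount++;
        }
        PackedDocMap map = new PackedDocMap(deleted.length, bitsRequired(liveCount));
        int next = 0;
        for (int i = 0; i < deleted.length; i++) {
            if (!deleted[i]) map.set(i, ++next);
        }
        return map;
    }

    /** Returns the remapped docID, or -1 if the doc was deleted. */
    int newDocID(int oldDocID) {
        return (int) get(oldDocID) - 1;
    }
}
```

In Lucene itself the store would presumably come from PackedInts.getMutable rather than a
hand-rolled class like this one; the sketch only shows that callers still see plain
get/set by index while each entry costs bitsPerValue bits instead of 32.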

After that, we can save more RAM by storing either the new docID (what we do today) or,
inverting that, the number of deleted docs seen so far, depending on which requires fewer
bits.  For example, if we are merging 1M docs but only 100K are deleted, it's cheaper to
store the number of deletes...
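
The trade-off can be sketched as a bits-per-entry comparison. The DocMapEncoding class and
its method names below are hypothetical, the arrays are left unpacked for brevity, and the
sketch assumes deleted docs are detected separately (for example via the segment's deletion
bits), so the remap is only called for live docs.

```java
/** Hypothetical helper comparing the two encodings; not part of Lucene. */
final class DocMapEncoding {

    /** Number of bits needed to represent maxValue (at least 1). */
    static int bitsRequired(long maxValue) {
        return Math.max(1, 64 - Long.numberOfLeadingZeros(maxValue));
    }

    /** Bits per entry when storing the new docID; the largest is liveCount - 1. */
    static int bitsForNewDocIDs(int maxDoc, int delCount) {
        return bitsRequired(Math.max(0, maxDoc - delCount - 1));
    }

    /** Bits per entry when storing the number of deletes seen so far. */
    static int bitsForDelCounts(int delCount) {
        return bitsRequired(delCount);
    }

    /** delsBefore[i] = number of deleted docs with an old docID below i. */
    static int[] delsBefore(boolean[] deleted) {
        int[] counts = new int[deleted.length];
        int seen = 0;
        for (int i = 0; i < deleted.length; i++) {
            counts[i] = seen;
            if (deleted[i]) seen++;
        }
        return counts;
    }

    /** Remap for a live doc: a lookup, then a subtract. */
    static int remap(int oldDocID, int[] delsBefore) {
        return oldDocID - delsBefore[oldDocID];
    }
}
```

Under these assumptions, the 1M docs / 100K deletes case needs 20 bits per entry for new
docIDs (largest value 899,999) but only 17 bits per entry for delete counts (largest value
100,000).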
                
> Reduce transient RAM usage while merging by using packed ints array for docID re-mapping
> ----------------------------------------------------------------------------------------
>
>                 Key: LUCENE-2357
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2357
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: core/index
>            Reporter: Michael McCandless
>            Priority: Minor
>              Labels: gsoc2012, lucene-gsoc-12
>             Fix For: 4.0
>
>
> We allocate this int[] to remap docIDs due to compaction of deleted ones.
> This uses a lot of RAM for large segment merges, and can fail to allocate due to
> fragmentation on 32 bit JREs.
> Now that we have packed ints, a simple fix would be to use a packed int array... and
> maybe instead of storing the absolute docID in the mapping, we could store the number of
> del docs seen so far (so the remap would do a lookup then a subtract).  This may add some
> CPU cost to merging but should bring down transient RAM usage quite a bit.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        


