lucene-dev mailing list archives

From "Robert Muir (JIRA)" <>
Subject [jira] [Commented] (LUCENE-4512) Additional memory savings in CompressingStoredFieldsIndex.MEMORY_CHUNK
Date Tue, 30 Oct 2012 18:16:12 GMT


Robert Muir commented on LUCENE-4512:

I tested this really quickly on that geonames data again: 72 chunks with bpvs (bits per value) of 16-20 (avg 18, I think).
So this saves quite a bit more than the 29 bpv of the trunk code.
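Taking the reported numbers at face value, and assuming the memory used by the start pointers scales with bits per value (ignoring fixed overhead such as storing the average itself), the relative saving over trunk is roughly:

$$1 - \frac{18}{29} \approx 0.38$$

that is, about 38% less memory for the packed start pointers.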

I didn't look at the code too much, but since we are computing the average at index time (i...), do you think it still makes sense to encode the deltas from the previous value, or should we just encode them up front at index time as deltas from the average (if that makes things simpler)?
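One way to make the trade-off in that question concrete is a small hypothetical sketch (the class and method names are illustrative, not from the patch, whose actual on-disk format is not shown here) that measures the bits per value each encoding would need for the same start pointers. Deltas from the previous value are just chunk sizes, so they are never negative, but recovering pointer i then requires summing all earlier deltas; deltas from the average can be negative (hence the zig-zag step), yet give the pointer back in a single step.

```java
/**
 * Hypothetical comparison, not code from LUCENE-4512: bits needed when each
 * start pointer is stored as a delta from the previous one versus as a delta
 * from first + avgChunkBytes * i.
 */
final class DeltaEncodingComparison {

    /** Bits needed to represent maxValue (at least 1, like a packed array). */
    static int bitsRequired(long maxValue) {
        return Math.max(1, 64 - Long.numberOfLeadingZeros(maxValue));
    }

    /** Zig-zag: maps small negative and positive deltas to small non-negative values. */
    static long zigZag(long v) {
        return (v << 1) ^ (v >> 63);
    }

    /** Deltas from the previous pointer are the chunk sizes themselves. */
    static int bpvDeltaFromPrevious(long[] sp) {
        long max = 0;
        for (int i = 1; i < sp.length; i++) {
            max = Math.max(max, sp[i] - sp[i - 1]);
        }
        return bitsRequired(max);
    }

    /** Deltas from the expected value only hold the deviation from the mean. */
    static int bpvDeltaFromAverage(long[] sp) {
        int n = sp.length;
        long avg = n > 1 ? (sp[n - 1] - sp[0]) / (n - 1) : 0;
        long max = 0;
        for (int i = 0; i < n; i++) {
            max = Math.max(max, zigZag(sp[i] - (sp[0] + avg * i)));
        }
        return bitsRequired(max);
    }

    public static void main(String[] args) {
        long[] sp = {0, 1000, 2100, 2950, 4000};
        System.out.println(bpvDeltaFromPrevious(sp) + " vs " + bpvDeltaFromAverage(sp));
        // prints "11 vs 8"
    }
}
```

With chunks of roughly equal size, the deviations from the mean are much smaller than the chunk sizes, which is where the extra savings would come from.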
> Additional memory savings in CompressingStoredFieldsIndex.MEMORY_CHUNK
> ----------------------------------------------------------------------
>                 Key: LUCENE-4512
>                 URL:
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Adrien Grand
>            Assignee: Adrien Grand
>            Priority: Minor
>             Fix For: 4.1
>         Attachments: LUCENE-4512.patch
> Robert had a great idea to save memory with {{CompressingStoredFieldsIndex.MEMORY_CHUNK}}:
> instead of storing the absolute start pointers, we could compute the mean number of bytes per
> chunk of documents and only store the delta between the actual value and the expected value
> (avgChunkBytes * chunkNumber).
> By applying this idea to every n (=1024?) chunks, we would even:
>  - make sure to never hit the worst case (delta ~= maxStartPointer)
>  - reduce memory usage at indexing time.
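The idea in the quoted description can be sketched as follows. This is a hypothetical illustration under stated assumptions, not the LUCENE-4512 patch: each block of `blockSize` chunks keeps its own base pointer (assumed here to be the first chunk's start pointer, which need not be 0) and its own average chunk size, and stores zig-zag encoded deltas from `base + avg * (offset in block)`. A real index would bit-pack each block with its own bits per value instead of keeping a `long[]`.

```java
/**
 * Hypothetical sketch (not the LUCENE-4512 patch): start pointers stored as
 * deltas from a per-block expected value, base + avg * (offset in block).
 */
final class BlockedAvgDeltaIndex {
    private final int blockSize;        // chunks per block (the issue suggests ~1024)
    private final long[] blockBase;     // start pointer of each block's first chunk
    private final long[] blockAvg;      // mean chunk size within each block
    private final long[] zigZagDeltas;  // a real index would bit-pack these per block

    BlockedAvgDeltaIndex(long[] startPointers, int blockSize) {
        this.blockSize = blockSize;
        int n = startPointers.length;
        int numBlocks = (n + blockSize - 1) / blockSize;
        blockBase = new long[numBlocks];
        blockAvg = new long[numBlocks];
        zigZagDeltas = new long[n];
        for (int b = 0; b < numBlocks; b++) {
            int from = b * blockSize;
            int to = Math.min(n, from + blockSize); // exclusive
            blockBase[b] = startPointers[from];
            // Truncated mean gap between consecutive pointers (0 for a one-chunk block).
            blockAvg[b] = to - from > 1
                ? (startPointers[to - 1] - startPointers[from]) / (to - from - 1) : 0;
            for (int i = from; i < to; i++) {
                long expected = blockBase[b] + blockAvg[b] * (i - from);
                zigZagDeltas[i] = zigZag(startPointers[i] - expected);
            }
        }
    }

    /** O(1) lookup: the expected value plus the stored correction. */
    long startPointer(int chunk) {
        int b = chunk / blockSize;
        long expected = blockBase[b] + blockAvg[b] * (chunk - b * blockSize);
        return expected + unZigZag(zigZagDeltas[chunk]);
    }

    /** Bits per value needed to pack one block's deltas (at least 1). */
    int bitsPerValue(int block) {
        int from = block * blockSize;
        int to = Math.min(zigZagDeltas.length, from + blockSize);
        long max = 0;
        for (int i = from; i < to; i++) max = Math.max(max, zigZagDeltas[i]);
        return Math.max(1, 64 - Long.numberOfLeadingZeros(max));
    }

    // Zig-zag maps small negative and positive deltas to small non-negative values.
    static long zigZag(long v) { return (v << 1) ^ (v >> 63); }
    static long unZigZag(long z) { return (z >>> 1) ^ -(z & 1); }

    public static void main(String[] args) {
        // Four 100-byte chunks followed by four 1000-byte chunks.
        long[] sp = {0, 100, 200, 300, 400, 1400, 2400, 3400};
        BlockedAvgDeltaIndex oneBlock = new BlockedAvgDeltaIndex(sp, 8); // single global average
        BlockedAvgDeltaIndex blocksOf4 = new BlockedAvgDeltaIndex(sp, 4);
        System.out.println("one block: " + oneBlock.bitsPerValue(0) + " bpv; blocks of 4: "
            + blocksOf4.bitsPerValue(0) + " and " + blocksOf4.bitsPerValue(1) + " bpv");
        // prints "one block: 12 bpv; blocks of 4: 1 and 1 bpv"
    }
}
```

With a single block covering all eight chunks (equivalent to one global average), the mix of small and large chunks drifts far from the mean and needs 12 bits per value; with blocks of 4, each block tracks its own mean and needs only 1. The block size of 4 only keeps the example small.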

