lucene-dev mailing list archives

From "Adrien Grand (JIRA)" <>
Subject [jira] [Updated] (LUCENE-5373) Lucene42DocValuesProducer.ramBytesUsed is over-estimated
Date Thu, 19 Dec 2013 19:47:07 GMT


Adrien Grand updated LUCENE-5373:

    Attachment: LUCENE-5373.patch

Here is a patch. Lucene42DocValuesProducer no longer relies on {{RamUsageEstimator.sizeOf(Object)}}.
Instead, it has a member that stores its memory usage and is incremented every time we load
doc values for a new field. This should be both faster and more accurate.
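
For readers without the patch at hand, the idea can be sketched roughly as follows. This is a minimal illustration and not the code from LUCENE-5373.patch: the class {{TrackingDocValuesProducer}}, the {{getNumeric(String, int)}} signature and the plain {{long[]}} storage are all assumptions made for the example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for a doc values producer; not the actual Lucene class.
class TrackingDocValuesProducer {
    // Running total, updated only when a field is loaded for the first time.
    private final AtomicLong ramBytesUsed = new AtomicLong();
    private final Map<String, long[]> numericInstances = new HashMap<>();

    // Returns the values for a field, loading them on first access.
    synchronized long[] getNumeric(String field, int maxDoc) {
        long[] values = numericInstances.get(field);
        if (values == null) {
            // Assumption: a plain long[] stands in for the decoded doc values.
            values = new long[maxDoc];
            numericInstances.put(field, values);
            // Count only the payload; array header and alignment are ignored.
            ramBytesUsed.addAndGet((long) Long.BYTES * values.length);
        }
        return values;
    }

    // Constant-time read of the counter, with no object-graph traversal.
    long ramBytesUsed() {
        return ramBytesUsed.get();
    }
}
```

Loading the same field twice does not change the counter, and reading it never walks references, so nothing reachable only through an IndexInput can be counted.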

I deliberately left out object alignment, the numeric/binary/fst entries and the size of
some small hash tables to keep the size estimation simple. These should be very small
compared to the structures that actually store doc values anyway.

> Lucene42DocValuesProducer.ramBytesUsed is over-estimated
> --------------------------------------------------------
>                 Key: LUCENE-5373
>                 URL:
>             Project: Lucene - Core
>          Issue Type: Bug
>            Reporter: Adrien Grand
>            Assignee: Adrien Grand
>            Priority: Minor
>         Attachments: LUCENE-5373.patch
> Lucene42DocValuesProducer.ramBytesUsed uses {{RamUsageEstimator.sizeOf(this)}} to return
> an estimate of the memory usage. One of the issues (there might be others) is that this
> class has a reference to an IndexInput that might link to other data structures that we wouldn't
> want to take into account. For example, index inputs of a {{RAMDirectory}} all point to the
> directory itself, so {{Lucene42DocValuesProducer.ramBytesUsed}} would return the amount of
> memory used by the whole directory.
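
The over-estimation described in the issue can be reproduced on a toy object graph. The sketch below is only an illustration: {{Node}}, {{Directory}}, {{Input}}, {{Producer}} and {{deepSize}} are hypothetical names, not Lucene classes, and the traversal is a simplified stand-in for what a reference-following size estimator does.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

class DeepSizeDemo {
    // Hypothetical miniature object graph; none of these are Lucene classes.
    interface Node {
        long shallowBytes();
        Node[] references();
    }

    // Stands in for a directory that holds all file contents in memory.
    static final class Directory implements Node {
        final long fileBytes;
        Directory(long fileBytes) { this.fileBytes = fileBytes; }
        public long shallowBytes() { return fileBytes; }
        public Node[] references() { return new Node[0]; }
    }

    // Stands in for an index input that keeps a pointer back to its directory.
    static final class Input implements Node {
        final Directory directory;
        Input(Directory directory) { this.directory = directory; }
        public long shallowBytes() { return 0L; }
        public Node[] references() { return new Node[] { directory }; }
    }

    // Stands in for the producer: it owns doc values and references an input.
    static final class Producer implements Node {
        final Input data;
        final long docValuesBytes;
        Producer(Input data, long docValuesBytes) {
            this.data = data;
            this.docValuesBytes = docValuesBytes;
        }
        public long shallowBytes() { return docValuesBytes; }
        public Node[] references() { return new Node[] { data }; }
    }

    // Sums the size of everything reachable from root, visiting each node once.
    static long deepSize(Node root) {
        Set<Node> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        return deepSize(root, seen);
    }

    private static long deepSize(Node node, Set<Node> seen) {
        if (!seen.add(node)) {
            return 0L;
        }
        long total = node.shallowBytes();
        for (Node ref : node.references()) {
            total += deepSize(ref, seen);
        }
        return total;
    }
}
```

With a producer that owns 4,000 bytes of doc values and an input that points at a 1,000,000-byte directory, {{deepSize}} reports the sum of both, while the producer's own data remains 4,000 bytes.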

This message was sent by Atlassian JIRA
