lucene-dev mailing list archives

From Michael McCandless <>
Subject Re: IndexWriter.ramSizeInBytes
Date Thu, 14 Apr 2011 12:29:52 GMT
This is actually [sadly] expected.

This is showing that your RAM efficiency is ~50% (well, less, if the
segment also has stored fields / term vectors).

This is because the in-RAM data structures cannot be 100% efficient as
they must leave room to "grow" the individual postings.  But once
written on disk the format is obviously compacted vs what's in RAM.
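
To make the "room to grow" point concrete, here is a minimal toy model in
Java. It is not Lucene's actual in-RAM postings structure: the class and
method names (PostingsSlackDemo, GrowableBuffer, efficiency) are made up
for illustration, and the doubling growth policy is an assumption. Each
term gets its own buffer that grows when full, and the sketch compares the
bytes actually used (roughly what a compact on-disk form would need)
against the bytes allocated in RAM.

```java
import java.util.Arrays;
import java.util.Random;

public class PostingsSlackDemo {
    // Assumed starting capacity per term; purely illustrative.
    static final int INITIAL_CAPACITY = 8;

    /** Toy per-term postings buffer: capacity doubles whenever it fills up. */
    static final class GrowableBuffer {
        private byte[] data = new byte[INITIAL_CAPACITY];
        private int used = 0;

        void append(byte b) {
            if (used == data.length) {
                // Leave room for future postings by doubling the allocation.
                data = Arrays.copyOf(data, data.length * 2);
            }
            data[used++] = b;
        }

        int used() { return used; }
        int capacity() { return data.length; }
    }

    /** Returns {total bytes used, total bytes allocated} over all terms. */
    static long[] usedAndAllocated(int[] postingBytesPerTerm) {
        long used = 0;
        long allocated = 0;
        for (int n : postingBytesPerTerm) {
            GrowableBuffer buf = new GrowableBuffer();
            for (int i = 0; i < n; i++) {
                buf.append((byte) i);
            }
            used += buf.used();
            allocated += buf.capacity();
        }
        return new long[] { used, allocated };
    }

    /** Fraction of allocated RAM that holds real postings data. */
    static double efficiency(int[] postingBytesPerTerm) {
        long[] totals = usedAndAllocated(postingBytesPerTerm);
        return (double) totals[0] / totals[1];
    }

    public static void main(String[] args) {
        // Hypothetical workload: 10,000 terms with random postings sizes.
        Random random = new Random(42);
        int[] sizes = new int[10_000];
        for (int i = 0; i < sizes.length; i++) {
            sizes[i] = 1 + random.nextInt(1000);
        }
        long[] totals = usedAndAllocated(sizes);
        System.out.printf("used=%d allocated=%d efficiency=%.2f%n",
                totals[0], totals[1], efficiency(sizes));
    }
}
```

In this model a term whose postings just crossed a growth boundary holds
barely more than half of its allocation, while one about to cross it is
nearly full, so the aggregate lands somewhere in between. The real numbers
for an index depend on the actual allocation scheme and on the data.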


On Thu, Apr 14, 2011 at 7:21 AM, Shai Erera <> wrote:
> Hi
> I'm indexing w/ IW, flush-by-RAM=off and flush-by-doc=MAX_INT. Whenever
> iw.ramSizeInBytes() >= threshold, I commit the changes, serialize the
> Directory somewhere and start over with a new Directory and IW instance.
> The threshold is currently 32MB. I noticed though that the size of the
> serialized Directory is nearly half (<16 MB). Is that expected? Will I see
> that behavior every time (e.g. w/ large stored fields), or is it data
> dependent? I assume that the data can affect the compression, but I never
> expected a 50% reduction going from RAM to disk.
> Shai
