lucene-dev mailing list archives

From Michael McCandless <luc...@mikemccandless.com>
Subject Re: IndexWriter.ramSizeInBytes
Date Thu, 14 Apr 2011 12:29:52 GMT
This is actually [sadly] expected.

This is showing that your RAM efficiency is ~50% (well, less, if the
segment also has stored fields / term vectors).

This is because the in-RAM data structures cannot be 100% efficient, as
they must leave room to "grow" the individual postings.  But once
written to disk, the format is compacted compared with what's in RAM.

Mike

http://blog.mikemccandless.com

On Thu, Apr 14, 2011 at 7:21 AM, Shai Erera <serera@gmail.com> wrote:
> Hi
>
> I'm indexing w/ IW, flush-by-RAM=off and flush-by-doc=MAX_INT. Whenever
> iw.ramSizeInBytes() >= threshold, I commit the changes, serialize the
> Directory somewhere and start with a new Directory and IW instance.
>
> The threshold is currently 32MB. I noticed though that the size of the
> serialized Directory is nearly half (<16 MB). Is that expected? Will I see
> that behavior every time (e.g. w/ large stored fields), or is it data
> dependent? I assume that the data can affect the compression, but I never
> expected a 50% reduction from RAM to disk.
>
> Shai
>
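
The "room to grow" point in the reply can be illustrated with a toy model. The sketch below is not Lucene's actual in-RAM postings layout; it only assumes, for illustration, that each term's postings sit in a buffer that doubles whenever it fills up, and measures how much of the allocated space holds real data. The class name, method names, and the random posting sizes are all hypothetical. The second helper just restates the arithmetic from the question (a <16 MB Directory against a 32 MB ramSizeInBytes() threshold).

```java
import java.util.Random;

public class GrowableBufferUtilization {

    /**
     * Toy model (an assumption, not Lucene's real layout): every term owns a
     * buffer that starts at initialCapacity and doubles until its postings fit.
     * Returns used bytes divided by allocated bytes.
     */
    static double utilization(int[] postingSizes, int initialCapacity) {
        if (initialCapacity <= 0) {
            throw new IllegalArgumentException("initialCapacity must be positive");
        }
        long used = 0;
        long allocated = 0;
        for (int size : postingSizes) {
            long capacity = initialCapacity;
            while (capacity < size) {
                capacity *= 2; // leave room to grow
            }
            used += size;
            allocated += capacity;
        }
        return (double) used / allocated;
    }

    /** Ratio of the serialized size to the RAM accounting figure. */
    static double diskToRamRatio(long bytesOnDisk, long ramBytes) {
        return (double) bytesOnDisk / ramBytes;
    }

    public static void main(String[] args) {
        // Hypothetical workload: 100,000 terms with 1..1000 bytes of postings each.
        Random random = new Random(42);
        int[] sizes = new int[100_000];
        for (int i = 0; i < sizes.length; i++) {
            sizes[i] = 1 + random.nextInt(1000);
        }
        System.out.printf("toy in-RAM utilization: %.2f%n", utilization(sizes, 4));

        // Figures from the question: 16 MB used as an upper bound on the disk size.
        System.out.printf("disk/RAM ratio: %.2f%n", diskToRamRatio(16L << 20, 32L << 20));
    }
}
```

In this model a buffer that has just doubled is only slightly more than half full, so overall utilization well below 100% is unsurprising, whereas a compact on-disk encoding needs no such slack. How close the real ratio is to 50% will depend on the data.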
