lucene-java-user mailing list archives

From Trejkaz <>
Subject Re: IndexWriter.ramSizeInBytes() no longer returns to 0 after commit()?
Date Wed, 24 Aug 2011 00:03:20 GMT
On Wed, Aug 24, 2011 at 4:45 AM, Michael McCandless
<> wrote:
> Hmm... this looks like a side-effect of LUCENE-2680, which was merged
> back from trunk to 3.1.
> So the problem is, IW recycles the RAM it has allocated, and so this
> method is returning the allocated RAM, even if those buffers are not
> in fact in use right now (ie, filled with postings data).  I think
> it's important that it does this, ie, it should be honest that it is
> in fact tying up RAM.
> Maybe we could fix this by adding a new method that tells you how much
> of the buffers are really in-use... but I don't think we directly
> track that now; it'd have to be computed from the free buffers lists
> inside DocumentsWriter.
> BTW, why not have IW flush by RAM itself?  This way it will flush (but
> not commit) the postings to disk... commit is rather costly since it
> fsyncs all the newly written files.
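
The recycling behaviour described in the quoted reply can be pictured with a small block pool. This is a minimal sketch, not Lucene's DocumentsWriter: the class `RecyclingBlockPool` and its methods are hypothetical names. It shows why a figure counting all allocated blocks stays high after a flush, and how an "in use" figure could be computed from the free list, as suggested above.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/**
 * Hypothetical sketch (not Lucene's DocumentsWriter): a pool of fixed-size
 * byte blocks that are recycled onto a free list instead of being released.
 */
class RecyclingBlockPool {
    private final int blockSize;
    private final Deque<byte[]> freeBlocks = new ArrayDeque<>();
    private int allocatedBlocks = 0;

    RecyclingBlockPool(int blockSize) {
        this.blockSize = blockSize;
    }

    /** Reuses a free block if one exists, otherwise allocates a new one. */
    byte[] getBlock() {
        byte[] block = freeBlocks.pollFirst();
        if (block == null) {
            block = new byte[blockSize];
            allocatedBlocks++;
        }
        return block;
    }

    /** Puts a block back on the free list once its contents are flushed; the RAM stays allocated. */
    void recycleBlock(byte[] block) {
        freeBlocks.addFirst(block);
    }

    /** Everything the pool holds, used or not (the kind of number ramSizeInBytes() now reports). */
    long allocatedBytes() {
        return (long) allocatedBlocks * blockSize;
    }

    /** Bytes actually in use: allocated minus what sits on the free list. */
    long usedBytes() {
        return allocatedBytes() - (long) freeBlocks.size() * blockSize;
    }
}
```

After a flush recycles every block, `allocatedBytes()` is unchanged while `usedBytes()` drops to 0, which matches the difference being discussed in this thread.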

I think we're worried about the consequences of leaving around
partially written segments, particularly in the case where the
indexing process crashes multiple times.

Although, even if we turned on flushing, it seems like we still need
to know when to commit(), because we commit Lucene and other things at
the same time.  We were determining an appropriate time based on the
amount of data which wasn't committed yet, but it isn't possible to do
that with the current version as far as I can tell (you can get the
number of documents, but real-world documents differ so much in size
that the count isn't a useful measure).
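
One possible stop-gap, not proposed in the thread, is to keep an application-side estimate of uncommitted data and decide when to commit from that, independent of IndexWriter's RAM accounting. This is a minimal sketch under that assumption: `UncommittedBytesTracker` and its methods are hypothetical, and the per-document size is whatever approximation the application chooses (for example, the total length of the field text).

```java
/**
 * Hypothetical sketch: decides when to commit based on an application-side
 * estimate of how much data has been added since the last commit.
 */
class UncommittedBytesTracker {
    private final long commitThresholdBytes;
    private long uncommittedBytes = 0;

    UncommittedBytesTracker(long commitThresholdBytes) {
        if (commitThresholdBytes <= 0) {
            throw new IllegalArgumentException("threshold must be positive");
        }
        this.commitThresholdBytes = commitThresholdBytes;
    }

    /** Records one added document by its approximate size in bytes. */
    void recordDocument(long approxBytes) {
        uncommittedBytes += approxBytes;
    }

    /** True once the estimated uncommitted data reaches the threshold. */
    boolean shouldCommit() {
        return uncommittedBytes >= commitThresholdBytes;
    }

    /** Resets the estimate after Lucene and the other stores have been committed together. */
    void onCommitted() {
        uncommittedBytes = 0;
    }

    long uncommittedBytes() {
        return uncommittedBytes;
    }
}
```

In an indexing loop, each `IndexWriter.addDocument(...)` call would be followed by `recordDocument(...)`; when `shouldCommit()` returns true, the application commits the writer and its other stores, then calls `onCommitted()`. The estimate only approximates Lucene's real memory use, but it tracks data volume rather than document count.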

I might have a try at adding a method to DocumentsWriter to compute
the amount of actual used space.

