lucene-java-user mailing list archives

From Michael McCandless <luc...@mikemccandless.com>
Subject Re: IndexWriter.ramSizeInBytes() no longer returns to 0 after commit()?
Date Tue, 23 Aug 2011 18:45:39 GMT
Hmm... this looks like a side-effect of LUCENE-2680, which was merged
back from trunk to 3.1.

So the problem is, IW (IndexWriter) recycles the RAM it has allocated, so this
method returns the allocated RAM, even if those buffers are not
in fact in use right now (i.e., filled with postings data).  I think
it's important that it does this, i.e., it should be honest that it is
in fact tying up RAM.

Maybe we could fix this by adding a new method that tells you how much
of the buffer space is really in use... but I don't think we directly
track that now; it'd have to be computed from the free buffer lists
inside DocumentsWriter.
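To make the allocated-versus-in-use distinction concrete, here is a minimal self-contained Java sketch of a pool that recycles fixed-size blocks through a free list. Everything in it is hypothetical: `RecyclingPool`, its methods, and the 32 KB block size are illustrative assumptions, not Lucene's `DocumentsWriter` internals. `allocatedBytes()` plays the role of `ramSizeInBytes()`, and `inUseBytes()` is the kind of figure that would have to be derived from the free lists.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical toy model of a recycling block pool; NOT Lucene's DocumentsWriter.
class RecyclingPool {
    static final int BLOCK_SIZE = 32 * 1024; // assumed block size, for illustration only

    private final Deque<byte[]> freeBlocks = new ArrayDeque<>();
    private long allocatedBytes = 0; // every block ever allocated and still held by the pool

    // Hand out a block, reusing a recycled one when available.
    byte[] takeBlock() {
        byte[] block = freeBlocks.poll(); // null when the free list is empty
        if (block == null) {
            block = new byte[BLOCK_SIZE];
            allocatedBytes += BLOCK_SIZE;
        }
        return block;
    }

    // After a flush, blocks go back on the free list instead of being released.
    void recycle(byte[] block) {
        freeBlocks.push(block);
    }

    // Analogous to ramSizeInBytes(): how much RAM the pool is tying up.
    long allocatedBytes() {
        return allocatedBytes;
    }

    // The hypothetical "really in use" figure, computed from the free list.
    long inUseBytes() {
        return allocatedBytes - (long) freeBlocks.size() * BLOCK_SIZE;
    }

    public static void main(String[] args) {
        RecyclingPool pool = new RecyclingPool();
        byte[] a = pool.takeBlock();
        byte[] b = pool.takeBlock();
        System.out.println(pool.allocatedBytes() + " " + pool.inUseBytes()); // 65536 65536
        pool.recycle(a);
        pool.recycle(b);
        System.out.println(pool.allocatedBytes() + " " + pool.inUseBytes()); // 65536 0
        pool.takeBlock(); // reuses a recycled block, so nothing new is allocated
        System.out.println(pool.allocatedBytes() + " " + pool.inUseBytes()); // 65536 32768
    }
}
```

Recycling keeps `allocatedBytes()` flat after the blocks are returned, mirroring the quoted 3.3 numbers, while `inUseBytes()` drops to zero; the trade-off is that the pool never gives memory back on its own.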

BTW, why not have IW flush by RAM itself?  This way it will flush (but
not commit) the postings to disk... commit is rather costly since it
fsyncs all the newly written files.
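A minimal sketch of the flush-versus-commit split, again as a self-contained Java model rather than Lucene code: the writer flushes on its own once a RAM limit is reached, and the application commits only on its own schedule. The class, its counters, and the byte sizes are hypothetical; in Lucene 3.1 and later the real setting is `IndexWriterConfig.setRAMBufferSizeMB(double)`, which controls when the writer flushes.

```java
// Hypothetical model: the writer flushes on its own RAM limit,
// while the application commits only when it chooses to.
class FlushVsCommitModel {
    private final long ramBufferBytes; // assumed flush threshold, in bytes
    private long bufferedBytes = 0;    // documents buffered since the last flush
    private int flushes = 0;           // cheap: segment files written, not fsynced
    private int commits = 0;           // costly: stands in for fsyncing new files

    FlushVsCommitModel(long ramBufferBytes) {
        this.ramBufferBytes = ramBufferBytes;
    }

    void addDocument(long docBytes) {
        bufferedBytes += docBytes;
        if (bufferedBytes >= ramBufferBytes) {
            flush(); // the writer decides; no commit involved
        }
    }

    void commit() {
        if (bufferedBytes > 0) {
            flush(); // a commit first flushes whatever is still buffered
        }
        commits++;
    }

    private void flush() {
        flushes++;
        bufferedBytes = 0;
    }

    int flushes() { return flushes; }
    int commits() { return commits; }
}
```

With a 16,000-byte limit and 100 documents of 1,000 bytes each, the model flushes six times during indexing and a seventh time inside the single `commit()`; only that one call pays the fsync cost.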

Mike McCandless

http://blog.mikemccandless.com

On Tue, Aug 23, 2011 at 12:17 AM, Trejkaz <trejkaz@trypticon.org> wrote:
> Hi all.
>
> We are using IndexWriter with no limits set and managing the commits
> ourselves, mainly so that we can ensure they are done at the same time
> as other (non-Lucene) commits.
>
> After upgrading from 3.0 to 3.3, we are seeing a change in
> ramSizeInBytes() behaviour where it is no longer resetting to zero
> after a commit().  The end result is that after a while, the code
> wants to commit after adding even a single document.
>
> I boiled it down to a test case (though I'm obviously just using JUnit
> as a helper here):
>
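>    import org.apache.lucene.analysis.WhitespaceAnalyzer;
>    import org.apache.lucene.document.Document;
>    import org.apache.lucene.document.Field;
>    import org.apache.lucene.index.IndexWriter;
>    import org.apache.lucene.store.Directory;
>    import org.apache.lucene.store.RAMDirectory;
>    import org.junit.Test;
>
>    public class IndexWriterByteCountTest
>    {
>        @Test
>        public void testIndexWriterByteCount() throws Exception
>        {
>            Directory directory = new RAMDirectory();
>            IndexWriter writer = new IndexWriter(directory,
>                    new WhitespaceAnalyzer(), IndexWriter.MaxFieldLength.UNLIMITED);
>            System.out.println("At start: " + writer.ramSizeInBytes());
>
>            for (int j = 0; j < 3; j++)
>            {
>                // Buffer five tiny documents...
>                for (int i = 0; i < 5; i++)
>                {
>                    Document document = new Document();
>                    document.add(new Field("text", "a", Field.Store.YES,
>                            Field.Index.ANALYZED));
>                    writer.addDocument(document);
>                }
>                System.out.println("After adding some docs: "
>                        + writer.ramSizeInBytes());
>
>                // ...then commit and check whether the reported size drops back to 0.
>                writer.commit();
>                System.out.println("After commit: " + writer.ramSizeInBytes());
>            }
>
>            writer.close();
>            directory.close();
>        }
>    }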
>
> The results on Lucene 3.3.0:
>
>    At start: 0
>    After adding some docs: 99400
>    After commit: 99344
>    After adding some docs: 99400
>    After commit: 99344
>    After adding some docs: 99400
>    After commit: 99344
>
> The results of running more or less the same test on Lucene 3.0.3:
>
>    At start: 0
>    After adding some docs: 115712
>    After commit: 0
>    After adding some docs: 50176
>    After commit: 0
>    After adding some docs: 50176
>    After commit: 0
>
> Questions:
>
> (1) Is Lucene now caching more than it used to be caching, which would
> account for the extra space usage, or is this simply a bug where the
> count isn't being updated correctly?
>
> (2) Is checking ramSizeInBytes() still the recommended way to
> determine whether it's time to commit()?
>
> TX
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>
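The failure mode in the quoted report, where the commit check eventually fires after every single document, follows directly from a reported size that never drops below the threshold. The minimal Java model below assumes a commit-when-over-threshold policy like the one described; `CommitPolicyModel`, the 50,000-byte threshold, and the 1,000 bytes per document are made-up values, and only the 99,344-byte floor is taken from the quoted 3.3 output.

```java
// Hypothetical model of an application that commits whenever the
// reported RAM size reaches a threshold.
class CommitPolicyModel {
    static final long THRESHOLD = 50_000; // assumed application threshold, bytes

    // Counts commits while adding docCount documents of bytesPerDoc each.
    // floorAfterCommit is what the size check reports right after a commit:
    // 0 models the 3.0.3 output, a non-zero value models recycled RAM.
    static int countCommits(int docCount, long bytesPerDoc, long floorAfterCommit) {
        long inUse = 0; // bytes of documents buffered since the last commit
        long floor = 0; // nothing allocated-and-retained before the first commit
        int commits = 0;
        for (int i = 0; i < docCount; i++) {
            inUse += bytesPerDoc;
            long reported = Math.max(inUse, floor); // simplified ramSizeInBytes()
            if (reported >= THRESHOLD) {
                commits++;
                inUse = 0;
                floor = floorAfterCommit;
            }
        }
        return commits;
    }
}
```

With a floor of 0 (the 3.0.3 behaviour), 100 documents trigger 2 commits; with a floor of 99,344 they trigger 51, because every add after the first commit already reports a size above the threshold.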
