lucene-dev mailing list archives

From "Tim Smith (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-2283) Possible Memory Leak in StoredFieldsWriter
Date Wed, 24 Feb 2010 14:06:28 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-2283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12837793#action_12837793 ]

Tim Smith commented on LUCENE-2283:
-----------------------------------

I came across this issue while investigating a reported memory leak during indexing.

A YourKit snapshot showed that the PerDocs for an IndexWriter were using ~40M of memory (at
which point I came across this potentially unbounded memory use in StoredFieldsWriter).
This snapshot seems to have been taken at a more or less stable point (memory grows but then
returns to a "normal" state); however, I have reports that memory is eventually exhausted
completely, resulting in out of memory errors.

So far I have not found any other major culprit in the Lucene indexing code.

This index receives a routine mix of very large and very small documents (which would explain
this situation). The VM and system have more than enough memory given the buffer size and what
should be normal indexing RAM requirements.
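
To illustrate why a mix of very large and very small documents matters, here is a toy model
in plain Java. It is not Lucene code: PerDocModel, retainedAfter and the round-robin reuse
order are assumptions made for the illustration, standing in for pooled PerDoc instances whose
backing storage grows to fit a document and is never shrunk afterwards.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Toy model (not Lucene code): pooled per-document buffers that grow but never shrink. */
public class PooledBufferGrowth {

    /** Hypothetical stand-in for a PerDoc holding a RAMFile; capacity only ever increases. */
    static final class PerDocModel {
        private byte[] buffer = new byte[0];

        void write(int docSize) {
            if (docSize > buffer.length) {
                buffer = new byte[docSize]; // grow to fit the document
            }
            // Reuse for a smaller document keeps the larger allocation.
        }

        int retainedBytes() {
            return buffer.length;
        }
    }

    /** Feeds documents round-robin through a fixed pool; returns the bytes still held afterwards. */
    static long retainedAfter(int poolSize, int[] docSizes) {
        Deque<PerDocModel> pool = new ArrayDeque<>();
        for (int i = 0; i < poolSize; i++) {
            pool.addLast(new PerDocModel());
        }
        for (int size : docSizes) {
            PerDocModel perDoc = pool.removeFirst();
            perDoc.write(size);
            pool.addLast(perDoc); // back into the pool, allocation intact
        }
        long total = 0;
        for (PerDocModel perDoc : pool) {
            total += perDoc.retainedBytes();
        }
        return total;
    }

    public static void main(String[] args) {
        // Two rare large documents among small ones, spread over a pool of four.
        int[] docSizes = {1000, 10, 10, 10, 10, 1000, 10, 10};
        System.out.println("retained bytes: " + retainedAfter(4, docSizes));
    }
}
```

In this model each large document permanently inflates whichever pooled instance happened to
receive it, so retained memory tracks the largest document each instance has ever seen rather
than the current workload.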

Also, a major difference between when this leak did not occur and when it showed up is that
previously the IndexWriter was closed when performing commits; now the IndexWriter remains
open (just calling IndexWriter.commit()). So, if any memory is leaking during indexing, it is
no longer being reclaimed during commit. As a side note, closing the index writer at commit
time would sometimes fail, causing some subsequent updates to fail because the index writer
was locked and couldn't be reopened until the old index writer was garbage collected, so I
don't want to go back to this for commits.

It's possible there is a leak somewhere else (I currently do not have a snapshot from right
before the out of memory errors occur, so for now the only thing that stands out is the PerDoc
memory use).

As far as a fix goes, wouldn't it be better to have the RAMFiles used for stored fields pull
and return byte buffers from the byte block pool on the DocumentsWriter? This would allow
the memory to be reclaimed based on the index writer's buffer size (otherwise there is no
configurable way to tune this memory use).
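
A minimal sketch of that idea in plain Java follows. It is not Lucene code and does not use
the DocumentsWriter's actual byte block pool: BlockPool, PooledBuffer and the maxFreeBlocks cap
are hypothetical names, with the cap standing in for whatever limit the index writer's buffer
size would impose.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical sketch (not Lucene code): per-document buffers that borrow blocks from a shared pool. */
public class SharedBlockPoolSketch {

    /** Shared pool of fixed-size blocks; free blocks beyond the cap are left to the GC. */
    static final class BlockPool {
        private final int blockSize;
        private final int maxFreeBlocks; // assumed stand-in for a buffer-size based limit
        private final Deque<byte[]> free = new ArrayDeque<>();

        BlockPool(int blockSize, int maxFreeBlocks) {
            this.blockSize = blockSize;
            this.maxFreeBlocks = maxFreeBlocks;
        }

        byte[] take() {
            byte[] block = free.pollFirst();
            return block != null ? block : new byte[blockSize];
        }

        void give(byte[] block) {
            if (free.size() < maxFreeBlocks) {
                free.addFirst(block);
            }
            // Otherwise the reference is dropped so the block can be garbage collected.
        }

        int blockSize() {
            return blockSize;
        }

        long freeBytes() {
            return (long) free.size() * blockSize;
        }
    }

    /** Stand-in for a per-document stored-fields buffer. */
    static final class PooledBuffer {
        private final BlockPool pool;
        private final List<byte[]> blocks = new ArrayList<>();
        private int length;

        PooledBuffer(BlockPool pool) {
            this.pool = pool;
        }

        void write(byte[] data) {
            for (byte value : data) {
                int offset = length % pool.blockSize();
                if (offset == 0) {
                    blocks.add(pool.take()); // current block is full (or none yet)
                }
                blocks.get(blocks.size() - 1)[offset] = value;
                length++;
            }
        }

        /** Hands every block back so a rare large document does not pin memory in this buffer. */
        void reset() {
            for (byte[] block : blocks) {
                pool.give(block);
            }
            blocks.clear();
            length = 0;
        }

        long heldBytes() {
            return (long) blocks.size() * pool.blockSize();
        }
    }

    public static void main(String[] args) {
        BlockPool pool = new BlockPool(16, 2);
        PooledBuffer buffer = new PooledBuffer(pool);
        buffer.write(new byte[100]); // one large document
        buffer.reset();              // blocks go back to the shared pool
        System.out.println("held=" + buffer.heldBytes() + " free=" + pool.freeBytes());
    }
}
```

With this shape, the memory a buffer holds after reset is zero, and the total kept around for
reuse is bounded by one tunable pool limit instead of by the largest document each pooled
instance has ever seen.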



> Possible Memory Leak in StoredFieldsWriter
> ------------------------------------------
>
>                 Key: LUCENE-2283
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2283
>             Project: Lucene - Java
>          Issue Type: Bug
>    Affects Versions: 2.4.1
>            Reporter: Tim Smith
>            Assignee: Michael McCandless
>             Fix For: 3.1
>
>
> StoredFieldsWriter creates a pool of PerDoc instances
> this pool will grow but never be reclaimed by any mechanism
> furthermore, each PerDoc instance contains a RAMFile.
> this RAMFile will also never be truncated (and will only ever grow) (as far as i can
> tell)
> When feeding documents with large number of stored fields (or one large dominating stored
> field) this can result in memory being consumed in the RAMFile but never reclaimed. Eventually,
> each pooled PerDoc could grow very large, even if large documents are rare.
> Seems like there should be some attempt to reclaim memory from the PerDoc[] instance
> pool (or otherwise limit the size of RAMFiles that are cached) etc

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

