lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Michael McCandless <luc...@mikemccandless.com>
Subject Re: java.lang.OutOfMemoryError: GC overhead limit exceeded
Date Tue, 13 Apr 2010 09:42:25 GMT
On Mon, Apr 12, 2010 at 9:50 AM, Herbert L Roitblat <herb@orcatec.com> wrote:
> Thank you Michael.  Your suggestions are helpful.  I inherited all of the
> code that uses pyLucene and don't consider myself an expert on it, so I very
> much appreciate your suggestions.
>
> It does not seem to be the case that these elements represent the index of
> the collection. TermInfo and Term grow as I retrieve more documents.  There
> was no trouble building the index.

Sorry I meant the "terms dict index".  Lucene internally loads certain
data structures (like the terms dict index, seen as many
TermInfo/Term/String objects) from the index into RAM, so a heap dump
of even a single open IndexReader is expected to show a great many
TermInfo/Term/String objects (though in 3.1 this is greatly improved).

> The contents of these fields are the tokens (some fields are tokenized,
> others not) of the document fields.  In the tokenized fields, there is one
> object for each word. They seem to be in order of the documents for which
> the term vectors are being sought.  So these objects seem to represent a
> "concatenation" of all of the documents being considered in order, and if
> they are never removed, would always overwhelm the heap with a large
> document set.  They are not the index in the usual sense, I think.  Before I
> start retrieving documents, there is barely anything in these objects.
>
> What is holding the document contents in the heap after the fields
> information is returned?
>
> Can you say more about incRef/decRef?  I deleted all variables that
> interacted with Lucene and it seems to have made no difference
>
> There are not a lot of different fields, I would say on the order of 50 with
> about 20 of them in virtually every document.

Can you run CheckIndex (java -ea -cp /path/to/lucene-core-XXX.jar
org.apache.lucene.index.CheckIndex /path/to/your/index) and post the
output?

> It uses:
> lucene.IndexReader.open(self._store)

This is perfectly correct.  If you're not using incRef/decRef on this
reader then, then calling close on it will close it.

Mike

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message