lucene-java-user mailing list archives

From "Rob Staveley (Tom)"
Subject RE: Avoiding java.lang.OutOfMemoryError in an unstored field
Date Tue, 06 Jun 2006 09:43:48 GMT
You are right, there are going to be a lot of tokens. The entire body of a
text document is getting indexed in an unstored field, but I don't see how I
can flush a partially loaded field. 

-----Original Message-----
From: karl wettin
Sent: 06 June 2006 10:33
Subject: RE: Avoiding java.lang.OutOfMemoryError in an unstored field

On Tue, 2006-06-06 at 10:22 +0100, Rob Staveley (Tom) wrote:
> Thanks for the response, Karl. I am using FSDirectory.
> -XX:+AggressiveHeap might reduce the number of times I get bitten by the 
> problem, but I'm really looking for a streaming/serialised approach [I 
> think!], which allows me to handle objects which are larger than 
> available memory. Using the constructor for the 
> unstored org.apache.lucene.document.Field means that I do not need to 
> load the untokenised content entirely into RAM, but I'm hoping that 
> the tokenised content of org.apache.lucene.document.Field and 
> org.apache.lucene.document.Document also do not need to live in RAM, 
> because that puts a limit on document size.

The instances of Document and their Fields are only created at
Hits.doc() and when you create them for insertion in the index. They live
only as long as you keep track of them. Lucene keeps no references as far as
I know.

As I wrote in my second reply, if you run out of memory when indexing it
sounds to me as if you allow lots and lots of tokens per field. In this case
you should look at flushing your documents more often. 
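
To make the "lots and lots of tokens per field" point concrete, here is a minimal sketch in plain Java of consuming a field's content as a stream while capping the number of tokens taken from it. It is not Lucene code: the class BoundedTokenCounter, the method countTokens and the whitespace tokenisation are all assumptions made for illustration. As far as I know, IndexWriter in Lucene of this era has a comparable per-field token cap (maxFieldLength), but check the Javadoc for your version before relying on that.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical helper, not part of Lucene: shows that a field body can be
// consumed from a Reader one character at a time with a hard token cap.
final class BoundedTokenCounter {

    /**
     * Counts whitespace-separated tokens from the reader and stops reading
     * as soon as maxTokens tokens have been seen. Only the current character
     * is held in memory, so the input may be larger than the heap.
     */
    static int countTokens(Reader reader, int maxTokens) {
        int tokens = 0;
        boolean inToken = false;
        try {
            int c;
            while (tokens < maxTokens && (c = reader.read()) != -1) {
                if (Character.isWhitespace((char) c)) {
                    inToken = false;      // token boundary
                } else if (!inToken) {
                    inToken = true;       // first character of a new token
                    tokens++;
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Five tokens in the input, but the cap stops the scan at three.
        Reader body = new StringReader("one two three four five");
        System.out.println(countTokens(body, 3));
    }
}
```

A cap like this bounds the work done per field at the cost of silently ignoring the tail of very large documents, so whether it is acceptable depends on whether the end of the text needs to be searchable.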
