lucene-java-user mailing list archives

From "Rob Staveley (Tom)" <>
Subject RE: Avoiding java.lang.OutOfMemoryError in an unstored field
Date Tue, 06 Jun 2006 11:44:42 GMT
Thanks, Karl. It would be good if maxBufferedDocs could respond dynamically
to available heap. It seems a shame to set it below 10 just for the sake of
sporadic large documents. Failing that, it would be nice if we could
explicitly pre-flush the buffer when we encounter a big field.
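
Something along these lines is the sort of workaround I'm imagining, i.e.
closing and reopening the writer to force the buffered documents out to disk
whenever a big body turns up. This is only a sketch against the 1.9/2.0-era
IndexWriter API; the index path and the 1 MB threshold are made up:

import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexWriter;

public class FlushingIndexer {

    private static final String INDEX_PATH = "/var/index"; // made-up location
    private static final long BIG_BODY = 1024 * 1024;      // arbitrary 1 MB threshold

    private IndexWriter writer;

    public FlushingIndexer() throws IOException {
        writer = open(true); // create the index first time round
    }

    private IndexWriter open(boolean create) throws IOException {
        IndexWriter w = new IndexWriter(INDEX_PATH, new StandardAnalyzer(), create);
        w.setMaxBufferedDocs(10); // keep the in-memory document buffer small
        return w;
    }

    /** Adds a message document, forcing a flush after an unusually large body. */
    public void addMessage(Document doc, long bodyLength) throws IOException {
        writer.addDocument(doc);
        if (bodyLength > BIG_BODY) {
            // There is no explicit flush call on this writer, so close and
            // reopen it to push the buffered documents into an on-disk segment.
            writer.close();
            writer = open(false);
        }
    }

    public void close() throws IOException {
        writer.close();
    }
}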

I'm increasingly thinking that mergeFactor is what I need to look at. I
currently have it set to the default 10, but bearing in mind that it is a
real-time application (indexing messages from an MTA), it makes sense to
make this smaller. Is the RAM requirement due to mergeFactor a product of
Document size and mergeFactor, or does Document size have no bearing on it?

-----Original Message-----
From: karl wettin [] 
Sent: 06 June 2006 10:48
Subject: RE: Avoiding java.lang.OutOfMemoryError in an unstored field

On Tue, 2006-06-06 at 10:43 +0100, Rob Staveley (Tom) wrote:
> You are right, there are going to be a lot of tokens. The entire body
> of a text document is getting indexed in an unstored field, but I 
> don't see how I can flush a partially loaded field.

Check these out:

void setMaxBufferedDocs(int maxBufferedDocs) 
          Determines the minimal number of documents required before the
buffered in-memory documents are merged and a new Segment is created. 

void setMaxFieldLength(int maxFieldLength) 
          The maximum number of terms that will be indexed for a single
field in a document.  

void setMergeFactor(int mergeFactor) 
          Determines how often segment indices are merged by addDocument().
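
A bare-bones example of setting all three before indexing (the path, field
name and values below are only illustrative, not recommendations):

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;

public class TunedIndexer {
    public static void main(String[] args) throws Exception {
        IndexWriter writer = new IndexWriter("/tmp/index", new StandardAnalyzer(), true);

        writer.setMaxBufferedDocs(10);    // flush the in-memory buffer to a segment after 10 docs
        writer.setMaxFieldLength(50000);  // index at most 50,000 terms per field
        writer.setMergeFactor(5);         // merge segment indices more often, keeping fewer around

        // Index the message body unstored, as in the original question.
        Document doc = new Document();
        doc.add(new Field("body", "entire body of a text document goes here",
                Field.Store.NO, Field.Index.TOKENIZED));
        writer.addDocument(doc);

        writer.close();
    }
}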

