lucene-java-user mailing list archives

From "Jack Krupansky" <j...@basetechnology.com>
Subject Re: OutOfMemoryError indexing large documents
Date Wed, 26 Nov 2014 13:26:55 GMT
Is that 100 MB for a single Lucene document, and is it all in a single 
field? Is that field analyzed text? How complex is the analyzer? Does it 
do ngrams or something else that is token- or memory-intensive? Posting 
the analyzer would help us see what the issue might be.
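
For instance, an analyzer along these lines (purely hypothetical, not 
necessarily anything like yours) fans every word out into 2- to 4-grams 
and can easily produce an order of magnitude more tokens than a plain 
tokenizer:

import java.io.Reader;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.ngram.NGramTokenFilter;
import org.apache.lucene.analysis.standard.StandardTokenizer;
import org.apache.lucene.util.Version;

// Hypothetical token-intensive analyzer: every word of length n becomes
// up to (n-1) bigrams + (n-2) trigrams + (n-3) 4-grams, so a 100 MB field
// turns into a very large number of tokens during analysis.
class NGramHeavyAnalyzer extends Analyzer {
  @Override
  protected TokenStreamComponents createComponents(String fieldName, Reader reader) {
    Tokenizer source = new StandardTokenizer(Version.LUCENE_4_9, reader);
    TokenStream ngrams = new NGramTokenFilter(Version.LUCENE_4_9, source, 2, 4);
    return new TokenStreamComponents(source, ngrams);
  }
}

On a 100 MB field, that kind of multiplication alone could account for a 
lot of transient memory during indexing.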

Try indexing only one document at a time - maybe GC is being triggered by 
activity on one stream while the parallel streams are trying to index 
during the collection.

Alternatively, try running with a much smaller heap, since a large heap 
means GC pauses will take longer.

You might consider a strategy where only one large document can be 
processed at a time - have other threads pause if a large document is 
currently being processed, or maybe allow only a few large documents to 
be processed at the same time.
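
A rough sketch of that kind of gate, using a plain 
java.util.concurrent.Semaphore (the 50 MB threshold and permit count here 
are made-up numbers - tune them for your workload):

import java.util.concurrent.Semaphore;

// Hypothetical throttle: documents over a size threshold must grab a
// permit first, so at most maxLarge of them are analyzed concurrently.
// Small documents pass straight through.
class LargeDocGate {
  private static final long LARGE_BYTES = 50L * 1024 * 1024; // assumed cutoff

  private final Semaphore permits;

  LargeDocGate(int maxLarge) {
    this.permits = new Semaphore(maxLarge);
  }

  void runIndexTask(long docSizeBytes, Runnable indexTask) throws InterruptedException {
    boolean large = docSizeBytes >= LARGE_BYTES;
    if (large) permits.acquire(); // other large docs block here
    try {
      indexTask.run();
    } finally {
      if (large) permits.release();
    }
  }
}

The idea is simply to bound how many large analyses are in flight at 
once, so worst-case memory stays predictable.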

What is your average document size? I mean, are the large documents rare 
enough that the above strategy would be reasonable, or do you need to 
process large numbers of large documents?

-- Jack Krupansky

-----Original Message----- 
From: ryanb
Sent: Tuesday, November 25, 2014 7:39 PM
To: java-user@lucene.apache.org
Subject: OutOfMemoryError indexing large documents

Hello,

We use vanilla Lucene 4.9.0 on 64-bit Linux. We sometimes need to index
large documents (100+ MB), and this results in extremely high memory
usage, to the point of OutOfMemoryError even with a 17 GB heap. We allow
up to 20 documents to be indexed simultaneously, but the text to be
analyzed and indexed is streamed, not loaded into memory all at once.
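
To be concrete about "streamed": each large body is handed to Lucene as a
Reader rather than a String, roughly like this (the field name and helper
are just illustrative, not our exact code):

import java.io.IOException;
import java.io.Reader;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;

class StreamingIndexer {
  // The Reader is consumed by the analyzer inside addDocument(), so the
  // raw text is never materialized as one in-memory String.
  static void indexLargeDocument(IndexWriter writer, Reader body) throws IOException {
    Document doc = new Document();
    doc.add(new TextField("body", body)); // tokenized and indexed, not stored
    writer.addDocument(doc);
  }
}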

Any suggestions for how to troubleshoot or ideas about the problem are
greatly appreciated!

Some details about our setup, with a rough code sketch below (let me know
what other information would help):
- MMapDirectory wrapped in an NRTCachingDirectory
- RAM buffer size of 64 MB
- No compound files
- We commit every 20 seconds
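
In code, the writer setup looks roughly like this (the path and analyzer
are placeholders, not our real ones):

import java.io.File;
import java.io.IOException;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.MMapDirectory;
import org.apache.lucene.store.NRTCachingDirectory;
import org.apache.lucene.util.Version;

class WriterSetup {
  static IndexWriter open() throws IOException {
    MMapDirectory mmap = new MMapDirectory(new File("/path/to/index"));
    // NRTCachingDirectory caches small new segments in RAM for NRT readers;
    // the two size arguments (MB) here are example values.
    NRTCachingDirectory dir = new NRTCachingDirectory(mmap, 5.0, 60.0);
    IndexWriterConfig iwc =
        new IndexWriterConfig(Version.LUCENE_4_9, new StandardAnalyzer(Version.LUCENE_4_9));
    iwc.setRAMBufferSizeMB(64);    // flush in-memory postings at 64 MB
    iwc.setUseCompoundFile(false); // "no compound files"
    IndexWriter writer = new IndexWriter(dir, iwc);
    // a separate scheduled task (not shown) calls writer.commit() every 20 seconds
    return writer;
  }
}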

Thanks,
Ryan






---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

