lucene-java-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: OutOfMemoryError indexing large documents
Date Wed, 26 Nov 2014 03:09:36 GMT
Well,
1> Don't send 20 docs at once, or at least send docs over some size N by themselves.

2> Seriously consider the utility of indexing a 100+ MB file. Assuming
it's mostly text, lots and lots and lots of queries will match it, and
it'll score pretty low due to length normalization. And you probably
can't return it to the user. And highlighting it will be a performance
problem. And may blow out memory too. And...
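
A minimal sketch of point 1, using only java.util.concurrent and no Lucene
classes. The class name SizeAwareIndexGate, the method names, and the
threshold are all hypothetical; "some size N" is left for you to choose.
Small docs take one permit each, so up to maxConcurrent of them run in
parallel, while a doc at or above the threshold takes every permit and
therefore runs by itself.

```java
import java.util.concurrent.Semaphore;

// Hypothetical helper: limits how many documents are indexed at once,
// and forces documents at or above a size threshold to be indexed alone.
final class SizeAwareIndexGate {
    private final int maxConcurrent;   // e.g. the 20 simultaneous docs from the question
    private final long largeDocBytes;  // the "size N" cutoff (an assumption; tune it)
    private final Semaphore permits;

    SizeAwareIndexGate(int maxConcurrent, long largeDocBytes) {
        this.maxConcurrent = maxConcurrent;
        this.largeDocBytes = largeDocBytes;
        // Fair mode hands out permits in FIFO order, so a large doc waiting
        // for all permits is not starved by a stream of small docs.
        this.permits = new Semaphore(maxConcurrent, true);
    }

    // A large doc needs every permit; a small doc needs one.
    int permitsFor(long docBytes) {
        return docBytes >= largeDocBytes ? maxConcurrent : 1;
    }

    // Runs the task once enough permits are free, then gives them back.
    void index(long docBytes, Runnable indexTask) {
        int needed = permitsFor(docBytes);
        permits.acquireUninterruptibly(needed);
        try {
            indexTask.run();
        } finally {
            permits.release(needed);
        }
    }

    int availablePermits() {
        return permits.availablePermits();
    }
}
```

Each worker thread would wrap its own indexing call (for example, whatever
code ends up calling IndexWriter.addDocument) in gate.index(sizeInBytes, task).
This only bounds how many large docs are in flight at once; it does not by
itself explain where the heap is going.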

This may be an XY problem.
Best,
Erick

On Tue, Nov 25, 2014 at 4:39 PM, ryanb <ryanblais@everlaw.com> wrote:
> Hello,
>
> We use vanilla Lucene 4.9.0 on a 64-bit Linux OS. We sometimes need to index
> large documents (100+ MB), but this results in extremely high memory usage,
> to the point of OutOfMemoryError even with 17GB of heap. We allow up to 20
> documents to be indexed simultaneously, but the text to be analyzed and
> indexed is streamed, not loaded into memory all at once.
>
> Any suggestions for how to troubleshoot or ideas about the problem are
> greatly appreciated!
>
> Some details about our setup (let me know what other information will help):
> - Use MMapDirectory wrapped in an NRTCachingDirectory
> - RamBufferSize 64MB
> - No compound files
> - We commit every 20 seconds
>
> Thanks,
> Ryan
>
>
>
> --
> View this message in context: http://lucene.472066.n3.nabble.com/OutOfMemoryError-indexing-large-documents-tp4170983.html
> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
