lucene-dev mailing list archives

From <karl.wri...@nokia.com>
Subject RE: Lucene 4.0 memory usage during indexing - is this expected?
Date Wed, 03 Oct 2012 14:45:41 GMT
There's a fixed-size thread pool doing the indexing; its size depends on the machine
parameters.
Karl
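
A rough illustration of why the pool question matters: if the analysis components are cached once per indexing thread (an assumption here, suggested by the question about thread creation in the quoted message, not confirmed in this thread), then a fixed-size pool bounds how many copies can exist at once. The sketch below uses only the JDK; `ExpensiveComponents` and `countComponentsCreated` are hypothetical names for illustration, not Lucene APIs.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class FixedPoolSketch {

    // Hypothetical stand-in for a heavyweight per-thread analysis chain
    // (tokenizers, filters, buffers). Not a Lucene class.
    static final class ExpensiveComponents {
        final byte[] buffers = new byte[1024];
    }

    // Runs `documents` tasks on a fixed pool of `poolSize` threads and
    // returns how many ExpensiveComponents instances were created.
    static int countComponentsCreated(int poolSize, int documents)
            throws InterruptedException {
        AtomicInteger created = new AtomicInteger();

        // One cached instance per thread, built lazily on first use.
        ThreadLocal<ExpensiveComponents> perThread = ThreadLocal.withInitial(() -> {
            created.incrementAndGet();
            return new ExpensiveComponents();
        });

        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        for (int i = 0; i < documents; i++) {
            // Each "document" reuses whatever its worker thread already cached.
            pool.execute(() -> perThread.get());
        }
        pool.shutdown();
        if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
            throw new IllegalStateException("indexing tasks did not finish in time");
        }
        return created.get();
    }

    public static void main(String[] args) throws InterruptedException {
        // Pool size derived from the machine, as described above.
        int poolSize = Runtime.getRuntime().availableProcessors();
        int created = countComponentsCreated(poolSize, 10_000);
        System.out.println("pool size = " + poolSize
                + ", component sets created = " + created);
    }
}
```

With a fixed pool, the count never exceeds the pool size, no matter how many documents go through. Creating a fresh thread per batch would instead build a new component set for each thread.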

-----Original Message-----
From: ext Michael McCandless [mailto:lucene@mikemccandless.com] 
Sent: Wednesday, October 03, 2012 10:43 AM
To: Wright Karl (Nokia-LC/Boston)
Subject: Re: Lucene 4.0 memory usage during indexing - is this expected?

This is no good!

Can you send an email to dev@?  This sounds very familiar ... and I had thought we committed
a fix for it ... hopefully Uwe or Robert can remember what it was!

Do you create new threads frequently to do indexing, rather than pulling from a fixed pool?

Mike McCandless

http://blog.mikemccandless.com

On Wed, Oct 3, 2012 at 8:32 AM,  <karl.wright@nokia.com> wrote:
> Hi Mike,
>
> I've got a technical question for you.
>
> For background, we've been building a new address search engine on top
> of Lucene 4.0.  The main customization involves a chain of custom
> analyzers etc., and it all works quite well.  Or at least it did until
> I added 7 million more documents to the list.  At that point the
> indexing process began to run out of memory, even though we were
> giving it some 20GB.  Only some 12GB of that is accounted for by our
> own part of the application.
>
> Looking at an Eclipse MAT dump, the main thing that still seems to
> grow over time is the set of TokenStreamComponents objects that are
> being held indirectly by org.apache.lucene.index.FieldInvertState
> objects.  The number of FieldInvertState objects grows and grows.  By
> the middle of the indexing process, there are 30 of these, and each
> one seems to hold onto one TokenStreamComponents per field.  (Each
> TokenStreamComponents in turn holds onto a whole pile of things like
> ICU tokenizers etc., so there's a strong multiplicative factor
> involved, which in the end winds up holding about 10GB of memory for
> those 30 objects.)
>
> The question: Why does the number of FieldInvertState objects grow 
> over time during indexing?  Are these associated in some way with 
> segments?  Is this expected behavior?
>
> Thanks!
>
> Karl
>
