lucene-dev mailing list archives

From <karl.wri...@nokia.com>
Subject RE: Lucene 4.0 memory usage during indexing - is this expected?
Date Wed, 03 Oct 2012 18:12:19 GMT
Mystery resolved; the problem was due to an ever-increasing record size, which was in turn
due to a record structure that was never being cleared.  This caused it to appear as if the
total allocation of structures used for analysis was steadily growing.  But the number of
such entities did NOT grow, which is what gave away the solution.
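
For illustration only, here is a minimal Java sketch of this kind of bug. All names (ReusedRecordSketch, Record, process) are hypothetical and this is not the actual code from the project; it only assumes a record object that is reused across documents. The number of Record instances stays at one, yet its contents keep growing unless it is cleared before each document:

```java
import java.util.ArrayList;
import java.util.List;

public class ReusedRecordSketch {
    /** Hypothetical record structure that is reused from one document to the next. */
    static final class Record {
        private final List<String> values = new ArrayList<>();
        void add(String v) { values.add(v); }
        void clear() { values.clear(); }
        int size() { return values.size(); }
    }

    /** Fills one reused record for each document; returns its size after the last document. */
    static int process(int docs, int valuesPerDoc, boolean clearBetweenDocs) {
        Record record = new Record(); // a single instance, so the object count never grows
        for (int d = 0; d < docs; d++) {
            if (clearBetweenDocs) {
                record.clear(); // the step that was missing
            }
            for (int v = 0; v < valuesPerDoc; v++) {
                record.add("doc" + d + "-value" + v);
            }
        }
        return record.size();
    }

    public static void main(String[] args) {
        System.out.println("without clear: " + process(1000, 6, false)); // prints 6000
        System.out.println("with clear:    " + process(1000, 6, true));  // prints 6
    }
}
```

In a heap dump this pattern shows up as a constant instance count with a steadily growing retained size per instance.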

Thanks for the hints, and sorry for the confusion.

Karl

-----Original Message-----
From: Wright Karl (Nokia-LC/Boston) 
Sent: Wednesday, October 03, 2012 12:41 PM
To: dev@lucene.apache.org
Subject: RE: Lucene 4.0 memory usage during indexing - is this expected?

Threads are managed via an executor service backed by a fixed-size thread pool, of size 16 on
this machine.
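
A minimal sketch of that arrangement, assuming the standard java.util.concurrent executor API; the class and method names are hypothetical and the tasks are placeholders, not real indexing work. It shows why a fixed pool rules out thread turnover: however many tasks are submitted, no more than 16 distinct threads ever run them, so any per-thread (ThreadLocal) state is bounded by the pool size.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class FixedPoolSketch {
    /** Runs `tasks` placeholder jobs on a fixed pool; returns how many distinct threads ran them. */
    static int distinctWorkerThreads(int poolSize, int tasks) {
        Set<Long> threadIds = ConcurrentHashMap.newKeySet();
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        for (int i = 0; i < tasks; i++) {
            // Placeholder for indexing one document: just record which thread ran the task.
            pool.execute(() -> threadIds.add(Thread.currentThread().getId()));
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("tasks did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for tasks", e);
        }
        return threadIds.size();
    }

    public static void main(String[] args) {
        // The result is at most 16, no matter how many tasks are submitted.
        System.out.println(distinctWorkerThreads(16, 10_000));
    }
}
```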

There are not a lot of fields in the schema (a half dozen).  We do use PerFieldAnalyzerWrapper.

I'm still grappling with the MAT reports; it's possible of course that we're holding onto
something unexpected, or even that we have a fragmentation situation.  Stay tuned.

Karl

-----Original Message-----
From: ext Michael McCandless [mailto:lucene@mikemccandless.com]
Sent: Wednesday, October 03, 2012 11:50 AM
To: dev@lucene.apache.org
Subject: Re: Lucene 4.0 memory usage during indexing - is this expected?

I wish I could remember/find the Jira issue here ... there was one fairly recently.

Are you really sure you're not turning over threads that are coming through Lucene...?  High
thread turnover causes challenges for ThreadLocals ...

Do you have a lot of fields?  Are you using PerFieldAnalyzerWrapper...?

Mike McCandless

http://blog.mikemccandless.com

On Wed, Oct 3, 2012 at 10:45 AM,  <karl.wright@nokia.com> wrote:
> There's a fixed-size thread pool involved in doing the indexing, of a size that depends
> on the machine parameters.
> Karl
>
> -----Original Message-----
> From: ext Michael McCandless [mailto:lucene@mikemccandless.com]
> Sent: Wednesday, October 03, 2012 10:43 AM
> To: Wright Karl (Nokia-LC/Boston)
> Subject: Re: Lucene 4.0 memory usage during indexing - is this expected?
>
> This is no good!
>
> Can you send an email to dev@?  This sounds very familiar ... and I had thought we committed
> a fix for it ... hopefully Uwe or Robert can remember what it was!
>
> Do you create new threads frequently, to do indexing?  Rather than pulling from a fixed
> pool?
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
> On Wed, Oct 3, 2012 at 8:32 AM,  <karl.wright@nokia.com> wrote:
>> Hi Mike,
>>
>> I've got a technical question for you.
>>
>> For background, we've been building a new address search engine on
>> top of Lucene 4.0.  The main customization involves a chain of custom
>> analyzers etc., and it all works quite well.  Or at least it did until
>> I added 7 million more documents to the list.  At that point the indexing
>> process began to run out of memory, even though we were giving it
>> some 20GB.  Only some 12GB of that is accounted for in our part of the world.
>>
>> Looking at an Eclipse MAT dump, the main thing that still seems to
>> grow over time is the set of TokenStreamComponent objects that are being
>> held indirectly by org.apache.lucene.index.FieldInvertState objects.
>> The number of FieldInvertState objects grows and grows.  By the
>> middle of the indexing process, there are 30 of these, and each one
>> of these seems to hold onto one TokenStreamComponent per field.
>> (Each TokenStreamComponent in turn holds onto a whole pile of things
>> like ICU tokenizers etc., so there's a strong multiplicative factor
>> involved, which in the end winds up holding about 10GB of memory for
>> those 30 objects.)
>>
>> The question: Why does the number of FieldInvertState objects grow
>> over time during indexing?  Are these associated in some way with
>> segments?  Is this expected behavior?
>>
>> Thanks!
>>
>> Karl
>>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: dev-help@lucene.apache.org
>
