lucene-dev mailing list archives

From robert engels <>
Subject Re: ThreadLocal leak (was Re: Leaking org.apache.lucene.index.* objects)
Date Mon, 18 Dec 2006 20:35:59 GMT
There is no inherent problem with ThreadLocal. It is a viable  
solution to synchronization issues in most cases.

On Dec 18, 2006, at 11:25 AM, Bernhard Messer wrote:

> Otis,
> I ran into a similar problem when running a very heavily loaded
> search application in a servlet container. The reason for using
> ThreadLocals was to get rid of synchronized method calls, e.g. in
> TermVectorsReader, which would degrade overall search
> performance. Currently I do not see an easy solution that fixes both
> the synchronization problem and the ThreadLocal problem.
> Bernhard
> Otis Gospodnetic wrote:
>> Moving to java-dev, I think this belongs here.
>> I've been looking at this problem some more today and reading  
>> about ThreadLocals.  It's easy to misuse them and end up with  
>> memory leaks, apparently... and I think we may have this problem  
>> here.
>> The problem here is that ThreadLocals are tied to Threads, and I  
>> think the assumption in TermInfosReader and SegmentReader is that  
>> (search) Threads are short-lived: they come in, scan the index, do  
>> the search, return and die.  In this scenario, their ThreadLocals  
>> go to heaven with them, too, and memory is freed up.
>> But when Threads are long-lived, as they are in thread pools (e.g.  
>> those in servlet containers), those ThreadLocals stay alive even  
>> after a single search request is done.  Moreover, the Thread is  
>> reused, and the new TermInfosReader and SegmentReader put some new  
>> values in that ThreadLocal on top of the old values (I think) from  
>> the previous search request.  Because the Thread still has  
>> references to ThreadLocals and the values in them, the values  
>> never get GCed.
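
The retention described above can be reproduced outside Lucene. The following is a minimal sketch in modern Java, not Lucene code: the class name, the SLOT field and the one-thread pool are all assumptions made for illustration. It shows that a value stored by one task is still reachable when a later task runs on the same pooled thread.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical demo class; nothing here is taken from Lucene.
final class PooledThreadLocalDemo {
    // Stands in for the per-thread slot that TIR and SR keep.
    private static final ThreadLocal<byte[]> SLOT = new ThreadLocal<>();

    /** Returns true if a value set by one task is still visible to a later task on the same pooled thread. */
    static boolean valueSurvivesRequest() {
        ExecutorService pool = Executors.newFixedThreadPool(1); // one long-lived worker
        try {
            // "Request" 1 stores a value and returns without cleaning up.
            pool.submit(() -> SLOT.set(new byte[1024])).get();
            // "Request" 2 runs later on the same worker thread and looks at the slot.
            return pool.submit(() -> SLOT.get() != null).get();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("value survives request: " + valueSurvivesRequest());
    }
}
```

With short-lived threads the value would become unreachable when the thread dies; with a pool it stays reachable for as long as the worker lives, unless something clears it.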
>> I tried making ThreadLocals in TIR and SR static, I tried wrapping  
>> values saved in TLs in WeakReference, I've tried using WeakHashMap  
>> like in Robert Engels' FixedThreadLocal class from LUCENE-436, but
>> nothing helped.  I thought about adding a public static method to  
>> TIR and SR, so one could call it at the end of a search request  
>> (think servlet filter) and clear the TL for the current thread,  
>> but that would require making TIR and SR public and I'm not 100%  
>> sure if it would work, plus that exposes the implementation  
>> details too much.
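
The "clear at the end of the request" idea can be sketched as a wrapper around each unit of work, much as a servlet filter would wrap a request. This is only an illustration under stated assumptions: RequestScopedCleanup, SLOT and withCleanup are hypothetical names, and the sketch assumes the cleanup code can reach the ThreadLocal, which is not the case for the per-instance ThreadLocals in TIR and SR without exposing them.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch; SLOT stands in for a ThreadLocal the cleanup code can reach.
final class RequestScopedCleanup {
    private static final ThreadLocal<Object> SLOT = new ThreadLocal<>();

    /** Wraps a task so the current thread's entry is removed when the task ends, even on failure. */
    static Runnable withCleanup(Runnable task) {
        return () -> {
            try {
                task.run();
            } finally {
                SLOT.remove(); // affects only the thread that runs this wrapper
            }
        };
    }

    /** Returns true if the pooled thread's slot is empty once the wrapped "request" has finished. */
    static boolean slotEmptyAfterRequest() {
        ExecutorService pool = Executors.newFixedThreadPool(1);
        try {
            pool.submit(withCleanup(() -> SLOT.set("per-request state"))).get();
            return pool.submit(() -> SLOT.get() == null).get();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("slot empty after request: " + slotEmptyAfterRequest());
    }
}
```

The try/finally placement matters: the entry is removed on the same thread that set it, which is the only thread from which ThreadLocal.remove() can reach that entry.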
>> I don't have a solution yet.
>> But do we *really* need ThreadLocal in TIR and SR?  The only thing  
>> that TL is doing there is acting as a per-thread storage of some  
>> cloned value (in TIR we clone SegmentTermEnum and in SR we clone  
>> TermVectorsReader).  Why can't we just store those cloned values  
>> in instance variables?  Isn't whoever is calling TIR and SR going  
>> to be calling the same instance of TIR and SR anyway, and thus get  
>> access to those cloned values?
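
One possible answer to the instance-variable question, sketched under assumptions: if the cloned object carries a position (as an enumerator does), a single instance variable is shared by every thread that calls the same reader, so one caller's seek can disturb another's. Cursor and SharedVersusPerThread below are hypothetical stand-ins, not Lucene classes.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical comparison of a shared field and a per-thread clone.
final class SharedVersusPerThread {
    /** Stand-in for a stateful, cloneable enumerator. */
    static final class Cursor {
        int position;
    }

    // Option 1: one cursor in an instance variable, shared by every caller.
    private final Cursor shared = new Cursor();
    // Option 2: one cursor per calling thread.
    private final ThreadLocal<Cursor> perThread = ThreadLocal.withInitial(Cursor::new);

    /** Thread A seeks to 5, thread B seeks to 10, then A reads back {shared, perThread}. */
    static int[] positionsSeenByA() {
        SharedVersusPerThread reader = new SharedVersusPerThread();
        ExecutorService a = Executors.newSingleThreadExecutor();
        ExecutorService b = Executors.newSingleThreadExecutor();
        try {
            a.submit(() -> { reader.shared.position = 5; reader.perThread.get().position = 5; }).get();
            b.submit(() -> { reader.shared.position = 10; reader.perThread.get().position = 10; }).get();
            return a.submit(() -> new int[] { reader.shared.position, reader.perThread.get().position }).get();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            a.shutdown();
            b.shutdown();
        }
    }

    public static void main(String[] args) {
        int[] seen = positionsSeenByA();
        System.out.println("shared=" + seen[0] + " perThread=" + seen[1]);
    }
}
```

Thread A sees position 10 in the shared cursor (overwritten by B) but still sees 5 in its own per-thread cursor. A shared field would therefore need synchronization around each use, which is the cost the ThreadLocal was introduced to avoid.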
>> I'm really amazed that we haven't heard any reports about this  
>> before.  I am not sure why my application started showing this  
>> leak only about 3 weeks ago.  It is getting more pounded on than  
>> before, so maybe that made the leak more obvious.  My guess is  
>> that more common Lucene usage is with a single index or a small  
>> number of them, and with short-lived threads, where this problem  
>> isn't easily visible.  In my case I deal with a few tens of  
>> thousands of indices and several parallel search threads that live  
>> forever in the thread pool.
>> Any thoughts about this or possible suggestions for a fix?
>> Thanks,
>> Otis
>> ----- Original Message ----
>> From: Otis Gospodnetic <>
>> To:
>> Sent: Friday, December 15, 2006 12:28:29 PM
>> Subject: Leaking org.apache.lucene.index.* objects
>> Hi,
>> About 2-3 weeks ago I emailed about a memory leak in my  
>> application.  I then found some problems in my code (I wasn't  
>> closing IndexSearchers explicitly) and took care of those.  Now I  
>> see my app is still leaking memory - jconsole clearly shows the  
>> "Tenured Gen" memory pool getting filled up until I hit the OOM,  
>> but I can't seem to pin-point the source.
>> I found that a bunch of o.a.l.index.* objects are not getting
>> GCed, even though they should.  For example:
>> $ jmap -histo:live 7825 | grep apache.lucene.index | head -20 | sort -k2 -nr
>> num   #instances    #bytes  class name
>> --------------------------------------
>>   4:   1764840    98831040  org.apache.lucene.index.CompoundFileReader$CSIndexInput
>>   5:   2119215    67814880  org.apache.lucene.index.TermInfo
>>   7:   1112459    35598688  org.apache.lucene.index.SegmentReader$Norm
>>   9:   2132311    34116976  org.apache.lucene.index.Term
>>  12:   1117897    26829528  org.apache.lucene.index.FieldInfo
>>  13:    225340    18027200  org.apache.lucene.index.SegmentTermEnum
>>  15:    589727    14153448  org.apache.lucene.index.TermBuffer
>>  21:     86033     8718504  [Lorg.apache.lucene.index.TermInfo;
>>  20:     86033     8718504  [Lorg.apache.lucene.index.Term;
>>  23:     86120     7578560  org.apache.lucene.index.SegmentReader
>>  26:     90501     5068056
>>  27:     86120     4822720  org.apache.lucene.index.TermInfosReader
>>  33:     86130     3445200  org.apache.lucene.index.SegmentInfo
>>  36:     87355     2795360  $Descriptor
>>  38:     86120     2755840  org.apache.lucene.index.FieldsReader
>>  39:     86050     2753600  org.apache.lucene.index.CompoundFileReader
>>  42:     46903     2251344  org.apache.lucene.index.SegmentInfos
>>  43:     93778     2250672  $Entry
>>  45:     93778     1500448  $CreationPlaceholder
>>  47:     86510     1384160  org.apache.lucene.index.FieldInfos
>> I'm running my app in search-only mode - no adds or deletes.
>> The counts of these objects just keep going up, even though I am
>> explicitly closing the IndexSearcher.  I can see that file  
>> descriptors _are_ freed up after searcher.close(), because lsof no  
>> longer shows them, but the above objects just linger and  
>> accumulate, even when I force GC via jconsole or via the profiler.
>> I thought maybe various *Readers are not getting close()d, but  
>> I've double-checked all *Readers above, and they all seem to close  
>> their IndexInput references.  The static nested class  
>> CompoundFileReader.CSIndexInput has a close() without any  
>> implementation.  At first I thought that was an omission, but  
>> adding a close of the inner IndexInput there resulted in a
>> search-time error.  I've added the lovely print debugging to various
>> close() methods and see those methods being called.  I've added
>> finalize() with some print debugging to SegmentReader, TermInfosReader,
>> SegmentTermEnum, FieldsReader, and CompoundFileReader.  All but  
>> CFReader get finalized after a while.
>> My application is running as a webapp and has thousands of  
>> separate indices.  This means it's very multi-threaded and the  
>> servlet container has a pool of threads that handle requests, and  
>> each request may be for a different index.  I cache IndexSearchers  
>> for a while, and purge/close them every N minutes if they have  
>> been idle more than M minutes.
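
The caching scheme described above (keep searchers for a while, close the ones idle for more than M minutes on a periodic sweep) could look roughly like the sketch below. IdleCache and all of its members are hypothetical; the real application code is not shown in this thread. Time is passed in explicitly so the behaviour is easy to check, and the sketch does not guard against closing a value that another thread is still using.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Iterator;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical cache; V would be something closeable, such as a searcher wrapper.
final class IdleCache<K, V extends Closeable> {
    private static final class Entry<V> {
        final V value;
        volatile long lastUsedMillis;
        Entry(V value, long nowMillis) { this.value = value; this.lastUsedMillis = nowMillis; }
    }

    private final Map<K, Entry<V>> entries = new ConcurrentHashMap<>();
    private final long maxIdleMillis; // the "M minutes" threshold, in milliseconds

    IdleCache(long maxIdleMillis) { this.maxIdleMillis = maxIdleMillis; }

    void put(K key, V value, long nowMillis) {
        entries.put(key, new Entry<>(value, nowMillis));
    }

    /** Returns the cached value and marks it as used, or null if absent. */
    V get(K key, long nowMillis) {
        Entry<V> e = entries.get(key);
        if (e == null) return null;
        e.lastUsedMillis = nowMillis;
        return e.value;
    }

    /** Removes and closes every entry idle for longer than the threshold; returns how many were closed. */
    int purge(long nowMillis) {
        int closed = 0;
        for (Iterator<Entry<V>> it = entries.values().iterator(); it.hasNext();) {
            Entry<V> e = it.next();
            if (nowMillis - e.lastUsedMillis > maxIdleMillis) {
                it.remove();
                try {
                    e.value.close();
                } catch (IOException ex) {
                    throw new UncheckedIOException(ex);
                }
                closed++;
            }
        }
        return closed;
    }

    public static void main(String[] args) {
        IdleCache<String, Closeable> cache = new IdleCache<>(100);
        cache.put("index-a", () -> System.out.println("closed index-a"), 0);
        System.out.println("purged at t=50: " + cache.purge(50));
        System.out.println("purged at t=200: " + cache.purge(200));
    }
}
```

Note that purge() runs on whichever thread performs the sweep, so any per-thread state that the closed value left behind on the request threads is not touched by it.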
>> It occurred to me last night that things like TermInfosReader and  
>> SegmentReader are using ThreadLocal, and since threads are used in  
>> a thread pool, and thus shared with requests handling searches  
>> against different indices, it's not clear to me what happens with  
>> object instances that are put in those ThreadLocals in such  
>> scenario.  Aren't things going to step on each other's toes?
>> TIR has close() and SR has doClose(), so I put <TL inst>.set(null)  
>> there.  This immediately got rid of those instances of  
>> CompoundFileReader.CSIndexInput in my dev environment!!!! Yeeees!
>> But in my dev environment I tested my additions by slamming my app  
>> against a *single* index.  I took my modified Lucene to  
>> production, and quickly saw all those o.a.l.index.* objects  
>> accumulate again.  I also see a lot of ThreadLocal's kids:
>>  16:    419387    13420384  java.lang.ThreadLocal$ThreadLocalMap$Entry
>> I *think* that points to some issues with how that ThreadLocal
>> is used there, in a multi-threaded, multi-index environment.
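
One property of ThreadLocal that may contribute here, though the thread does not confirm it as the cause: set(null) and remove() act only on the calling thread's entry. If close() runs on a different thread from the ones that searched, their entries are left in place. The sketch below uses a hypothetical CloseClearsOnlyCaller class as a stand-in for a reader that owns a per-instance ThreadLocal.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical stand-in for a reader that owns a per-instance ThreadLocal.
final class CloseClearsOnlyCaller {
    private final ThreadLocal<Object> slot = new ThreadLocal<>();

    void search() { slot.set(new Object()); }   // the calling thread stores its clone
    void close() { slot.set(null); }            // clears the CALLING thread's entry only
    boolean currentThreadHoldsValue() { return slot.get() != null; }

    /** Searches on a pooled thread, closes from the caller's thread, then checks the pooled thread again. */
    static boolean workerStillHoldsValueAfterClose() {
        CloseClearsOnlyCaller reader = new CloseClearsOnlyCaller();
        ExecutorService worker = Executors.newSingleThreadExecutor();
        try {
            worker.submit(() -> { reader.search(); }).get(); // search on the pooled thread
            reader.close();                                  // close from a different thread
            Callable<Boolean> probe = reader::currentThreadHoldsValue;
            return worker.submit(probe).get();               // look again on the pooled thread
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            worker.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("worker still holds value: " + workerStillHoldsValueAfterClose());
    }
}
```

In a single-index test where the same thread searches and closes, set(null) in close() would appear to work; with many indices and a periodic purge running on another thread, it would not reach the request threads' entries.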
>> I'm running JDK 6, and while this problem sounds a bit like  
>> LUCENE-436, I'm not yet sure if it's the same thing.  Because my  
>> IndexSearchers (and thus all those o.a.l.index.* objects) are
>> long-lived, and threads are shared and reused for searching other
>> indices, those close() and doClose() methods are not called at the  
>> end of the request life-cycle, so at the end of the request those  
>> TL instances will *still* have something in them.  When their  
>> thread is later reused for searching of another index, new data  
>> will be put in them, but the old data will never be cleaned out!  No?
>> It seems a bit odd, but with these ThreadLocals, doesn't a
>> multi-threaded, multi-index webapp really have to "clean" those
>> ThreadLocal instances either before or at the end of the request?
>> I'm running out of ideas, and was wondering if anyone has any  
>> thoughts about what could still be holding references to the above  
>> classes.  I have some 20-30MB memory snapshots (via YourKit) and  
>> heap dumps (via jmap), if anyone is interested.
>> Thanks,
>> Otis
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail:
>> For additional commands, e-mail: