lucene-dev mailing list archives

From Michael McCandless <luc...@mikemccandless.com>
Subject Re: ThreadLocal causing memory leak with J2EE applications
Date Thu, 11 Sep 2008 15:09:39 GMT

Yeah I think that's the right approach.

I'll code it up.

Mike

robert engels wrote:

> I think that would work, but I think you would be better off
> encapsulating that in an extended ThreadLocal, e.g. WeakThreadLocal,
> and using that everywhere. Add a clear() method that clears the
> ThreadLocal's list of hard references (which will allow the values
> to be GC'd).
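
A minimal sketch of the idea discussed here, assuming a hypothetical WeakThreadLocal class (the name comes from the message above; the body is an illustration, not the code that was actually committed). The ThreadLocal holds only a WeakReference, a side map keyed by thread holds the hard references, dead threads are compacted away on each set(), and clear() drops every hard reference. The size() method is added purely for illustration.

```java
import java.lang.ref.WeakReference;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

/** Hypothetical sketch: a ThreadLocal whose values can be released from any thread. */
class WeakThreadLocal<T> {
    // The ThreadLocal itself only holds the value weakly.
    private final ThreadLocal<WeakReference<T>> local = new ThreadLocal<>();
    // Hard references that keep the values alive; guarded by "this".
    private final Map<Thread, T> hardRefs = new HashMap<>();

    /** Read path: no synchronization. May return null once clear() has run and GC has reclaimed the value. */
    public T get() {
        WeakReference<T> ref = local.get();
        return ref == null ? null : ref.get();
    }

    /** Binds a value for the calling thread; this is the only synchronized path besides clear(). */
    public void set(T value) {
        local.set(new WeakReference<>(value));
        synchronized (this) {
            hardRefs.put(Thread.currentThread(), value);
            // Compact: drop entries that belong to threads that have died.
            for (Iterator<Thread> it = hardRefs.keySet().iterator(); it.hasNext();) {
                if (!it.next().isAlive()) {
                    it.remove();
                }
            }
        }
    }

    /** Drops all hard references so GC is free to reclaim every thread's value. */
    public synchronized void clear() {
        hardRefs.clear();
    }

    /** Number of hard references currently held (illustration only). */
    public synchronized int size() {
        return hardRefs.size();
    }
}
```

In this sketch the owner (for example, whatever closes the reader) would call clear() on close; until then the side map is what keeps each value reachable between calls.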
>
>
> On Sep 11, 2008, at 9:43 AM, Michael McCandless wrote:
>
>>
>> OK so we compact the list (removing dead threads) every time we add
>> a new entry to the list.  This way, for a long-lived SegmentReader
>> with short-lived threads, the list keeps only live threads.
>>
>> We do need sync access to the list, but that's only on binding a
>> new thread.  Retrieving the value for an already-bound thread needs
>> no sync.
>>
>> Mike
>>
>> robert engels wrote:
>>
>>> You still need to sync access to the list, and how would an entry be
>>> removed from the list prior to close? That is, you need one per
>>> thread, but you can have the reader shared across all threads. So
>>> if threads were created and destroyed without ever closing the
>>> reader, the list would grow unbounded.
>>>
>>> On Sep 11, 2008, at 9:20 AM, Michael McCandless wrote:
>>>
>>>>
>>>> I don't need it by thread, because I would still use ThreadLocal  
>>>> to retrieve the SegmentTermEnum.  This avoids any sync during get.
>>>>
>>>> The list is just a "fallback" to hold a hard reference to the
>>>> SegmentTermEnum to keep it alive.  That's its only purpose.
>>>> Then, when SegmentReader is closed, this list is cleared and GC is
>>>> free to reclaim all SegmentTermEnums.
>>>>
>>>> Mike
>>>>
>>>> robert engels wrote:
>>>>
>>>>> But you need it by thread, so it can't be a list.
>>>>>
>>>>> You could have a HashMap of <Thread,ThreadState> in  
>>>>> FieldsReader, and when SegmentReader is closed, FieldsReader is  
>>>>> closed, which clears the map, and not use thread locals at all.  
>>>>> The difference being you would need a sync'd map.
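
For comparison, the map-based alternative described here might look like the sketch below. PerThreadStateMap is a hypothetical name, and the state type is left generic because the thread does not show what ThreadState contains. Every lookup takes the lock, which is the trade-off mentioned above.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical sketch: per-thread state in a synchronized map instead of a ThreadLocal. */
class PerThreadStateMap<S> {
    private final Map<Thread, S> states = new HashMap<>();
    private final Supplier<S> factory;

    PerThreadStateMap(Supplier<S> factory) {
        this.factory = factory;
    }

    /** Returns the calling thread's state, creating it on first use. Every call synchronizes. */
    public synchronized S get() {
        return states.computeIfAbsent(Thread.currentThread(), t -> factory.get());
    }

    /** Intended to be called when the owning reader closes: releases state for all threads. */
    public synchronized void close() {
        states.clear();
    }

    /** Number of threads with state (illustration only). */
    public synchronized int size() {
        return states.size();
    }
}
```

Note that in this sketch entries for threads that have exited stay in the map until close() is called.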
>>>>>
>>>>> On Sep 11, 2008, at 4:56 AM, Michael McCandless wrote:
>>>>>
>>>>>>
>>>>>> What if we wrap the value in a WeakReference, but secondarily  
>>>>>> hold a hard reference to it in a "normal" list?
>>>>>>
>>>>>> Then, when TermInfosReader is closed we clear that list of all  
>>>>>> its hard references, at which point GC will be free to reclaim  
>>>>>> the object out from under the ThreadLocal even before the  
>>>>>> ThreadLocal purges its stale entries.
>>>>>>
>>>>>> Mike
>>>>>>
>>>>>> robert engels wrote:
>>>>>>
>>>>>>> You can't hold the ThreadLocal value in a WeakReference,
>>>>>>> because there is no hard reference between enumeration calls
>>>>>>> (so it would be cleared out from under you while enumerating).
>>>>>>>
>>>>>>> All of this occurs because you have some objects (readers/segments
>>>>>>> etc.) that are shared across all threads, but these
>>>>>>> contain objects that are 'thread/search state' specific. These
>>>>>>> latter objects are essentially "cached" for performance (so
>>>>>>> you don't need to seek and read, sequential buffer access, etc.)
>>>>>>>
>>>>>>> A sometimes better solution is to have the state returned to
>>>>>>> the caller, and require the caller to pass/use the state later
>>>>>>> - then you don't need thread locals.
>>>>>>>
>>>>>>> You can accomplish a similar solution by returning a
>>>>>>> "SessionKey" object, and have the caller pass this later.  You
>>>>>>> can then have a WeakHashMap of SessionKey,SearchState that the
>>>>>>> code can use.  When the SessionKey is destroyed (no longer
>>>>>>> referenced), the state map can be cleaned up automatically.
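
The SessionKey suggestion can be sketched roughly as follows. SessionKey and SearchState are the names used in the message; their contents here, and the SessionStates wrapper, are assumptions made for illustration. The state object must not hold a strong reference back to its key, otherwise the WeakHashMap entry could never be collected.

```java
import java.util.Collections;
import java.util.Map;
import java.util.WeakHashMap;

/** Opaque token handed to the caller; relies on identity-based equals/hashCode. */
final class SessionKey { }

/** Hypothetical per-search state, e.g. a cached position in an enumeration. */
final class SearchState {
    long position;
}

/** Hypothetical sketch: state keyed by a caller-held token rather than by thread. */
class SessionStates {
    // WeakHashMap holds its keys weakly: once the caller drops the SessionKey,
    // the entry becomes eligible for automatic removal.
    private final Map<SessionKey, SearchState> states =
        Collections.synchronizedMap(new WeakHashMap<>());

    /** Starts a session and returns the token the caller must pass back later. */
    public SessionKey open() {
        SessionKey key = new SessionKey();
        states.put(key, new SearchState());
        return key;
    }

    /** Looks up the state for a token; returns null for an unknown token. */
    public SearchState stateFor(SessionKey key) {
        return states.get(key);
    }
}
```

With this shape no ThreadLocal is involved: the lifetime of the cached state follows the lifetime of the caller's key, regardless of which thread uses it.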
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Sep 10, 2008, at 11:30 PM, Noble Paul നോബിള്‍ नोब्ळ् wrote:
>>>>>>>
>>>>>>>> When I look at the reference tree, that is the feeling I get.
>>>>>>>> If you held a WeakReference, it would get released.
>>>>>>>> |- base of org.apache.lucene.index.CompoundFileReader$CSIndexInput
>>>>>>>>           |- input of org.apache.lucene.index.SegmentTermEnum
>>>>>>>>               |- value of java.lang.ThreadLocal$ThreadLocalMap$Entry
>>>>>>>>
>>>>>>>> On Wed, Sep 10, 2008 at 8:39 PM, Chris Lu <chris.lu@gmail.com> wrote:
>>>>>>>>> Does this make any difference?
>>>>>>>>> If I intentionally close the searcher and reader and that fails to
>>>>>>>>> release the memory, I cannot rely on some JVM magic to release it.
>>>>>>>>> --
>>>>>>>>> Chris Lu
>>>>>>>>> -------------------------
>>>>>>>>> Instant Scalable Full-Text Search On Any Database/Application
>>>>>>>>> site: http://www.dbsight.net
>>>>>>>>> demo: http://search.dbsight.com
>>>>>>>>> Lucene Database Search in 3 minutes:
>>>>>>>>> http://wiki.dbsight.com/index.php?title=Create_Lucene_Database_Search_in_3_minutes
>>>>>>>>> DBSight customer, a shopping comparison site, (anonymous per request) got
>>>>>>>>> 2.6 Million Euro funding!
>>>>>>>>>
>>>>>>>>> On Wed, Sep 10, 2008 at 4:03 AM, Noble Paul നോബിള്‍ नोब्ळ्
>>>>>>>>> <noble.paul@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>> Why do you need to keep a strong reference?
>>>>>>>>>> Why not a WeakReference ?
>>>>>>>>>>
>>>>>>>>>> --Noble
>>>>>>>>>>
>>>>>>>>>> On Wed, Sep 10, 2008 at 12:27 AM, Chris Lu <chris.lu@gmail.com> wrote:
>>>>>>>>>>> The problem should be similar to what's discussed in this thread:
>>>>>>>>>>> http://lucene.markmail.org/message/keosgz2c2yjc7qre?q=ThreadLocal
>>>>>>>>>>>
>>>>>>>>>>> There is a memory leak in Lucene search, introduced by Lucene-1195
>>>>>>>>>>> (svn r659602, May 23, 2008).
>>>>>>>>>>>
>>>>>>>>>>> This patch brings in a ThreadLocal cache to TermInfosReader.
>>>>>>>>>>>
>>>>>>>>>>> It's usually recommended to keep the reader open and reuse it when
>>>>>>>>>>> possible. In a common J2EE application, the HTTP requests are usually
>>>>>>>>>>> handled by different threads. But since the cache is ThreadLocal, the
>>>>>>>>>>> cache is not really usable by other threads. What's worse, the cache
>>>>>>>>>>> cannot be cleared by another thread!
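
The last point can be seen with plain java.lang.ThreadLocal, independent of Lucene. In this small sketch (class and method names are made up for illustration), the main thread calls remove() while a worker thread still has a value bound; remove() only affects the calling thread's own entry, so the worker's value survives.

```java
import java.util.concurrent.CountDownLatch;

/** Hypothetical demo: a ThreadLocal entry cannot be cleared from another thread. */
class ThreadLocalVisibility {
    static final ThreadLocal<String> CACHE = new ThreadLocal<>();

    /** Returns what the worker thread still sees after another thread called remove(). */
    static String workerValueAfterForeignRemove() {
        CountDownLatch valueSet = new CountDownLatch(1);
        CountDownLatch removed = new CountDownLatch(1);
        String[] seen = new String[1];

        Thread worker = new Thread(() -> {
            CACHE.set("worker's cached enum");
            valueSet.countDown();
            try {
                removed.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            seen[0] = CACHE.get(); // read after the other thread's remove()
        });
        worker.start();

        try {
            valueSet.await();
            CACHE.remove();        // clears only the calling thread's entry
            removed.countDown();
            worker.join();         // join makes the worker's write to seen[0] visible
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return seen[0];
    }
}
```

In a servlet container the worker threads are typically pooled and long-lived, so a value bound this way stays reachable from the worker for as long as the worker keeps running and the entry is not purged.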
>>>>>>>>>>>
>>>>>>>>>>> This leak is usually not so obvious. But my case uses a RAMDirectory
>>>>>>>>>>> holding several hundred megabytes, so one unreleased resource is
>>>>>>>>>>> obvious to me.
>>>>>>>>>>>
>>>>>>>>>>> Here is the reference tree:
>>>>>>>>>>> org.apache.lucene.store.RAMDirectory
>>>>>>>>>>> |- directory of org.apache.lucene.store.RAMFile
>>>>>>>>>>>  |- file of org.apache.lucene.store.RAMInputStream
>>>>>>>>>>>      |- base of org.apache.lucene.index.CompoundFileReader$CSIndexInput
>>>>>>>>>>>          |- input of org.apache.lucene.index.SegmentTermEnum
>>>>>>>>>>>              |- value of java.lang.ThreadLocal$ThreadLocalMap$Entry
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> After I switched back to svn revision 659601, right before this patch
>>>>>>>>>>> was checked in, the memory leak was gone.
>>>>>>>>>>> Although my case is RAMDirectory, I believe this will affect
>>>>>>>>>>> disk-based indexes also.
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> Chris Lu
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> --Noble Paul
>>>>>>>>>>
>>>>>>>>>> ---------------------------------------------------------------------
>>>>>>>>>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>>>>>>>>>> For additional commands, e-mail: java-dev-help@lucene.apache.org
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> -- 
>>>>>>>> --Noble Paul

