lucene-dev mailing list archives

From Chris Lu <chris...@gmail.com>
Subject Re: ThreadLocal causing memory leak with J2EE applications
Date Sun, 14 Sep 2008 03:11:34 GMT
Just confirmed that the fix for this problem is ready in patch LUCENE-1383.

Thanks to Robert Engels for arguing it through with me, understanding 
the problem quickly, and contributing a ClosableThreadLocal class, even 
though the problem itself was hard for him to reproduce. Thanks also to 
Michael McCandless for fixing the problem so quickly.

-- 
Chris Lu
-------------------------
Instant Scalable Full-Text Search On Any Database/Application
site: http://www.dbsight.net
demo: http://search.dbsight.com
Lucene Database Search in 3 minutes: 
http://wiki.dbsight.com/index.php?title=Create_Lucene_Database_Search_in_3_minutes
DBSight customer, a shopping comparison site, (anonymous per request) 
got 2.6 Million Euro funding!

Michael McCandless wrote:
>
> Yeah I think that's the right approach.
>
> I'll code it up.
>
> Mike
>
> robert engels wrote:
>
>> I think that would work, but I think you would be better off 
>> encapsulating that in an extended ThreadLocal, e.g. WeakThreadLocal, 
>> and using that everywhere. Add a method clear() that clears the 
>> ThreadLocal's list (which will allow the values to be GC'd).
>>
>>
>> On Sep 11, 2008, at 9:43 AM, Michael McCandless wrote:
>>
>>>
>>> OK so we compact the list (removing dead threads) every time we add 
>>> a new entry to the list.  This way for a long lived SegmentReader 
>>> but short lived threads, the list keeps only live threads.
>>>
>>> We do need sync access to the list, but that's only on binding a new 
>>> thread.  Retrieving an existing thread has no sync.
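
The scheme discussed in this part of the thread (a weak reference inside
the ThreadLocal, hard references held on the side, dead threads compacted
whenever a new entry is added, everything released on close) might be
sketched in Java as follows. This is an illustrative sketch under stated
assumptions, not the code from the LUCENE-1383 patch: the class name
WeakThreadLocalSketch and the methods set, get, close and liveEntries are
hypothetical.

```java
import java.lang.ref.WeakReference;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

/** Hypothetical sketch; not the class contributed in LUCENE-1383. */
class WeakThreadLocalSketch<T> {
    // The ThreadLocal only ever sees a weak reference, so a stale
    // ThreadLocalMap entry cannot keep the value alive on its own.
    private final ThreadLocal<WeakReference<T>> local =
        new ThreadLocal<WeakReference<T>>();

    // Hard references that keep the values alive until close().
    // Every access is guarded by synchronized (hardRefs).
    private final Map<Thread, T> hardRefs = new HashMap<Thread, T>();

    /** Read path: no synchronization, only a ThreadLocal lookup. */
    T get() {
        WeakReference<T> ref = local.get();
        return ref == null ? null : ref.get();
    }

    /** Bind path: synchronized, and compacts entries of dead threads. */
    void set(T value) {
        local.set(new WeakReference<T>(value));
        synchronized (hardRefs) {
            for (Iterator<Thread> it = hardRefs.keySet().iterator(); it.hasNext();) {
                if (!it.next().isAlive()) {
                    it.remove(); // the thread has finished; drop its hard reference
                }
            }
            hardRefs.put(Thread.currentThread(), value);
        }
    }

    /** Drops all hard references; the values become eligible for GC. */
    void close() {
        synchronized (hardRefs) {
            hardRefs.clear();
        }
    }

    /** Number of hard references currently held (for inspection only). */
    int liveEntries() {
        synchronized (hardRefs) {
            return hardRefs.size();
        }
    }
}
```

In this sketch, get() may still return a value after close() until the
garbage collector actually clears the weak reference, so callers should
not use the object once its owner is closed. Entries of finished threads
are only removed on the next set(), which matches the "compact on add"
idea above.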
>>>
>>> Mike
>>>
>>> robert engels wrote:
>>>
>>>> You still need to sync access to the list, and how would it be 
>>>> removed from the list prior to close? That is you need one per 
>>>> thread, but you can have the reader shared across all threads. So 
>>>> if threads were created and destroyed without ever closing the 
>>>> reader, the list would grow unbounded.
>>>>
>>>> On Sep 11, 2008, at 9:20 AM, Michael McCandless wrote:
>>>>
>>>>>
>>>>> I don't need it by thread, because I would still use ThreadLocal 
>>>>> to retrieve the SegmentTermEnum.  This avoids any sync during get.
>>>>>
>>>>> The list is just a "fallback" to hold a hard reference to the 
>>>>> SegmentTermEnum to keep it alive.  That's its only purpose.  
>>>>> Then, when SegmentReader is closed this list is cleared and GC is 
>>>>> free to reclaim all SegmentTermEnums.
>>>>>
>>>>> Mike
>>>>>
>>>>> robert engels wrote:
>>>>>
>>>>>> But you need it by thread, so it can't be a list.
>>>>>>
>>>>>> You could have a HashMap of <Thread,ThreadState> in FieldsReader,
>>>>>> and when SegmentReader is closed, FieldsReader is closed, which 
>>>>>> clears the map, and not use thread locals at all. The difference
>>>>>> being you would need a sync'd map.
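
A minimal Java sketch of the map-based alternative described above,
assuming a synchronized HashMap keyed by the current thread and no
ThreadLocal at all. The class name PerThreadStateMap and its methods are
hypothetical and are not taken from FieldsReader or any Lucene class.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of per-thread state held in a synchronized map. */
class PerThreadStateMap<S> {
    // Every get and put goes through the synchronized wrapper,
    // which is the cost compared with a ThreadLocal lookup.
    private final Map<Thread, S> states =
        Collections.synchronizedMap(new HashMap<Thread, S>());

    /** Returns the calling thread's state, or null if none is bound. */
    S get() {
        return states.get(Thread.currentThread());
    }

    /** Binds a state object to the calling thread. */
    void set(S state) {
        states.put(Thread.currentThread(), state);
    }

    /** Called when the owner is closed: releases every thread's state. */
    void close() {
        states.clear();
    }

    /** Number of threads with bound state (for inspection only). */
    int size() {
        return states.size();
    }
}
```

Because the map holds strong references to its Thread keys, entries for
finished threads stay in place until close() is called unless they are
removed some other way.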
>>>>>>
>>>>>> On Sep 11, 2008, at 4:56 AM, Michael McCandless wrote:
>>>>>>
>>>>>>>
>>>>>>> What if we wrap the value in a WeakReference, but secondarily
>>>>>>> hold a hard reference to it in a "normal" list?
>>>>>>>
>>>>>>> Then, when TermInfosReader is closed we clear that list of all
>>>>>>> its hard references, at which point GC will be free to reclaim
>>>>>>> the object out from under the ThreadLocal even before the 
>>>>>>> ThreadLocal purges its stale entries.
>>>>>>>
>>>>>>> Mike
>>>>>>>
>>>>>>> robert engels wrote:
>>>>>>>
>>>>>>>> You can't hold the ThreadLocal value in a WeakReference,
>>>>>>>> because there is no hard reference between enumeration calls
>>>>>>>> (so it would be cleared out from under you while enumerating).
>>>>>>>>
>>>>>>>> All of this occurs because you have some objects 
>>>>>>>> (readers/segments etc.) that are shared across all threads, but 
>>>>>>>> these contain objects that are 'thread/search state' specific.
>>>>>>>> These latter objects are essentially "cached" for performance
>>>>>>>> (so you don't need to seek and read, sequential buffer access,
>>>>>>>> etc.)
>>>>>>>>
>>>>>>>> A sometimes better solution is to have the state returned to 
>>>>>>>> the caller, and require the caller to pass/use the state later 
>>>>>>>> - then you don't need thread locals.
>>>>>>>>
>>>>>>>> You can accomplish a similar solution by returning a 
>>>>>>>> "SessionKey" object, and have the caller pass this later. You 
>>>>>>>> can then have a WeakHashMap of SessionKey,SearchState that the 
>>>>>>>> code can use.  When the SessionKey is destroyed (no longer
>>>>>>>> referenced), the state map can be cleaned up automatically.
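
The "SessionKey" idea above can be sketched in Java roughly as follows.
SessionKey and SearchState are the names used in the message; the class
SessionStateCache, the methods open and stateFor, and the position field
are hypothetical and only serve the illustration.

```java
import java.util.Collections;
import java.util.Map;
import java.util.WeakHashMap;

/** Opaque handle given to the caller; compared by identity. */
final class SessionKey { }

/** Placeholder for per-search state; the field is illustrative only. */
class SearchState {
    long position;
}

/** Hypothetical sketch: per-session state keyed weakly by SessionKey. */
class SessionStateCache {
    // WeakHashMap holds its keys weakly: once the caller drops the
    // SessionKey, the entry becomes eligible for removal by the map.
    // The value must not hold a strong reference back to its key,
    // otherwise the entry would never be discarded.
    private final Map<SessionKey, SearchState> states =
        Collections.synchronizedMap(new WeakHashMap<SessionKey, SearchState>());

    /** Creates fresh state and returns the key the caller must keep. */
    SessionKey open() {
        SessionKey key = new SessionKey();
        states.put(key, new SearchState());
        return key;
    }

    /** Looks up the state for a key passed back by the caller. */
    SearchState stateFor(SessionKey key) {
        return states.get(key);
    }
}
```

A caller would keep the SessionKey for as long as it needs the state,
for example `SessionKey key = cache.open();` followed by later calls to
`cache.stateFor(key)`. When the stale entry is actually removed depends
on the garbage collector, so this sketch makes no timing guarantee.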
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Sep 10, 2008, at 11:30 PM, Noble Paul നോബിള്‍ नोब्ळ् wrote:
>>>>>>>>
>>>>>>>>> When I look at the reference tree, that is the feeling I get.
>>>>>>>>> If you held a WeakReference it would get released.
>>>>>>>>> |- base of org.apache.lucene.index.CompoundFileReader$CSIndexInput
>>>>>>>>>     |- input of org.apache.lucene.index.SegmentTermEnum
>>>>>>>>>         |- value of java.lang.ThreadLocal$ThreadLocalMap$Entry
>>>>>>>>>
>>>>>>>>> On Wed, Sep 10, 2008 at 8:39 PM, Chris Lu <chris.lu@gmail.com> wrote:
>>>>>>>>>> Does this make any difference?
>>>>>>>>>> If I intentionally close the searcher and reader and that
>>>>>>>>>> fails to release the memory, I cannot rely on some magic of
>>>>>>>>>> the JVM to release it.
>>>>>>>>>> -- 
>>>>>>>>>> Chris Lu
>>>>>>>>>>
>>>>>>>>>> On Wed, Sep 10, 2008 at 4:03 AM, Noble Paul നോബിള്‍ नोब्ळ् <noble.paul@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>> Why do you need to keep a strong reference?
>>>>>>>>>>> Why not a WeakReference ?
>>>>>>>>>>>
>>>>>>>>>>> --Noble
>>>>>>>>>>>
>>>>>>>>>>> On Wed, Sep 10, 2008 at 12:27 AM, Chris Lu 
>>>>>>>>>>> <chris.lu@gmail.com> wrote:
>>>>>>>>>>>> The problem should be similar to what's discussed in this
>>>>>>>>>>>> thread:
>>>>>>>>>>>> http://lucene.markmail.org/message/keosgz2c2yjc7qre?q=ThreadLocal
>>>>>>>>>>>>
>>>>>>>>>>>> There is a memory leak in Lucene search, introduced by
>>>>>>>>>>>> LUCENE-1195 (svn r659602, May 23, 2008).
>>>>>>>>>>>>
>>>>>>>>>>>> This patch brings in a ThreadLocal cache to TermInfosReader.
>>>>>>>>>>>>
>>>>>>>>>>>> It's usually recommended to keep the reader open, and reuse
>>>>>>>>>>>> it when possible. In a common J2EE application, the HTTP
>>>>>>>>>>>> requests are usually handled by different threads. But since
>>>>>>>>>>>> the cache is ThreadLocal, the cache is not really usable by
>>>>>>>>>>>> other threads. What's worse, the cache cannot be cleared by
>>>>>>>>>>>> another thread!
>>>>>>>>>>>>
>>>>>>>>>>>> This leak is usually not so obvious. But my case uses a
>>>>>>>>>>>> RAMDirectory of several hundred megabytes, so one unreleased
>>>>>>>>>>>> resource is obvious to me.
>>>>>>>>>>>>
>>>>>>>>>>>> Here is the reference tree:
>>>>>>>>>>>> org.apache.lucene.store.RAMDirectory
>>>>>>>>>>>> |- directory of org.apache.lucene.store.RAMFile
>>>>>>>>>>>>     |- file of org.apache.lucene.store.RAMInputStream
>>>>>>>>>>>>         |- base of org.apache.lucene.index.CompoundFileReader$CSIndexInput
>>>>>>>>>>>>             |- input of org.apache.lucene.index.SegmentTermEnum
>>>>>>>>>>>>                 |- value of java.lang.ThreadLocal$ThreadLocalMap$Entry
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> After I switched back to svn revision 659601, right before
>>>>>>>>>>>> this patch was checked in, the memory leak was gone.
>>>>>>>>>>>> Although my case is RAMDirectory, I believe this will also
>>>>>>>>>>>> affect disk-based indexes.
>>>>>>>>>>>>
>>>>>>>>>>>> -- 
>>>>>>>>>>>> Chris Lu
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> -- 
>>>>>>>>>>> --Noble Paul
>>>>>>>>>>>
>>>>>>>>>>> ---------------------------------------------------------------------
>>>>>>>>>>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>>>>>>>>>>> For additional commands, e-mail: java-dev-help@lucene.apache.org