lucene-dev mailing list archives

From robert engels <reng...@ix.netcom.com>
Subject Re: ThreadLocal causing memory leak with J2EE applications
Date Thu, 11 Sep 2008 15:02:39 GMT
Technically, you need to sync on set() as well, since you need to
remove the old value and add the new one to the list. Lucene doesn't
call set() repeatedly, though, only once for the initial value, so the
overhead is minimal.
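
A minimal Java sketch of the scheme discussed in this thread, under stated assumptions: the class name CloseableThreadLocalSketch and the liveEntries() helper are hypothetical and are not Lucene code, and the syntax assumes Java 8 or later. The ThreadLocal holds only a WeakReference to the value, a map keyed by thread holds the hard reference, the map is compacted on every set(), and close() drops all hard references. Only set() and close() synchronize; get() does not.

```java
import java.lang.ref.WeakReference;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of the idea in this thread, not Lucene's class.
final class CloseableThreadLocalSketch<T> {
    // Fast path: each thread finds its own value without synchronization.
    // Only a weak reference is stored, so the ThreadLocal alone never
    // keeps the value alive.
    private final ThreadLocal<WeakReference<T>> local = new ThreadLocal<>();

    // Hard references that keep values alive until the owning thread
    // dies or close() is called. Guarded by "this".
    private final Map<Thread, T> hardRefs = new HashMap<>();

    // No sync on retrieval. May return null if nothing was set by this
    // thread, or if the value was reclaimed after close().
    T get() {
        WeakReference<T> ref = local.get();
        return ref == null ? null : ref.get();
    }

    // Sync only when binding a value to the current thread.
    void set(T value) {
        local.set(new WeakReference<>(value));
        synchronized (this) {
            // put() replaces any old value held for this thread.
            hardRefs.put(Thread.currentThread(), value);
            // Compact: drop entries whose threads have terminated.
            hardRefs.keySet().removeIf(t -> !t.isAlive());
        }
    }

    // Drop all hard references so GC is free to reclaim the values,
    // even before the ThreadLocal purges its stale entries.
    synchronized void close() {
        hardRefs.clear();
    }

    // Helper for inspection only.
    synchronized int liveEntries() {
        return hardRefs.size();
    }
}
```

With a long-lived reader and short-lived threads, the map in this sketch stays bounded by the number of live threads that have called set(), because each set() removes entries for terminated threads.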

On Sep 11, 2008, at 9:43 AM, Michael McCandless wrote:

>
> OK so we compact the list (removing dead threads) every time we add  
> a new entry to the list.  This way for a long lived SegmentReader  
> but short lived threads, the list keeps only live threads.
>
> We do need sync access to the list, but that's only on binding a  
> new thread.  Retrieving an existing thread has no sync.
>
> Mike
>
> robert engels wrote:
>
>> You still need to sync access to the list, and how would it be  
>> removed from the list prior to close? That is you need one per  
>> thread, but you can have the reader shared across all threads. So  
>> if threads were created and destroyed without ever closing the  
>> reader, the list would grow unbounded.
>>
>> On Sep 11, 2008, at 9:20 AM, Michael McCandless wrote:
>>
>>>
>>> I don't need it by thread, because I would still use ThreadLocal  
>>> to retrieve the SegmentTermEnum.  This avoids any sync during get.
>>>
>>> The list is just a "fallback" to hold a hard reference to the  
>>> SegmentTermEnum to keep it alive.  That's its only purpose.
>>> Then, when SegmentReader is closed this list is cleared and GC is  
>>> free to reclaim all SegmentTermEnums.
>>>
>>> Mike
>>>
>>> robert engels wrote:
>>>
>>>> But you need it by thread, so it can't be a list.
>>>>
>>>> You could have a HashMap of <Thread,ThreadState> in  
>>>> FieldsReader, and when SegmentReader is closed, FieldsReader is  
>>>> closed, which clears the map, and not use thread locals at all.  
>>>> The difference being you would need a sync'd map.
>>>>
>>>> On Sep 11, 2008, at 4:56 AM, Michael McCandless wrote:
>>>>
>>>>>
>>>>> What if we wrap the value in a WeakReference, but secondarily  
>>>>> hold a hard reference to it in a "normal" list?
>>>>>
>>>>> Then, when TermInfosReader is closed we clear that list of all  
>>>>> its hard references, at which point GC will be free to reclaim  
>>>>> the object out from under the ThreadLocal even before the  
>>>>> ThreadLocal purges its stale entries.
>>>>>
>>>>> Mike
>>>>>
>>>>> robert engels wrote:
>>>>>
>>>>>> You can't hold the ThreadLocal value in a WeakReference,  
>>>>>> because there is no hard reference between enumeration calls  
>>>>>> (so it would be cleared out from under you while enumerating).
>>>>>>
>>>>>> All of this occurs because you have some objects (readers/ 
>>>>>> segments etc.) that are shared across all threads, but these  
>>>>>> contain objects that are 'thread/search state' specific. These  
>>>>>> latter objects are essentially "cached" for performance (so  
>>>>>> you don't need to seek and read, sequential buffer access, etc.)
>>>>>>
>>>>>> A sometimes better solution is to have the state returned to  
>>>>>> the caller, and require the caller to pass/use the state later  
>>>>>> - then you don't need thread locals.
>>>>>>
>>>>>> You can accomplish a similar solution by returning a  
>>>>>> "SessionKey" object, and have the caller pass this later.  You  
>>>>>> can then have a WeakHashMap of SessionKey,SearchState that the  
>>>>>> code can use.  When the SessionKey is destroyed (no longer  
>>>>>> referenced), the state map can be cleaned up automatically.
>>>>>>
>>>>>>
>>>>>>
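
A rough Java sketch of the "SessionKey" idea described above, assuming hypothetical names throughout (SessionKey, SearchState and SessionStateCache are illustrative, not Lucene classes). The caller keeps the key, and a WeakHashMap lets the state become collectable once the key is no longer referenced. The map is wrapped with Collections.synchronizedMap because WeakHashMap itself is not thread-safe.

```java
import java.util.Collections;
import java.util.Map;
import java.util.WeakHashMap;

// Opaque token handed to the caller. It deliberately does not override
// equals/hashCode, so lookups are identity-based.
final class SessionKey {
}

// Hypothetical per-search state. It must not hold a strong reference
// back to its SessionKey, or the map entry could never be cleared.
final class SearchState {
    int position;
}

final class SessionStateCache {
    // Weak keys: an entry becomes eligible for removal once the caller
    // drops its SessionKey.
    private final Map<SessionKey, SearchState> states =
        Collections.synchronizedMap(new WeakHashMap<SessionKey, SearchState>());

    // Create fresh state and return the key the caller must pass later.
    SessionKey open() {
        SessionKey key = new SessionKey();
        states.put(key, new SearchState());
        return key;
    }

    // Look up the state for a key. Returns null for an unknown key.
    SearchState stateFor(SessionKey key) {
        return states.get(key);
    }
}
```

Cleanup timing in this sketch depends on the garbage collector, so it bounds memory use but does not release state at a predictable moment.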
>>>>>> On Sep 10, 2008, at 11:30 PM, Noble Paul  
>>>>>> നോബിള്‍ नोब्ळ् wrote:
>>>>>>
>>>>>>> When I look at the reference tree, that is the feeling I get. If you
>>>>>>> held a WeakReference it would get released.
>>>>>>> |- base of org.apache.lucene.index.CompoundFileReader$CSIndexInput
>>>>>>>            |- input of org.apache.lucene.index.SegmentTermEnum
>>>>>>>                |- value of java.lang.ThreadLocal$ThreadLocalMap$Entry
>>>>>>>
>>>>>>> On Wed, Sep 10, 2008 at 8:39 PM, Chris Lu  
>>>>>>> <chris.lu@gmail.com> wrote:
>>>>>>>> Does this make any difference?
>>>>>>>> If intentionally closing the searcher and reader fails to release the
>>>>>>>> memory, I cannot rely on some magic of the JVM to release it.
>>>>>>>> --
>>>>>>>> Chris Lu
>>>>>>>> -------------------------
>>>>>>>> Instant Scalable Full-Text Search On Any Database/Application
>>>>>>>> site: http://www.dbsight.net
>>>>>>>> demo: http://search.dbsight.com
>>>>>>>> Lucene Database Search in 3 minutes:
>>>>>>>> http://wiki.dbsight.com/index.php?title=Create_Lucene_Database_Search_in_3_minutes
>>>>>>>> DBSight customer, a shopping comparison site, (anonymous per request) got
>>>>>>>> 2.6 Million Euro funding!
>>>>>>>>
>>>>>>>> On Wed, Sep 10, 2008 at 4:03 AM, Noble Paul  
>>>>>>>> നോബിള്‍ नोब्ळ्
>>>>>>>> <noble.paul@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>> Why do you need to keep a strong reference?
>>>>>>>>> Why not a WeakReference ?
>>>>>>>>>
>>>>>>>>> --Noble
>>>>>>>>>
>>>>>>>>> On Wed, Sep 10, 2008 at 12:27 AM, Chris Lu  
>>>>>>>>> <chris.lu@gmail.com> wrote:
>>>>>>>>>> The problem should be similar to what is discussed in this thread:
>>>>>>>>>> http://lucene.markmail.org/message/keosgz2c2yjc7qre?q=ThreadLocal
>>>>>>>>>>
>>>>>>>>>> There is a memory leak in Lucene search introduced by Lucene-1195
>>>>>>>>>> (svn r659602, May 23, 2008).
>>>>>>>>>>
>>>>>>>>>> This patch brings in a ThreadLocal cache to TermInfosReader.
>>>>>>>>>>
>>>>>>>>>> It's usually recommended to keep the reader open and reuse it when
>>>>>>>>>> possible. In a common J2EE application, the HTTP requests are usually
>>>>>>>>>> handled by different threads. But since the cache is ThreadLocal, one
>>>>>>>>>> thread's cache is not really usable by other threads. What's worse,
>>>>>>>>>> the cache cannot be cleared by another thread!
>>>>>>>>>>
>>>>>>>>>> This leak is usually not so obvious. But my case uses a RAMDirectory
>>>>>>>>>> holding several hundred megabytes, so a single unreleased resource is
>>>>>>>>>> obvious to me.
>>>>>>>>>>
>>>>>>>>>> Here is the reference tree:
>>>>>>>>>> org.apache.lucene.store.RAMDirectory
>>>>>>>>>> |- directory of org.apache.lucene.store.RAMFile
>>>>>>>>>>   |- file of org.apache.lucene.store.RAMInputStream
>>>>>>>>>>       |- base of org.apache.lucene.index.CompoundFileReader$CSIndexInput
>>>>>>>>>>           |- input of org.apache.lucene.index.SegmentTermEnum
>>>>>>>>>>               |- value of java.lang.ThreadLocal$ThreadLocalMap$Entry
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> After I switched back to svn revision 659601, right before this patch
>>>>>>>>>> was checked in, the memory leak was gone.
>>>>>>>>>> Although my case uses RAMDirectory, I believe this will affect
>>>>>>>>>> disk-based indexes also.
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Chris Lu
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> --Noble Paul
>>>>>>>>>
>>>>>>>>> ---------------------------------------------------------------------
>>>>>>>>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>>>>>>>>> For additional commands, e-mail: java-dev-help@lucene.apache.org
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> -- 
>>>>>>> --Noble Paul

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

