lucene-dev mailing list archives

From robert engels <reng...@ix.netcom.com>
Subject Re: ThreadLocal causing memory leak with J2EE applications
Date Wed, 10 Sep 2008 05:46:10 GMT
As a follow-up, the SegmentTermEnum does contain an IndexInput, and
depending on your configuration (e.g., buffer sizes) this could be a
large object, so you do need to be careful!
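
To make the per-thread retention concrete, here is a minimal sketch in plain Java (no Lucene involved). The `CACHE` field is a hypothetical stand-in for a per-thread cached object such as the SegmentTermEnum, and the single-thread executor stands in for a pooled application-server thread; both are assumptions for illustration, not Lucene's actual code. It shows that calling `remove()` from a different thread leaves the pooled thread's value in place.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class ThreadLocalRetentionDemo {
    // Hypothetical stand-in for a per-thread cached object (think SegmentTermEnum
    // plus its IndexInput buffer); this is not Lucene's actual field.
    static final ThreadLocal<byte[]> CACHE = new ThreadLocal<>();

    // Size of the calling thread's cached value, or -1 if its slot is empty.
    static int cachedSize() {
        byte[] value = CACHE.get();
        return value == null ? -1 : value.length;
    }

    // Returns {size seen by the worker after a foreign remove(),
    //          size seen by the worker after its own remove()}.
    static int[] run() {
        // One reused worker, like a pooled request thread in an app server.
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Runnable fill = () -> CACHE.set(new byte[1024]);
            Runnable clear = () -> CACHE.remove();
            Callable<Integer> probe = ThreadLocalRetentionDemo::cachedSize;

            pool.submit(fill).get();
            CACHE.remove(); // runs on the caller's thread: clears only the caller's slot
            int afterForeignRemove = pool.submit(probe).get();
            pool.submit(clear).get(); // runs on the worker: clears the worker's slot
            int afterOwnRemove = pool.submit(probe).get();
            return new int[] {afterForeignRemove, afterOwnRemove};
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        int[] sizes = run();
        System.out.println("after remove() on another thread: " + sizes[0]);
        System.out.println("after remove() on the owning thread: " + sizes[1]);
    }
}
```

The value stays reachable for as long as the pooled thread lives, unless that same thread clears it. If the cached object holds a large buffer, that is the memory to watch.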

On Sep 10, 2008, at 12:14 AM, robert engels wrote:

> A searcher uses an IndexReader - the IndexReader is slow to open,  
> not a Searcher. And searchers can share an IndexReader.
>
> You want to create a single shared (across all threads/users)
> IndexReader (usually), and create a Searcher as needed and
> dispose of it.  It is VERY CHEAP to create the Searcher.
>
> I am fairly certain the javadoc on Searcher is incorrect.  The  
> warning "For performance reasons it is recommended to open only one  
> IndexSearcher and use it for all of your searches" is not true in  
> the case where an IndexReader is passed to the ctor.
>
> Any caching should USUALLY be performed at the IndexReader level.
>
> You are most likely using the "path" ctor, and that is the source  
> of your problems, as multiple IndexReader instances are being  
> created, and thus the memory use.
>
>
> On Sep 9, 2008, at 11:44 PM, Chris Lu wrote:
>
>> In a J2EE environment, there is usually a searcher pool with several
>> searchers open.
>> Opening a large index for every user is unacceptably slow.
>>
>> -- 
>> Chris Lu
>> -------------------------
>> Instant Scalable Full-Text Search On Any Database/Application
>> site: http://www.dbsight.net
>> demo: http://search.dbsight.com
>> Lucene Database Search in 3 minutes:
>> http://wiki.dbsight.com/index.php?title=Create_Lucene_Database_Search_in_3_minutes
>> DBSight customer, a shopping comparison site, (anonymous per  
>> request) got 2.6 Million Euro funding!
>>
>> On Tue, Sep 9, 2008 at 9:03 PM, robert engels  
>> <rengels@ix.netcom.com> wrote:
>> You need to close the searcher within the thread that is using it,  
>> in order to have it cleaned up quickly... usually right after you  
>> display the page of results.
>>
>> If you are keeping multiple searcher refs across multiple threads  
>> for paging/whatever, you have not coded it correctly.
>>
>> Imagine 10,000 users - storing a searcher for each one is not  
>> going to work...
>>
>> On Sep 9, 2008, at 10:21 PM, Chris Lu wrote:
>>
>>> Right, in a sense I can not release it from another thread. But  
>>> that's the problem.
>>>
>>> It's a J2EE environment, and all threads are kind of equal. It's
>>> simply not possible to iterate through all threads to close the
>>> searcher and thus release the ThreadLocal cache.
>>> Unless Lucene is not recommended for J2EE environments, this has
>>> to be fixed.
>>>
>>> -- 
>>> Chris Lu
>>>
>>>
>>> On Tue, Sep 9, 2008 at 8:14 PM, robert engels  
>>> <rengels@ix.netcom.com> wrote:
>>> Your code is not correct. You cannot release it on another thread
>>> - the first thread may be creating hundreds/thousands of instances
>>> before the other thread ever runs...
>>>
>>> On Sep 9, 2008, at 10:10 PM, Chris Lu wrote:
>>>
>>>> If I release it on the thread that's creating the searcher, by  
>>>> setting searcher=null, everything is fine, the memory is  
>>>> released very cleanly.
>>>> My load test was to repeatedly create a searcher on a
>>>> RAMDirectory and release it on another thread. The test quickly
>>>> hits OOM after several runs. I set the heap size to 1024M, and
>>>> the RAMDirectory is 250M. In a profiling tool, the used heap
>>>> visibly stepped up by 250M.
>>>>
>>>> I think we should not rely on something that's a "maybe"  
>>>> behavior, especially for a general purpose library.
>>>>
>>>> Since it's a multi-threaded environment, the thread that creates
>>>> the entries in the LRU cache may not go away quickly (actually
>>>> most, if not all, application servers will try to reuse threads),
>>>> so the LRU cache, which uses the thread as the key, cannot be
>>>> released, and so the SegmentTermEnum, which is in the same class,
>>>> cannot be released either.
>>>>
>>>> And yes, I close the RAMDirectory, and the fileMap is released.  
>>>> I verified that through the profiler by directly checking the  
>>>> values in the snapshot.
>>>>
>>>> I'm pretty sure the reference tree wasn't like this with the code
>>>> before this commit, because after closing the searcher in another
>>>> thread, the RAMDirectory totally disappeared from the memory
>>>> snapshot.
>>>>
>>>> -- 
>>>> Chris Lu
>>>>
>>>> On Tue, Sep 9, 2008 at 5:03 PM, Michael McCandless  
>>>> <lucene@mikemccandless.com> wrote:
>>>>
>>>> Chris Lu wrote:
>>>>
>>>> The problem should be similar to what's talked about on this  
>>>> discussion.
>>>> http://lucene.markmail.org/message/keosgz2c2yjc7qre?q=ThreadLocal
>>>>
>>>> The "rough" conclusion of that thread is that, technically, this  
>>>> isn't a memory leak but rather a "delayed freeing" problem.  Ie,  
>>>> it may take longer, possibly much longer, than you want for the  
>>>> memory to be freed.
>>>>
>>>>
>>>> There is a memory leak in Lucene search from LUCENE-1195 (svn
>>>> r659602, May 23, 2008).
>>>>
>>>> This patch brings in a ThreadLocal cache to TermInfosReader.
>>>>
>>>> One thing that confuses me: TermInfosReader was already using a  
>>>> ThreadLocal to cache the SegmentTermEnum instance.  What was  
>>>> added in this commit (for LUCENE-1195) was an LRU cache storing  
>>>> Term -> TermInfo instances.  But it seems like it's the  
>>>> SegmentTermEnum instance that you're tracing below.
>>>>
>>>>
>>>> It's usually recommended to keep the reader open, and reuse it
>>>> when possible. In a common J2EE application, the http requests
>>>> are usually handled by different threads. But since the cache is
>>>> ThreadLocal, the cache is not really usable by other threads.
>>>> What's worse, the cache cannot be cleared by another thread!
>>>>
>>>> This leak is usually not so obvious. But my case uses a
>>>> RAMDirectory of several hundred megabytes, so one unreleased
>>>> resource is obvious to me.
>>>>
>>>> Here is the reference tree:
>>>> org.apache.lucene.store.RAMDirectory
>>>>  |- directory of org.apache.lucene.store.RAMFile
>>>>     |- file of org.apache.lucene.store.RAMInputStream
>>>>         |- base of org.apache.lucene.index.CompoundFileReader$CSIndexInput
>>>>             |- input of org.apache.lucene.index.SegmentTermEnum
>>>>                 |- value of java.lang.ThreadLocal$ThreadLocalMap$Entry
>>>>
>>>> So you have a RAMDir that has several hundred MB stored in it,
>>>> that you're done with, yet through this path Lucene is keeping it
>>>> alive?
>>>>
>>>> Did you close the RAMDir?  (which will null its fileMap and  
>>>> should also free your memory).
>>>>
>>>> Also, that reference tree doesn't show the ThreadResources class  
>>>> that was added in that commit -- are you sure this reference  
>>>> tree wasn't before the commit?
>>>>
>>>> Mike
>>>>
>>>> -- 
>>>> Chris Lu
>>>
>>>
>>>
>>>
>>
>>
>>
>
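
The advice quoted above (one shared IndexReader, with a cheap Searcher created per request and then dropped) can be sketched roughly as follows. This is plain Java with no Lucene on the classpath: `Reader` and `Searcher` are hypothetical stand-ins for IndexReader and for a Searcher constructed from an existing reader, and the lazy-holder idiom is just one assumed way to get a single shared instance.

```java
import java.util.concurrent.atomic.AtomicInteger;

class SharedReaderSketch {
    // Hypothetical stand-in for IndexReader: slow to open, shared by all threads.
    static final class Reader {
        static final AtomicInteger OPENED = new AtomicInteger();
        Reader() {
            OPENED.incrementAndGet(); // imagine the expensive index open here
        }
    }

    // Hypothetical stand-in for a Searcher built on an existing reader: a thin wrapper.
    static final class Searcher {
        final Reader reader;
        Searcher(Reader reader) {
            this.reader = reader;
        }
    }

    // Holder idiom: the JVM runs this initializer once, on first use, thread-safely.
    private static final class Holder {
        static final Reader SHARED = new Reader();
    }

    // Cheap per-request factory; every searcher wraps the same reader.
    static Searcher newSearcher() {
        return new Searcher(Holder.SHARED);
    }

    // Simulates `requests` concurrent requests and returns how many readers were opened.
    static int readersOpenedFor(int requests) {
        Thread[] workers = new Thread[requests];
        for (int i = 0; i < requests; i++) {
            workers[i] = new Thread(() -> {
                Searcher searcher = newSearcher();
                // ... run the query and render the page with `searcher` here ...
                // then let the local reference go out of scope
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return Reader.OPENED.get();
    }

    public static void main(String[] args) {
        System.out.println("readers opened for 8 requests: " + readersOpenedFor(8));
    }
}
```

However many request threads run, only one reader is ever opened, so there is no need for a pool of searchers to avoid the slow open.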

