lucene-dev mailing list archives

From robert engels <reng...@ix.netcom.com>
Subject Re: ThreadLocal causing memory leak with J2EE applications
Date Wed, 10 Sep 2008 15:40:53 GMT
Actually, a single RAMDirectory would be sufficient (since it
supports writes). There should never be a reason to create a new
RAMDirectory (unless you have some specialized real-time search
occurring).

If you are creating new RAMDirectories, the statements below hold.

On Sep 10, 2008, at 10:34 AM, robert engels wrote:

> It is basic Java. Threads are not guaranteed to run on any sort of  
> schedule. If you create lots of large objects in one thread,  
> releasing them in another, there is a good chance you will get an  
> OOM (since the releasing thread may not run before the OOM  
> occurs)...  This is not Lucene specific by any means.
>
> It is a misunderstanding on your part about how GC works.
>
> I assume you must at some point be creating new RAMDirectories -  
> otherwise the memory would never really increase, since the  
> IndexReader/enums/etc are not very large...
>
> When you create a new RAMDirectory, you need to BE CERTAIN !!!
> that the other IndexReaders/Searchers using the old RAMDirectory
> are ALL CLOSED, otherwise their memory will still be in use, which
> leads to your OOM...
>
>
> On Sep 10, 2008, at 10:16 AM, Chris Lu wrote:
>
>> I do not believe I am making any mistake. Actually I just got an  
>> email from another user, complaining about the same thing. And I  
>> am having the same usage pattern.
>>
>> After the reader is opened, the RAMDirectory is shared by several  
>> objects.
>> There is one instance of RAMDirectory in the memory, and it is  
>> holding lots of memory, which is expected.
>>
>> If I close the reader in the same thread that has opened it, the  
>> RAMDirectory is gone from the memory.
>> If I close the reader in other threads, the RAMDirectory is left
>> in the memory, referenced along the tree I drew in the first email.
>>
>> I do not think the usage is wrong. Period.
>>
>> -------------------------------------
>> Hi,
>>
>>    I found a forum post from you here [1] where you mention that you
>> have a memory leak using the Lucene RAM directory. I'd like to ask you
>> if you have already resolved the problem and how you did it, or maybe
>> you know where I can read about the solution. We are using
>> RAMDirectory too and figured out that over time the memory
>> consumption rises and rises until the system breaks down, but only
>> when we perform many index updates. If we only create the index and
>> do nothing except search it, it works fine.
>>
>> maybe you can give me a hint or a link,
>> greetz,
>> -------------------------------------
>>
>> -- 
>> Chris Lu
>> -------------------------
>> Instant Scalable Full-Text Search On Any Database/Application
>> site: http://www.dbsight.net
>> demo: http://search.dbsight.com
>> Lucene Database Search in 3 minutes: http://wiki.dbsight.com/index.php?title=Create_Lucene_Database_Search_in_3_minutes
>> DBSight customer, a shopping comparison site, (anonymous per  
>> request) got 2.6 Million Euro funding!
>>
>> On Wed, Sep 10, 2008 at 7:12 AM, robert engels  
>> <rengels@ix.netcom.com> wrote:
>> Sorry, but I am fairly certain you are mistaken.
>>
>> If you only have a single IndexReader, the RAMDirectory will be  
>> shared in all cases.
>>
>> The only memory growth is any buffer space allocated by an  
>> IndexInput (used in many places and cached).
>>
>> Normally the IndexInput created by a RAMDirectory do not have any  
>> buffer allocated, since the underlying store is already in memory.
>>
>> You have some other problem in your code...
>>
>> On Sep 10, 2008, at 1:10 AM, Chris Lu wrote:
>>
>>> Actually, even if I only use one IndexReader, some resources are
>>> cached via the ThreadLocal cache, and cannot be released unless
>>> all threads do the close action.
>>>
>>> SegmentTermEnum itself is small, but it holds RAMDirectory along  
>>> the path, which is big.
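
The point about per-thread ownership can be illustrated with plain JDK classes. The following is a minimal sketch, not Lucene code: `ThreadLocalOwnershipDemo`, `CACHE`, and the 1 KB buffer are hypothetical stand-ins for the per-thread cache discussed here. It assumes a single pooled worker thread, as an application server would reuse, and shows that a `remove()` issued from a different thread leaves the worker's entry in place.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class ThreadLocalOwnershipDemo {
    // Hypothetical stand-in for a per-thread cache; not the Lucene implementation.
    static final ThreadLocal<byte[]> CACHE = new ThreadLocal<>();

    /** Returns true if the worker's entry is still set after remove() runs on the caller's thread. */
    static boolean workerEntrySurvivesForeignRemove() {
        // One long-lived worker, similar to a reused application-server thread.
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Runnable fill = () -> CACHE.set(new byte[1024]);
            Callable<Boolean> stillSet = () -> CACHE.get() != null;

            pool.submit(fill).get();   // the worker thread stores its value
            CACHE.remove();            // runs on the calling thread: only clears the caller's slot
            return pool.submit(stillSet).get(); // same worker thread checks its own slot
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("worker entry survives: " + workerEntrySurvivesForeignRemove());
    }
}
```

Under these assumptions, the value stays reachable through the worker thread until that thread clears it itself or terminates, which matches the behavior described in this thread.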
>>>
>>> On Tue, Sep 9, 2008 at 10:43 PM, robert engels  
>>> <rengels@ix.netcom.com> wrote:
>>> You do not need a pool of IndexReaders...
>>>
>>> It does not matter what class it is, what matters is the class  
>>> that ultimately holds the reference.
>>>
>>> If the IndexReader is never closed, the SegmentReader(s) is never  
>>> closed, so the thread local in TermInfosReader is not cleared  
>>> (because the thread never dies). So you will get one  
>>> SegmentTermEnum, per thread * per segment.
>>>
>>> The SegmentTermEnum is not a large object, so even if you had 100
>>> threads and 100 segments, for 10k instances, it seems hard to
>>> believe that this is the source of your memory issue.
>>>
>>> The SegmentTermEnum is cached per thread since it needs to
>>> enumerate the terms. Not having a per-thread cache would lead to
>>> lots of random access when multiple threads read the index, which
>>> is very slow.
>>>
>>> You need to keep in mind, what if every thread was executing a  
>>> search simultaneously - you would still have 100x100  
>>> SegmentTermEnum instances anyway !  The only way to prevent that  
>>> would be to create and destroy the SegmentTermEnum on each call  
>>> (opening and seeking to the proper spot) - which would be SLOW  
>>> SLOW SLOW.
>>>
>>> On Sep 10, 2008, at 12:19 AM, Chris Lu wrote:
>>>
>>>> I have tried to create an IndexReader pool and create searchers
>>>> dynamically. But the memory leak is the same. It's not
>>>> related to the Searcher class specifically, but to the
>>>> SegmentTermEnum in TermInfosReader.
>>>>
>>>> On Tue, Sep 9, 2008 at 10:14 PM, robert engels  
>>>> <rengels@ix.netcom.com> wrote:
>>>> A searcher uses an IndexReader - the IndexReader is slow to  
>>>> open, not a Searcher. And searchers can share an IndexReader.
>>>>
>>>> You want to create a single shared (across all threads/users)
>>>> IndexReader (usually), and create a Searcher as needed and
>>>> dispose of it.  It is VERY CHEAP to create the Searcher.
>>>>
>>>> I am fairly certain the javadoc on Searcher is incorrect.  The  
>>>> warning "For performance reasons it is recommended to open only  
>>>> one IndexSearcher and use it for all of your searches" is not  
>>>> true in the case where an IndexReader is passed to the ctor.
>>>>
>>>> Any caching should USUALLY be performed at the IndexReader level.
>>>>
>>>> You are most likely using the "path" ctor, and that is the  
>>>> source of your problems, as multiple IndexReader instances are  
>>>> being created, and thus the memory use.
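
The pattern recommended above (one shared, expensive reader and a cheap searcher per request) might be sketched as follows. The classes here are hypothetical stand-ins written for illustration, not the Lucene `IndexReader` or `IndexSearcher` API; the counter exists only to make the number of reader opens visible.

```java
import java.util.concurrent.atomic.AtomicInteger;

class SharedReaderSketch {
    // Counts how many times the expensive reader is opened.
    static final AtomicInteger READER_OPENS = new AtomicInteger();

    /** Hypothetical stand-in for an expensive-to-open index reader. */
    static final class Reader {
        Reader() { READER_OPENS.incrementAndGet(); }
        String lookup(String query) { return "results for " + query; }
    }

    /** Hypothetical stand-in for a cheap searcher that only wraps a reader. */
    static final class Searcher {
        private final Reader reader;
        Searcher(Reader reader) { this.reader = reader; }
        String search(String query) { return reader.lookup(query); }
    }

    // Opened once and shared across all threads/users.
    private static final Reader SHARED = new Reader();

    /** Per-request work: create a searcher over the shared reader, use it, drop it. */
    static String handleRequest(String query) {
        Searcher searcher = new Searcher(SHARED);
        return searcher.search(query);
    }

    public static void main(String[] args) {
        for (int i = 0; i < 3; i++) {
            System.out.println(handleRequest("q" + i));
        }
        System.out.println("reader opens: " + READER_OPENS.get());
    }
}
```

In this sketch any number of requests reuse the single reader, so the per-request cost is only the small wrapper object. A path-based constructor that opens its own reader on every call would instead increment the counter once per request.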
>>>>
>>>>
>>>> On Sep 9, 2008, at 11:44 PM, Chris Lu wrote:
>>>>
>>>>> In a J2EE environment, usually there is a searcher pool with
>>>>> several searchers open.
>>>>> The speed of opening a large index for every user is not
>>>>> acceptable.
>>>>>
>>>>> On Tue, Sep 9, 2008 at 9:03 PM, robert engels  
>>>>> <rengels@ix.netcom.com> wrote:
>>>>> You need to close the searcher within the thread that is using  
>>>>> it, in order to have it cleaned up quickly... usually right  
>>>>> after you display the page of results.
>>>>>
>>>>> If you are keeping multiple searcher refs across multiple  
>>>>> threads for paging/whatever, you have not coded it correctly.
>>>>>
>>>>> Imagine 10,000 users - storing a searcher for each one is not  
>>>>> going to work...
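
Closing the searcher on the thread that used it, right after the page is produced, could look like the sketch below. `RequestSearcher` is a hypothetical stand-in, not a Lucene class, and the example assumes Java 7 or later for try-with-resources; on older JVMs the same shape is written with try/finally.

```java
import java.util.concurrent.atomic.AtomicInteger;

class PerRequestSearcherSketch {
    // Number of searchers currently open; should return to zero after each request.
    static final AtomicInteger OPEN_SEARCHERS = new AtomicInteger();

    /** Hypothetical stand-in for a searcher that must be closed after use. */
    static final class RequestSearcher implements AutoCloseable {
        RequestSearcher() { OPEN_SEARCHERS.incrementAndGet(); }
        String firstPage(String query) { return "page 1 for " + query; }
        @Override public void close() { OPEN_SEARCHERS.decrementAndGet(); }
    }

    /** Opens, uses, and closes the searcher on the same (request) thread. */
    static String renderPage(String query) {
        try (RequestSearcher searcher = new RequestSearcher()) {
            return searcher.firstPage(query);
        } // close() runs here, on the thread that did the search
    }

    public static void main(String[] args) {
        System.out.println(renderPage("lucene"));
        System.out.println("open searchers: " + OPEN_SEARCHERS.get());
    }
}
```

Nothing is stored per user between requests in this sketch, so the number of live searchers is bounded by the number of requests in flight rather than by the number of users.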
>>>>>
>>>>> On Sep 9, 2008, at 10:21 PM, Chris Lu wrote:
>>>>>
>>>>>> Right, in a sense I can not release it from another thread.  
>>>>>> But that's the problem.
>>>>>>
>>>>>> It's a J2EE environment, all threads are kind of equal. It's  
>>>>>> simply not possible to iterate through all threads to close  
>>>>>> the searcher, thus releasing the ThreadLocal cache.
>>>>>> Unless Lucene is not recommended for J2EE environments, this
>>>>>> has to be fixed.
>>>>>>
>>>>>> On Tue, Sep 9, 2008 at 8:14 PM, robert engels  
>>>>>> <rengels@ix.netcom.com> wrote:
>>>>>> Your code is not correct. You cannot release it on another
>>>>>> thread - the first thread may create hundreds/thousands of
>>>>>> instances before the other thread ever runs...
>>>>>>
>>>>>> On Sep 9, 2008, at 10:10 PM, Chris Lu wrote:
>>>>>>
>>>>>>> If I release it on the thread that's creating the searcher,
>>>>>>> by setting searcher=null, everything is fine, the memory is
>>>>>>> released very cleanly.
>>>>>>> My load test was to repeatedly create a searcher on a
>>>>>>> RAMDirectory and release it on another thread. The test will
>>>>>>> quickly go to OOM after several runs. I set the heap size to
>>>>>>> be 1024M, and the RAMDirectory is of size 250M. Using some
>>>>>>> profiling tool, the used size simply stepped up pretty
>>>>>>> obviously by 250M.
>>>>>>>
>>>>>>> I think we should not rely on something that's a "maybe"
>>>>>>> behavior, especially for a general purpose library.
>>>>>>>
>>>>>>> Since it's a multi-threaded env, the thread that's creating
>>>>>>> the entries in the LRU cache may not go away quickly (actually
>>>>>>> most, if not all, application servers will try to reuse
>>>>>>> threads), so the LRU cache, which uses thread as the key, can
>>>>>>> not be released, so the SegmentTermEnum which is in the same
>>>>>>> class can not be released.
>>>>>>>
>>>>>>> And yes, I close the RAMDirectory, and the fileMap is
>>>>>>> released. I verified that through the profiler by directly
>>>>>>> checking the values in the snapshot.
>>>>>>>
>>>>>>> Pretty sure the reference tree wasn't like this using code
>>>>>>> before this commit, because after closing the searcher in
>>>>>>> another thread, the RAMDirectory totally disappeared from the
>>>>>>> memory snapshot.
>>>>>>>
>>>>>>> On Tue, Sep 9, 2008 at 5:03 PM, Michael McCandless  
>>>>>>> <lucene@mikemccandless.com> wrote:
>>>>>>>
>>>>>>> Chris Lu wrote:
>>>>>>>
>>>>>>> The problem should be similar to what's talked about on this
>>>>>>> discussion.
>>>>>>> http://lucene.markmail.org/message/keosgz2c2yjc7qre?q=ThreadLocal
>>>>>>>
>>>>>>> The "rough" conclusion of that thread is that, technically,
>>>>>>> this isn't a memory leak but rather a "delayed freeing"
>>>>>>> problem.  Ie, it may take longer, possibly much longer, than
>>>>>>> you want for the memory to be freed.
>>>>>>>
>>>>>>>
>>>>>>> There is a memory leak for Lucene search from LUCENE-1195
>>>>>>> (svn r659602, May 23, 2008).
>>>>>>>
>>>>>>> This patch brings in a ThreadLocal cache to TermInfosReader.
>>>>>>>
>>>>>>> One thing that confuses me: TermInfosReader was already using
>>>>>>> a ThreadLocal to cache the SegmentTermEnum instance.  What
>>>>>>> was added in this commit (for LUCENE-1195) was an LRU cache
>>>>>>> storing Term -> TermInfo instances.  But it seems like it's
>>>>>>> the SegmentTermEnum instance that you're tracing below.
>>>>>>>
>>>>>>>
>>>>>>> It's usually recommended to keep the reader open, and reuse
>>>>>>> it when possible. In a common J2EE application, the http
>>>>>>> requests are usually handled by different threads. But since
>>>>>>> the cache is ThreadLocal, the cache is not really usable by
>>>>>>> other threads. What's worse, the cache can not be cleared by
>>>>>>> another thread!
>>>>>>>
>>>>>>> This leak is not so obvious usually. But my case is using
>>>>>>> RAMDirectory, having several hundred megabytes. So one
>>>>>>> un-released resource is obvious to me.
>>>>>>>
>>>>>>> Here is the reference tree:
>>>>>>> org.apache.lucene.store.RAMDirectory
>>>>>>>  |- directory of org.apache.lucene.store.RAMFile
>>>>>>>     |- file of org.apache.lucene.store.RAMInputStream
>>>>>>>         |- base of org.apache.lucene.index.CompoundFileReader$CSIndexInput
>>>>>>>             |- input of org.apache.lucene.index.SegmentTermEnum
>>>>>>>                 |- value of java.lang.ThreadLocal$ThreadLocalMap$Entry
>>>>>>>
>>>>>>> So you have a RAMDir that has several hundred MB stored in  
>>>>>>> it, that you're done with yet through this path Lucene is  
>>>>>>> keeping it alive?
>>>>>>>
>>>>>>> Did you close the RAMDir?  (which will null its fileMap and
>>>>>>> should also free your memory).
>>>>>>>
>>>>>>> Also, that reference tree doesn't show the ThreadResources  
>>>>>>> class that was added in that commit -- are you sure this  
>>>>>>> reference tree wasn't before the commit?
>>>>>>>
>>>>>>> Mike
>>>>>>>
>>>>>>> ---------------------------------------------------------------------
>>>>>>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>>>>>>> For additional commands, e-mail: java-dev-help@lucene.apache.org
>>>>>>>