lucene-dev mailing list archives

From "Chris Lu" <chris...@gmail.com>
Subject Re: ThreadLocal causing memory leak with J2EE applications
Date Wed, 10 Sep 2008 15:16:44 GMT
I do not believe I am making any mistake. Actually I just got an email from
another user, complaining about the same thing. And I am having the same
usage pattern.
After the reader is opened, the RAMDirectory is shared by several objects.
There is one instance of RAMDirectory in the memory, and it is holding lots
of memory, which is expected.

If I close the reader in the same thread that opened it, the
RAMDirectory is gone from memory.
If I close the reader in another thread, the RAMDirectory stays in
memory, referenced along the tree I drew in the first email.
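
To make the cross-thread point concrete, here is a minimal plain-Java sketch
of the mechanism. It uses no Lucene classes: the String is only a stand-in
for the cached SegmentTermEnum, and the class name ThreadLocalRetentionDemo,
the CACHE field and the value "enum-for-big-index" are all made up for
illustration. A single-thread executor plays the role of a reused
application-server thread.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class ThreadLocalRetentionDemo {
    // Hypothetical stand-in for the per-thread cache. The String plays the
    // role of the SegmentTermEnum that keeps the RAMDirectory reachable.
    private static final ThreadLocal<String> CACHE = new ThreadLocal<>();

    /**
     * Returns what the pooled worker thread sees in its slot:
     * [0] after remove() was called on a different thread,
     * [1] after remove() was called on the worker thread itself.
     */
    static String[] run() {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        Runnable fill = () -> CACHE.set("enum-for-big-index");
        Runnable clear = () -> CACHE.remove();
        Callable<String> read = () -> CACHE.get();
        try {
            // The worker thread populates its own slot, as a search would.
            pool.submit(fill).get();

            // "Close" from the calling thread: this only clears the calling
            // thread's slot and cannot reach the worker's entry.
            CACHE.remove();
            String afterForeignRemove = pool.submit(read).get();

            // Cleanup only takes effect when it runs on the owning thread.
            pool.submit(clear).get();
            String afterOwnRemove = pool.submit(read).get();

            return new String[] { afterForeignRemove, String.valueOf(afterOwnRemove) };
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        String[] seen = run();
        System.out.println("after remove() on another thread:    " + seen[0]);
        System.out.println("after remove() on the owning thread: " + seen[1]);
    }
}
```

In this sketch the worker still sees its value after the other thread calls
remove(), and sees null only once remove() runs on the worker itself. With a
thread pool the worker never dies, so whatever its entry references stays
reachable in the meantime.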

I do not think the usage is wrong. Period.

-------------------------------------

Hi,

   I found a forum post from you here [1] where you mention that you
have a memory leak using the Lucene RAMDirectory. I'd like to ask
whether you have already resolved the problem and how you did it, or
whether you know where I can read about the solution. We are using
RAMDirectory too and found that memory consumption keeps rising over
time until the system breaks down, but only when we perform many index
updates. If we only create the index and do nothing except search it,
it works fine.

maybe you can give me a hint or a link,
greetz,

-------------------------------------

-- 
Chris Lu
-------------------------
Instant Scalable Full-Text Search On Any Database/Application
site: http://www.dbsight.net
demo: http://search.dbsight.com
Lucene Database Search in 3 minutes:
http://wiki.dbsight.com/index.php?title=Create_Lucene_Database_Search_in_3_minutes
DBSight customer, a shopping comparison site, (anonymous per request) got
2.6 Million Euro funding!

On Wed, Sep 10, 2008 at 7:12 AM, robert engels <rengels@ix.netcom.com> wrote:

> Sorry, but I am fairly certain you are mistaken.
> If you only have a single IndexReader, the RAMDirectory will be shared in
> all cases.
>
> The only memory growth is any buffer space allocated by an IndexInput (used
> in many places and cached).
>
> Normally the IndexInputs created by a RAMDirectory do not have any buffer
> allocated, since the underlying store is already in memory.
>
> You have some other problem in your code...
>
> On Sep 10, 2008, at 1:10 AM, Chris Lu wrote:
>
> Actually, even if I only use one IndexReader, some resources are cached via
> the ThreadLocal cache and cannot be released unless all threads perform the
> close action.
>
> SegmentTermEnum itself is small, but it holds RAMDirectory along the path,
> which is big.
>
> --
> Chris Lu
>
> On Tue, Sep 9, 2008 at 10:43 PM, robert engels <rengels@ix.netcom.com> wrote:
>
>>  You do not need a pool of IndexReaders...
>> It does not matter what class it is, what matters is the class that
>> ultimately holds the reference.
>>
>> If the IndexReader is never closed, the SegmentReader(s) is never closed,
>> so the thread local in TermInfosReader is not cleared (because the thread
>> never dies). So you will get one SegmentTermEnum, per thread * per segment.
>>
>> The SegmentTermEnum is not a large object, so even if you had 100 threads
>> and 100 segments, for 10k instances, it seems hard to believe that is the
>> source of your memory issue.
>>
>> The SegmentTermEnum is cached per thread since it needs to enumerate the
>> terms. Not having a per-thread cache would lead to lots of random access
>> when multiple threads read the index - very slow.
>>
>> You need to keep in mind, what if every thread was executing a search
>> simultaneously - you would still have 100x100 SegmentTermEnum instances
>> anyway !  The only way to prevent that would be to create and destroy the
>> SegmentTermEnum on each call (opening and seeking to the proper spot) -
>> which would be SLOW SLOW SLOW.
>>
>> On Sep 10, 2008, at 12:19 AM, Chris Lu wrote:
>>
>> I have tried creating an IndexReader pool and creating searchers
>> dynamically, but the memory leak is the same. It's not related to the
>> Searcher class specifically, but to the SegmentTermEnum in TermInfosReader.
>>
>> --
>> Chris Lu
>>
>> On Tue, Sep 9, 2008 at 10:14 PM, robert engels <rengels@ix.netcom.com> wrote:
>>
>>>  A searcher uses an IndexReader - the IndexReader is slow to open, not a
>>> Searcher. And searchers can share an IndexReader.
>>> You want to create a single shared (across all threads/users) IndexReader
>>> (usually), and create a Searcher as needed and dispose of it.  It is VERY
>>> CHEAP to create the Searcher.
>>>
>>> I am fairly certain the javadoc on Searcher is incorrect.  The warning "
>>> For performance reasons it is recommended to open only one IndexSearcher
>>> and use it for all of your searches" is not true in the case where an
>>> IndexReader is passed to the ctor.
>>>
>>> Any caching should USUALLY be performed at the IndexReader level.
>>>
>>> You are most likely using the "path" ctor, and that is the source of your
>>> problems, as multiple IndexReader instances are being created, and thus the
>>> memory use.
>>>
>>>
>>> On Sep 9, 2008, at 11:44 PM, Chris Lu wrote:
>>>
>>> In a J2EE environment, there is usually a searcher pool with several
>>> searchers open. Opening a large index for every user is too slow to be
>>> acceptable.
>>>
>>> --
>>> Chris Lu
>>>
>>> On Tue, Sep 9, 2008 at 9:03 PM, robert engels <rengels@ix.netcom.com> wrote:
>>>
>>>> You need to close the searcher within the thread that is using it, in
>>>> order to have it cleaned up quickly... usually right after you display the
>>>> page of results.
>>>> If you are keeping multiple searcher refs across multiple threads for
>>>> paging/whatever, you have not coded it correctly.
>>>>
>>>> Imagine 10,000 users - storing a searcher for each one is not going to
>>>> work...
>>>>
>>>> On Sep 9, 2008, at 10:21 PM, Chris Lu wrote:
>>>>
>>>> Right, in a sense I can not release it from another thread. But that's
>>>> the problem.
>>>>
>>>> It's a J2EE environment, all threads are kind of equal. It's simply not
>>>> possible to iterate through all threads to close the searcher, thus
>>>> releasing the ThreadLocal cache.
>>>> Unless Lucene is not recommended for J2EE environment, this has to be
>>>> fixed.
>>>>
>>>> --
>>>> Chris Lu
>>>>
>>>> On Tue, Sep 9, 2008 at 8:14 PM, robert engels <rengels@ix.netcom.com> wrote:
>>>>
>>>>> Your code is not correct. You cannot release it on another thread - the
>>>>> first thread may create hundreds/thousands of instances before the other
>>>>> thread ever runs...
>>>>>
>>>>> On Sep 9, 2008, at 10:10 PM, Chris Lu wrote:
>>>>>
>>>>> If I release it on the thread that's creating the searcher, by setting
>>>>> searcher=null, everything is fine, the memory is released very cleanly.
>>>>> My load test was to repeatedly create a searcher on a RAMDirectory and
>>>>> release it on another thread. The test quickly goes OOM after several
>>>>> runs. I set the heap size to 1024M, and the RAMDirectory is 250M in size.
>>>>> In a profiler, the used heap visibly stepped up in increments of 250M.
>>>>>
>>>>> I think we should not rely on something that's a "maybe" behavior,
>>>>> especially for a general purpose library.
>>>>>
>>>>> Since it's a multi-threaded environment, the thread that creates the
>>>>> entries in the LRU cache may not go away quickly (actually most, if not
>>>>> all, application servers will try to reuse threads), so the LRU cache,
>>>>> which uses the thread as the key, cannot be released, and so the
>>>>> SegmentTermEnum, which is in the same class, cannot be released either.
>>>>>
>>>>> And yes, I close the RAMDirectory, and the fileMap is released. I
>>>>> verified that through the profiler by directly checking the values in
>>>>> the snapshot.
>>>>>
>>>>> I'm pretty sure the reference tree wasn't like this with the code before
>>>>> this commit, because after closing the searcher in another thread, the
>>>>> RAMDirectory totally disappeared from the memory snapshot.
>>>>>
>>>>> --
>>>>> Chris Lu
>>>>>
>>>>> On Tue, Sep 9, 2008 at 5:03 PM, Michael McCandless <
>>>>> lucene@mikemccandless.com> wrote:
>>>>>
>>>>>>
>>>>>> Chris Lu wrote:
>>>>>>
>>>>>>> The problem should be similar to what is discussed in this thread:
>>>>>>> http://lucene.markmail.org/message/keosgz2c2yjc7qre?q=ThreadLocal
>>>>>>>
>>>>>>
>>>>>> The "rough" conclusion of that thread is that, technically, this isn't
>>>>>> a memory leak but rather a "delayed freeing" problem.  Ie, it may take
>>>>>> longer, possibly much longer, than you want for the memory to be freed.
>>>>>>
>>>>>>> There is a memory leak in Lucene search since LUCENE-1195 (svn
>>>>>>> r659602, May 23, 2008).
>>>>>>>
>>>>>>> This patch brings in a ThreadLocal cache to TermInfosReader.
>>>>>>>
>>>>>>
>>>>>> One thing that confuses me: TermInfosReader was already using a
>>>>>> ThreadLocal to cache the SegmentTermEnum instance.  What was added in
>>>>>> this commit (for LUCENE-1195) was an LRU cache storing Term -> TermInfo
>>>>>> instances.  But it seems like it's the SegmentTermEnum instance that
>>>>>> you're tracing below.
>>>>>>
>>>>>>> It's usually recommended to keep the reader open, and reuse it when
>>>>>>> possible. In a common J2EE application, the http requests are usually
>>>>>>> handled by different threads. But since the cache is ThreadLocal, the
>>>>>>> cache is not really usable by other threads. What's worse, the cache
>>>>>>> cannot be cleared by another thread!
>>>>>>>
>>>>>>> This leak is usually not so obvious. But my case uses a RAMDirectory
>>>>>>> holding several hundred megabytes, so one unreleased resource is
>>>>>>> obvious to me.
>>>>>>>
>>>>>>> Here is the reference tree:
>>>>>>> org.apache.lucene.store.RAMDirectory
>>>>>>>  |- directory of org.apache.lucene.store.RAMFile
>>>>>>>     |- file of org.apache.lucene.store.RAMInputStream
>>>>>>>         |- base of org.apache.lucene.index.CompoundFileReader$CSIndexInput
>>>>>>>             |- input of org.apache.lucene.index.SegmentTermEnum
>>>>>>>                 |- value of java.lang.ThreadLocal$ThreadLocalMap$Entry
>>>>>>>
>>>>>>
>>>>>> So you have a RAMDir that has several hundred MB stored in it, that
>>>>>> you're done with, yet through this path Lucene is keeping it alive?
>>>>>>
>>>>>> Did you close the RAMDir?  (which will null its fileMap and should
>>>>>> also free your memory).
>>>>>>
>>>>>> Also, that reference tree doesn't show the ThreadResources class that
>>>>>> was added in that commit -- are you sure this reference tree wasn't
>>>>>> before the commit?
>>>>>>
>>>>>> Mike
>>>>>>
>>>>>> ---------------------------------------------------------------------
>>>>>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>>>>>> For additional commands, e-mail: java-dev-help@lucene.apache.org
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Chris Lu
