Just confirmed the fix for this problem is ready in patch LUCENE-1383

Thanks Robert Engels for arguing with me and understand the problem quickly, and contributed a ClosableThreadLocal class, although the problem itself is hard to reproduce for him, and thanks Michael McCandless for fixing the problem soooo quickly.

-- 
Chris Lu
-------------------------
Instant Scalable Full-Text Search On Any Database/Application
site: http://www.dbsight.net
demo: http://search.dbsight.com
Lucene Database Search in 3 minutes: http://wiki.dbsight.com/index.php?title=Create_Lucene_Database_Search_in_3_minutes
DBSight customer, a shopping comparison site, (anonymous per request) got 2.6 Million Euro funding!

Michael McCandless wrote:

Yeah I think that's the right approach.

I'll code it up.

Mike

robert engels wrote:

I think that would work, but I think you would be better off encapsulating that in an extended ThreadLocal, e.g. WeakThreadLocal, and use that every where. Add a method clear(), that clears the ThreadLocals list (which will allow the values to be GC'd).


On Sep 11, 2008, at 9:43 AM, Michael McCandless wrote:


OK so we compact the list (removing dead threads) every time we add a new entry to the list.  This way for a long lived SegmentReader but short lived threads, the list keeps only live threads.

We do need sync access to the list, but that's only on binding a new thread.  Retrieving an existing thread has no sync.

Mike

robert engels wrote:

You still need to sync access to the list, and how would it be removed from the list prior to close? That is you need one per thread, but you can have the reader shared across all threads. So if threads were created and destroyed without ever closing the reader, the list would grow unbounded.

On Sep 11, 2008, at 9:20 AM, Michael McCandless wrote:


I don't need it by thread, because I would still use ThreadLocal to retrieve the SegmentTermEnum.  This avoids any sync during get.

The list is just a "fallback" to hold a hard reference to the SegmentTermEnum to keep it alive.  That's it's only purpose.  Then, when SegmentReader is closed this list is cleared and GC is free to reclaim all SegmentTermEnums.

Mike

robert engels wrote:

But you need it by thread, so it can't be a list.

You could have a HashMap of <Thread,ThreadState> in FieldsReader, and when SegmentReader is closed, FieldsReader is closed, which clears the map, and not use thread locals at all. The difference being you would need a sync'd map.

On Sep 11, 2008, at 4:56 AM, Michael McCandless wrote:


What if we wrap the value in a WeakReference, but secondarily hold a hard reference to it in a "normal" list?

Then, when TermInfosReader is closed we clear that list of all its hard references, at which point GC will be free to reclaim the object out from under the ThreadLocal even before the ThreadLocal purges its stale entries.

Mike

robert engels wrote:

You can't hold the ThreadLocal value in a WeakReference, because there is no hard reference between enumeration calls (so it would be cleared out from under you while enumerating).

All of this occurs because you have some objects (readers/segments etc.) that are shared across all threads, but these contain objects that are 'thread/search state' specific. These latter objects are essentially "cached" for performance (so you don't need to seek and read, sequential buffer access, etc.)

A sometimes better solution is to have the state returned to the caller, and require the caller to pass/use the state later - then you don't need thread locals.

You can accomplish a similar solution by returning a "SessionKey" object, and have the caller pass this later.  You can then have a WeakHashMap of SessionKey,SearchState that the code can use.  When the SessionKey is destroyed (no longer referenced), the state map can be cleaned up automatically.



On Sep 10, 2008, at 11:30 PM, Noble Paul നോബിള്‍ नोब्ळ् wrote:

When I look at the reference tree That is the feeling I get. if you
held a WeakReference it would get released .
|- base of org.apache.lucene.index.CompoundFileReader$CSIndexInput
          |- input of org.apache.lucene.index.SegmentTermEnum
              |- value of java.lang.ThreadLocal$ThreadLocalMap$Entry

On Wed, Sep 10, 2008 at 8:39 PM, Chris Lu <chris.lu@gmail.com> wrote:
Does this make any difference?
If I intentionally close the searcher and reader failed to release the
memory, I can not rely on some magic of JVM to release it.
--
Chris Lu
-------------------------
Instant Scalable Full-Text Search On Any Database/Application
site: http://www.dbsight.net
demo: http://search.dbsight.com
Lucene Database Search in 3 minutes:
http://wiki.dbsight.com/index.php?title=Create_Lucene_Database_Search_in_3_minutes
DBSight customer, a shopping comparison site, (anonymous per request) got
2.6 Million Euro funding!

On Wed, Sep 10, 2008 at 4:03 AM, Noble Paul നോബിള്‍ नोब्ळ्
<noble.paul@gmail.com> wrote:

Why do you need to keep a strong reference?
Why not a WeakReference ?

--Noble

On Wed, Sep 10, 2008 at 12:27 AM, Chris Lu <chris.lu@gmail.com> wrote:
The problem should be similar to what's talked about on this discussion.
http://lucene.markmail.org/message/keosgz2c2yjc7qre?q=ThreadLocal

There is a memory leak for Lucene search from Lucene-1195.(svn r659602,
May23,2008)

This patch brings in a ThreadLocal cache to TermInfosReader.

It's usually recommended to keep the reader open, and reuse it when
possible. In a common J2EE application, the http requests are usually
handled by different threads. But since the cache is ThreadLocal, the
cache
are not really usable by other threads. What's worse, the cache can not
be
cleared by another thread!

This leak is not so obvious usually. But my case is using RAMDirectory,
having several hundred megabytes. So one un-released resource is obvious
to
me.

Here is the reference tree:
org.apache.lucene.store.RAMDirectory
|- directory of org.apache.lucene.store.RAMFile
 |- file of org.apache.lucene.store.RAMInputStream
     |- base of
org.apache.lucene.index.CompoundFileReader$CSIndexInput
         |- input of org.apache.lucene.index.SegmentTermEnum
             |- value of java.lang.ThreadLocal$ThreadLocalMap$Entry


After I switched back to svn revision 659601, right before this patch is
checked in, the memory leak is gone.
Although my case is RAMDirectory, I believe this will affect disk based
index also.

--
Chris Lu
-------------------------
Instant Scalable Full-Text Search On Any Database/Application
site: http://www.dbsight.net
demo: http://search.dbsight.com
Lucene Database Search in 3 minutes:

http://wiki.dbsight.com/index.php?title=Create_Lucene_Database_Search_in_3_minutes
DBSight customer, a shopping comparison site, (anonymous per request)
got
2.6 Million Euro funding!




--
--Noble Paul

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org







-- 
--Noble Paul


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org