lucene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From robert engels <reng...@ix.netcom.com>
Subject Re: ThreadLocal in SegmentReader
Date Sat, 12 Jul 2008 14:58:26 GMT
You are mistaken - Yonik's comment in that thread is correct  
(although it is not just at table resize - any time a ThreadLocal is  
added, and any time the ThreadLocal is not found in its direct hash  
it might clear entries).

The ThreadLocals map only has a WeakReference to the ThreadLocal, so  
if that is the only reference, it will be GC'd - eventually, and then  
it will be cleared as new ThreadLocals are created.

With a static reference, the thread can reference the ThreadLocal at  
any time, and thus the WeakReference will not be cleared.

If the object is VERY large, and new ThreadLocals are not created it  
could cause a problem, but I don't think that is the case with  
Lucene, as  the objects stored in ThreadLocals are designed to live  
for the life of the SegmentReader/IndexReader and thread.


On Jul 12, 2008, at 2:12 AM, Roman Puchkovskiy wrote:

>
> Well, possibly I'm mistaken, but it seems that this affects non- 
> static fields
> too. Please see
> http://www.nabble.com/ThreadLocal-in-SegmentReader-to18306230.html  
> where the
> use case is described in the details.
> In short: it seems that the scope of ThreadLocals does not matter.  
> What
> really matters is that they are referenced by ThreadLocals map in  
> the thread
> which is still alive.
>
>
> Robert Engels wrote:
>>
>> This is only an issue for static ThreadLocals ...
>>
>> On Jul 11, 2008, at 11:32 PM, Roman Puchkovskiy wrote:
>>
>>>
>>> The problem here is not because ThreadLocal instances are not GC'd
>>> (they are
>>> GC'd, and your test shows this clearly).
>>> But even one instance which is not removed from its Thread is
>>> enough to
>>> prevent the classloader from being unloaded, and that's the problem.
>>>
>>>
>>> Michael McCandless-2 wrote:
>>>>
>>>> OK, I created a simple test to test this (attached).  The test just
>>>> runs 10 threads, each one creating a 100 KB byte array which is
>>>> stored
>>>> into a ThreadLocal, and then periodically the ThreadLocal is  
>>>> replaced
>>>> with a new one.  This is to test whether GC of a ThreadLocal, even
>>>> though the thread is still alive, in fact leads to GC of the  
>>>> objects
>>>> held in the ThreadLocal.
>>>>
>>>> Indeed on Sun JRE 1.4, 1.5 and 1.6 it appears that the objects  
>>>> are in
>>>> fact properly collected.
>>>>
>>>> So this is not a leak but rather a "delayed collection" issue.
>>>> Java's
>>>> GC is never guaranteed to be immediate, and apparently when using
>>>> ThreadLocals it's even less immediate than "normal".  In the  
>>>> original
>>>> issue, if other things create ThreadLocals, then eventually  
>>>> Lucene's
>>>> unreferenced ThreadLocals would be properly collected.
>>>>
>>>> So I think we continue to use non-static ThreadLocals in Lucene...
>>>>
>>>> Mike
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> robert engels wrote:
>>>>
>>>>> Once again, these are "static" thread locals. A completely  
>>>>> different
>>>>> issue. Since the object is available statically, the weak  
>>>>> reference
>>>>> cannot be cleared so stale entries will never be cleared as  
>>>>> long as
>>>>> the thread is alive.
>>>>>
>>>>> On Jul 9, 2008, at 4:46 PM, Adrian Tarau wrote:
>>>>>
>>>>>> Just a few examples of "problems" using ThreadLocals.
>>>>>>
>>>>>> http://opensource.atlassian.com/projects/hibernate/browse/ 
>>>>>> HHH-2481
>>>>>> http://www.theserverside.com/news/thread.tss?thread_id=41473
>>>>>>
>>>>>> Once again, I'm not pointing to Lucene SegmentReader as a "bad"
>>>>>> implementation, and maybe the current "problems" of ThreadLocals
>>>>>> are not a problem for SegmentReader but it seems safer to use
>>>>>> ThreadLocals to pass context information which is cleared when  
>>>>>> the
>>>>>> call exits instead of storing long-lived objects.
>>>>>>
>>>>>>
>>>>>> robert engels wrote:
>>>>>>>
>>>>>>> Aside from the pre-1.5 thread local "perceived leak", there 

>>>>>>> are no
>>>>>>> issues with ThreadLocals if used properly.
>>>>>>>
>>>>>>> There is no need for try/finally blocks, unless you MUST release
>>>>>>> resources immediately, usually this is not the case, which is
 
>>>>>>> why
>>>>>>> a ThreadLocal is used in the first place.
>>>>>>>
>>>>>>> From the ThreadLocalMap javadoc...
>>>>>>>
>>>>>>>  /**
>>>>>>>      * ThreadLocalMap is a customized hash map suitable only
for
>>>>>>>      * maintaining thread local values. No operations are  
>>>>>>> exported
>>>>>>>      * outside of the ThreadLocal class. The class is package
>>>>>>> private to
>>>>>>>      * allow declaration of fields in class Thread.  To help
 
>>>>>>> deal
>>>>>>> with
>>>>>>>      * very large and long-lived usages, the hash table entries
>>>>>>> use
>>>>>>>      * WeakReferences for keys. However, since reference queues
>>>>>>> are not
>>>>>>>      * used, stale entries are guaranteed to be removed only
 
>>>>>>> when
>>>>>>>      * the table starts running out of space.
>>>>>>>      */
>>>>>>>
>>>>>>> /**
>>>>>>>          * Heuristically scan some cells looking for stale
>>>>>>> entries.
>>>>>>>          * This is invoked when either a new element is  
>>>>>>> added, or
>>>>>>>          * another stale one has been expunged. It performs a
>>>>>>>          * logarithmic number of scans, as a balance between
no
>>>>>>>          * scanning (fast but retains garbage) and a number of
>>>>>>> scans
>>>>>>>          * proportional to number of elements, that would  
>>>>>>> find all
>>>>>>>          * garbage but would cause some insertions to take O(n)
>>>>>>> time.
>>>>>>>          *
>>>>>>>          * @param i a position known NOT to hold a stale entry.
>>>>>>> The
>>>>>>>          * scan starts at the element after i.
>>>>>>>          *
>>>>>>>          * @param n scan control: <tt>log2(n)</tt>
cells are
>>>>>>> scanned,
>>>>>>>          * unless a stale entry one is found, in which case
>>>>>>>          * <tt>log2(table.length)-1</tt> additional
cells are
>>>>>>> scanned.
>>>>>>>          * When called from insertions, this parameter is the
>>>>>>> number
>>>>>>>          * of elements, but when from replaceStaleEntry, it 

>>>>>>> is the
>>>>>>>          * table length. (Note: all this could be changed to
be
>>>>>>> either
>>>>>>>          * more or less aggressive by weighting n instead of
 
>>>>>>> just
>>>>>>>          * using straight log n. But this version is simple,
 
>>>>>>> fast,
>>>>>>> and
>>>>>>>          * seems to work well.)
>>>>>>>          *
>>>>>>>          * @return true if any stale entries have been removed.
>>>>>>>          */
>>>>>>>
>>>>>>>
>>>>>>> The instance ThreadLocals (and what the refer to) will be GC'd
>>>>>>> when the containing Object is GC'd.
>>>>>>>
>>>>>>> There IS NO MEMORY LEAK in ThreadLocal. If the ThreadLocal  
>>>>>>> refers
>>>>>>> to an object that has native resources (e.g. file handles), 

>>>>>>> it may
>>>>>>> not be released until other thread locals are created by the
>>>>>>> thread (or the thread terminates).
>>>>>>>
>>>>>>> You can avoid this "delay" by calling remove(), but in most
>>>>>>> applications it should never be necessary - unless a very  
>>>>>>> strange
>>>>>>> usage...
>>>>>>>
>>>>>>> On Jul 9, 2008, at 2:37 PM, Adrian Tarau wrote:
>>>>>>>
>>>>>>>> From what I know, storing objects in ThreadLocal is safe
as  
>>>>>>>> long
>>>>>>>> as you release the object within a try {} finall {} block
or
>>>>>>>> store objects which are independent of the rest of the code(no
>>>>>>>> dependencies).Otherwise it can get pretty tricky(memory leaks,
>>>>>>>> classloader problems) after awhile.
>>>>>>>>
>>>>>>>> It is pretty convenient to pass HTTP request information
with a
>>>>>>>> ThreadLocal in a servlet(but you should cleanup the variable
>>>>>>>> before leaving the servlet) but I'm not sure how safe it
is in
>>>>>>>> this case.
>>>>>>>>
>>>>>>>> robert engels wrote:
>>>>>>>>> Using synchronization is a poor/invalid substitute for
thread
>>>>>>>>> locals in many cases.
>>>>>>>>>
>>>>>>>>> The point of the thread local in these referenced cases
is too
>>>>>>>>> allow streaming reads on a file descriptor. if you use
a  
>>>>>>>>> shared
>>>>>>>>> file descriptor/buffer you are going to continually invalidate
>>>>>>>>> the buffer.
>>>>>>>>>
>>>>>>>>> On Jul 8, 2008, at 5:12 AM, Michael McCandless wrote:
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Well ... SegmentReader uses ThreadLocal to hold a
thread-
>>>>>>>>>> private instance of TermVectorsReader, to avoid synchronizing
>>>>>>>>>> per-document when loading term vectors.
>>>>>>>>>>
>>>>>>>>>> Clearing this ThreadLocal value per call to SegmentReader's
>>>>>>>>>> methods that load term vectors would defeat its purpose.
>>>>>>>>>>
>>>>>>>>>> Though, of course, we then synchronize on the underlying
file
>>>>>>>>>> (when using FSDirectory), so perhaps we are really
not saving
>>>>>>>>>> much by using ThreadLocal here.  But we are looking
to relax
>>>>>>>>>> that low level synchronization with LUCENE-753.
>>>>>>>>>>
>>>>>>>>>> Maybe we could make our own ThreadLocal that just
uses a
>>>>>>>>>> HashMap, which we'd have to synchronize on when getting
the
>>>>>>>>>> per-
>>>>>>>>>> thread instances.  Or, go back to sharing a single
>>>>>>>>>> TermVectorsReader and synchronize per-document.
>>>>>>>>>>
>>>>>>>>>> Jason has suggested moving to a model where you ask
the
>>>>>>>>>> IndexReader for an object that can return term vectors
/  
>>>>>>>>>> stored
>>>>>>>>>> fields / etc, and then you interact with that many
times to
>>>>>>>>>> retrieve each doc.  We could then synchronize only
on
>>>>>>>>>> retrieving that object, and provide a thread-private
 
>>>>>>>>>> instance.
>>>>>>>>>>
>>>>>>>>>> It seems like we should move away from using ThreadLocal
in
>>>>>>>>>> Lucene and do "normal" synchronization instead.
>>>>>>>>>>
>>>>>>>>>> Mike
>>>>>>>>>>
>>>>>>>>>> Adrian Tarau wrote:
>>>>>>>>>>
>>>>>>>>>>> Usually ThreadLocal.remove() should be called
at the end 
>>>>>>>>>>> (in a
>>>>>>>>>>> finally block), before the current call leaves
your code.
>>>>>>>>>>>
>>>>>>>>>>> Ex : if during searching ThreadLocal is used,
every search 
>>>>>>>>>>> (..)
>>>>>>>>>>> method should cleanup any ThreadLocal variables,
or even
>>>>>>>>>>> deeper in the implementation. When the call leaves
Lucene  
>>>>>>>>>>> any
>>>>>>>>>>> used ThreadLocal should be cleaned up.
>>>>>>>>>>>
>>>>>>>>>>> Michael McCandless wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> ThreadLocal, which we use in several places
in Lucene,  
>>>>>>>>>>>> causes
>>>>>>>>>>>> a leak in app servers because the classloader
never fully
>>>>>>>>>>>> deallocates Lucene's classes because the
ThreadLocal is
>>>>>>>>>>>> holding strong references.
>>>>>>>>>>>>
>>>>>>>>>>>> Yet, ThreadLocal is very convenient for avoiding
>>>>>>>>>>>> synchronization.
>>>>>>>>>>>>
>>>>>>>>>>>> Does anyone have any ideas on how to solve
this w/o falling
>>>>>>>>>>>> back to "normal" synchronization?
>>>>>>>>>>>>
>>>>>>>>>>>> Mike
>>>>>>>>>>>>
>>>>>>>>>>>> Begin forwarded message:
>>>>>>>>>>>>
>>>>>>>>>>>>> From: "Yonik Seeley" <yonik@apache.org>
>>>>>>>>>>>>> Date: July 7, 2008 3:30:28 PM EDT
>>>>>>>>>>>>> To: java-user@lucene.apache.org
>>>>>>>>>>>>> Subject: Re: ThreadLocal in SegmentReader
>>>>>>>>>>>>> Reply-To: java-user@lucene.apache.org
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Mon, Jul 7, 2008 at 2:43 PM, Michael
McCandless
>>>>>>>>>>>>> <lucene@mikemccandless.com> wrote:
>>>>>>>>>>>>>> So now I'm confused: the SegmentReader
itself should no
>>>>>>>>>>>>>> longer be reachable,
>>>>>>>>>>>>>> assuming you are not holding any
references to your
>>>>>>>>>>>>>> IndexReader.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Which means the ThreadLocal instance
should no longer be
>>>>>>>>>>>>>> reachable.
>>>>>>>>>>>>>
>>>>>>>>>>>>> It will still be referenced from the
Thread(s)
>>>>>>>>>>>>> ThreadLocalMap
>>>>>>>>>>>>> The key (the ThreadLocal) will be weakly
referenced,  
>>>>>>>>>>>>> but the
>>>>>>>>>>>>> values
>>>>>>>>>>>>> (now stale) are strongly referenced and
won't be actually
>>>>>>>>>>>>> removed
>>>>>>>>>>>>> until the table is resized (under the
Java6 impl at  
>>>>>>>>>>>>> least).
>>>>>>>>>>>>> Nice huh?
>>>>>>>>>>>>>
>>>>>>>>>>>>> -Yonik
>>>>>>>>>>>>>
>>>>>>>>>>>>> ----------------------------------------------------------

>>>>>>>>>>>>> --
>>>>>>>>>>>>> ---------
>>>>>>>>>>>>> To unsubscribe, e-mail: java-user-
>>>>>>>>>>>>> unsubscribe@lucene.apache.org
>>>>>>>>>>>>> For additional commands, e-mail: java-user-
>>>>>>>>>>>>> help@lucene.apache.org
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> -----------------------------------------------------------

>>>>>>>>>>>> --
>>>>>>>>>>>> --------
>>>>>>>>>>>> To unsubscribe, e-mail: java-dev-
>>>>>>>>>>>> unsubscribe@lucene.apache.org
>>>>>>>>>>>> For additional commands, e-mail: java-dev-
>>>>>>>>>>>> help@lucene.apache.org
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> ------------------------------------------------------------

>>>>>>>>>>> --
>>>>>>>>>>> -------
>>>>>>>>>>> To unsubscribe, e-mail: java-dev- 
>>>>>>>>>>> unsubscribe@lucene.apache.org
>>>>>>>>>>> For additional commands, e-mail: java-dev-
>>>>>>>>>>> help@lucene.apache.org
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> -------------------------------------------------------------

>>>>>>>>>> --
>>>>>>>>>> ------
>>>>>>>>>> To unsubscribe, e-mail: java-dev- 
>>>>>>>>>> unsubscribe@lucene.apache.org
>>>>>>>>>> For additional commands, e-mail: java-dev-
>>>>>>>>>> help@lucene.apache.org
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --------------------------------------------------------------

>>>>>>>>> --
>>>>>>>>> -----
>>>>>>>>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>>>>>>>>> For additional commands, e-mail: java-dev- 
>>>>>>>>> help@lucene.apache.org
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> ---------------------------------------------------------------

>>>>>>>> --
>>>>>>>> ----
>>>>>>>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>>>>>>>> For additional commands, e-mail: java-dev- 
>>>>>>>> help@lucene.apache.org
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>> ------------------------------------------------------------------- 
>>>> --
>>>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>>>> For additional commands, e-mail: java-dev-help@lucene.apache.org
>>>>
>>>
>>> -- 
>>> View this message in context: http://www.nabble.com/Fwd%3A-
>>> ThreadLocal-in-SegmentReader-tp18326720p18416022.html
>>> Sent from the Lucene - Java Developer mailing list archive at
>>> Nabble.com.
>>>
>>>
>>> -------------------------------------------------------------------- 
>>> -
>>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: java-dev-help@lucene.apache.org
>>>
>>
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-dev-help@lucene.apache.org
>>
>>
>>
>
> -- 
> View this message in context: http://www.nabble.com/Fwd%3A- 
> ThreadLocal-in-SegmentReader-tp18326720p18416841.html
> Sent from the Lucene - Java Developer mailing list archive at  
> Nabble.com.
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-dev-help@lucene.apache.org
>


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


Mime
View raw message