lucene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Adrian Tarau <adrian.ta...@gmail.com>
Subject Re: ThreadLocal in SegmentReader
Date Wed, 09 Jul 2008 21:46:37 GMT
Just a few examples of "problems" using ThreadLocals.

http://opensource.atlassian.com/projects/hibernate/browse/HHH-2481
http://www.theserverside.com/news/thread.tss?thread_id=41473

Once again, I'm not pointing to Lucene SegmentReader as a "bad" 
implementation, and maybe the current "problems" of ThreadLocals are not 
a problem for SegmentReader but it seems safer to use ThreadLocals to 
pass context information which is cleared when the call exits instead of 
storing long-lived objects.


robert engels wrote:
> Aside from the pre-1.5 thread local "perceived leak", there are no 
> issues with ThreadLocals if used properly.
>
> There is no need for try/finally blocks, unless you MUST release 
> resources immediately, usually this is not the case, which is why a 
> ThreadLocal is used in the first place.
>
> From the ThreadLocalMap javadoc...
>
>  /**
>      * ThreadLocalMap is a customized hash map suitable only for
>      * maintaining thread local values. No operations are exported
>      * outside of the ThreadLocal class. The class is package private to
>      * allow declaration of fields in class Thread.  To help deal with
>      * very large and long-lived usages, the hash table entries use
>      * WeakReferences for keys. However, since reference queues are not
>      * used, stale entries are guaranteed to be removed only when
>      * the table starts running out of space.
>      */
>
> /**
>          * Heuristically scan some cells looking for stale entries.
>          * This is invoked when either a new element is added, or
>          * another stale one has been expunged. It performs a
>          * logarithmic number of scans, as a balance between no
>          * scanning (fast but retains garbage) and a number of scans
>          * proportional to number of elements, that would find all
>          * garbage but would cause some insertions to take O(n) time.
>          *
>          * @param i a position known NOT to hold a stale entry. The
>          * scan starts at the element after i.
>          *
>          * @param n scan control: <tt>log2(n)</tt> cells are scanned,
>          * unless a stale entry one is found, in which case
>          * <tt>log2(table.length)-1</tt> additional cells are scanned.
>          * When called from insertions, this parameter is the number
>          * of elements, but when from replaceStaleEntry, it is the
>          * table length. (Note: all this could be changed to be either
>          * more or less aggressive by weighting n instead of just
>          * using straight log n. But this version is simple, fast, and
>          * seems to work well.)
>          *
>          * @return true if any stale entries have been removed.
>          */
>
>
> The instance ThreadLocals (and what the refer to) will be GC'd when 
> the containing Object is GC'd.
>
> There IS NO MEMORY LEAK in ThreadLocal. If the ThreadLocal refers to 
> an object that has native resources (e.g. file handles), it may not be 
> released until other thread locals are created by the thread (or the 
> thread terminates).
>
> You can avoid this "delay" by calling remove(), but in most 
> applications it should never be necessary - unless a very strange usage...
>
> On Jul 9, 2008, at 2:37 PM, Adrian Tarau wrote:
>
>> From what I know, storing objects in ThreadLocal is safe as long as 
>> you release the object within a try {} finall {} block or store 
>> objects which are independent of the rest of the code(no 
>> dependencies).Otherwise it can get pretty tricky(memory leaks, 
>> classloader problems) after awhile.
>>
>> It is pretty convenient to pass HTTP request information with a 
>> ThreadLocal in a servlet(but you should cleanup the variable before 
>> leaving the servlet) but I'm not sure how safe it is in this case.
>>
>> robert engels wrote:
>>> Using synchronization is a poor/invalid substitute for thread locals 
>>> in many cases.
>>>
>>> The point of the thread local in these referenced cases is too allow 
>>> streaming reads on a file descriptor. if you use a shared file 
>>> descriptor/buffer you are going to continually invalidate the buffer.
>>>
>>> On Jul 8, 2008, at 5:12 AM, Michael McCandless wrote:
>>>
>>>>
>>>> Well ... SegmentReader uses ThreadLocal to hold a thread-private 
>>>> instance of TermVectorsReader, to avoid synchronizing per-document 
>>>> when loading term vectors.
>>>>
>>>> Clearing this ThreadLocal value per call to SegmentReader's methods 
>>>> that load term vectors would defeat its purpose.
>>>>
>>>> Though, of course, we then synchronize on the underlying file (when 
>>>> using FSDirectory), so perhaps we are really not saving much by 
>>>> using ThreadLocal here.  But we are looking to relax that low level 
>>>> synchronization with LUCENE-753.
>>>>
>>>> Maybe we could make our own ThreadLocal that just uses a HashMap, 
>>>> which we'd have to synchronize on when getting the per-thread 
>>>> instances.  Or, go back to sharing a single TermVectorsReader and 
>>>> synchronize per-document.
>>>>
>>>> Jason has suggested moving to a model where you ask the IndexReader 
>>>> for an object that can return term vectors / stored fields / etc, 
>>>> and then you interact with that many times to retrieve each doc.  
>>>> We could then synchronize only on retrieving that object, and 
>>>> provide a thread-private instance.
>>>>
>>>> It seems like we should move away from using ThreadLocal in Lucene 
>>>> and do "normal" synchronization instead.
>>>>
>>>> Mike
>>>>
>>>> Adrian Tarau wrote:
>>>>
>>>>> Usually ThreadLocal.remove() should be called at the end(in a 
>>>>> finally block), before the current call leaves your code.
>>>>>
>>>>> Ex : if during searching ThreadLocal is used, every search(..) 
>>>>> method should cleanup any ThreadLocal variables, or even deeper in 
>>>>> the implementation. When the call leaves Lucene any used 
>>>>> ThreadLocal should be cleaned up.
>>>>>
>>>>> Michael McCandless wrote:
>>>>>>
>>>>>> ThreadLocal, which we use in several places in Lucene, causes a 
>>>>>> leak in app servers because the classloader never fully 
>>>>>> deallocates Lucene's classes because the ThreadLocal is holding 
>>>>>> strong references.
>>>>>>
>>>>>> Yet, ThreadLocal is very convenient for avoiding synchronization.
>>>>>>
>>>>>> Does anyone have any ideas on how to solve this w/o falling back

>>>>>> to "normal" synchronization?
>>>>>>
>>>>>> Mike
>>>>>>
>>>>>> Begin forwarded message:
>>>>>>
>>>>>>> From: "Yonik Seeley" <yonik@apache.org <mailto:yonik@apache.org>>
>>>>>>> Date: July 7, 2008 3:30:28 PM EDT
>>>>>>> To: java-user@lucene.apache.org <mailto:java-user@lucene.apache.org>
>>>>>>> Subject: Re: ThreadLocal in SegmentReader
>>>>>>> Reply-To: java-user@lucene.apache.org 
>>>>>>> <mailto:java-user@lucene.apache.org>
>>>>>>>
>>>>>>> On Mon, Jul 7, 2008 at 2:43 PM, Michael McCandless
>>>>>>> <lucene@mikemccandless.com <mailto:lucene@mikemccandless.com>>

>>>>>>> wrote:
>>>>>>>> So now I'm confused: the SegmentReader itself should no longer

>>>>>>>> be reachable,
>>>>>>>> assuming you are not holding any references to your IndexReader.
>>>>>>>>
>>>>>>>> Which means the ThreadLocal instance should no longer be
reachable.
>>>>>>>
>>>>>>> It will still be referenced from the Thread(s) ThreadLocalMap
>>>>>>> The key (the ThreadLocal) will be weakly referenced, but the
values
>>>>>>> (now stale) are strongly referenced and won't be actually removed
>>>>>>> until the table is resized (under the Java6 impl at least).
>>>>>>> Nice huh?
>>>>>>>
>>>>>>> -Yonik
>>>>>>>
>>>>>>> ---------------------------------------------------------------------
>>>>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org

>>>>>>> <mailto:java-user-unsubscribe@lucene.apache.org>
>>>>>>> For additional commands, e-mail: 
>>>>>>> java-user-help@lucene.apache.org 
>>>>>>> <mailto:java-user-help@lucene.apache.org>
>>>>>>>
>>>>>>
>>>>>>
>>>>>> ---------------------------------------------------------------------
>>>>>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org 
>>>>>> <mailto:java-dev-unsubscribe@lucene.apache.org>
>>>>>> For additional commands, e-mail: java-dev-help@lucene.apache.org

>>>>>> <mailto:java-dev-help@lucene.apache.org>
>>>>>>
>>>>>
>>>>>
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org 
>>>>> <mailto:java-dev-unsubscribe@lucene.apache.org>
>>>>> For additional commands, e-mail: java-dev-help@lucene.apache.org 
>>>>> <mailto:java-dev-help@lucene.apache.org>
>>>>>
>>>>
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org 
>>>> <mailto:java-dev-unsubscribe@lucene.apache.org>
>>>> For additional commands, e-mail: java-dev-help@lucene.apache.org 
>>>> <mailto:java-dev-help@lucene.apache.org>
>>>>
>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org 
>>> <mailto:java-dev-unsubscribe@lucene.apache.org>
>>> For additional commands, e-mail: java-dev-help@lucene.apache.org 
>>> <mailto:java-dev-help@lucene.apache.org>
>>>
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org 
>> <mailto:java-dev-unsubscribe@lucene.apache.org>
>> For additional commands, e-mail: java-dev-help@lucene.apache.org 
>> <mailto:java-dev-help@lucene.apache.org>
>>
>


Mime
View raw message