Return-Path: Delivered-To: apmail-lucene-java-dev-archive@www.apache.org Received: (qmail 45879 invoked from network); 9 Jul 2008 20:16:57 -0000 Received: from hermes.apache.org (HELO mail.apache.org) (140.211.11.2) by minotaur.apache.org with SMTP; 9 Jul 2008 20:16:57 -0000 Received: (qmail 86903 invoked by uid 500); 9 Jul 2008 20:16:54 -0000 Delivered-To: apmail-lucene-java-dev-archive@lucene.apache.org Received: (qmail 86605 invoked by uid 500); 9 Jul 2008 20:16:53 -0000 Mailing-List: contact java-dev-help@lucene.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: java-dev@lucene.apache.org Delivered-To: mailing list java-dev@lucene.apache.org Received: (qmail 86596 invoked by uid 99); 9 Jul 2008 20:16:52 -0000 Received: from athena.apache.org (HELO athena.apache.org) (140.211.11.136) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 09 Jul 2008 13:16:52 -0700 X-ASF-Spam-Status: No, hits=3.2 required=10.0 tests=HTML_MESSAGE,SPF_NEUTRAL X-Spam-Check-By: apache.org Received-SPF: neutral (athena.apache.org: local policy) Received: from [209.86.89.69] (HELO elasmtp-mealy.atl.sa.earthlink.net) (209.86.89.69) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 09 Jul 2008 20:15:58 +0000 DomainKey-Signature: a=rsa-sha1; q=dns; c=nofws; s=dk20050327; d=ix.netcom.com; b=c37hdIbcbwGifPfrsF5iS4mF6D+oLDv/pYEIzD++P92jOYkzGZlvMxk0JAa9yMFA; h=Received:Mime-Version:In-Reply-To:References:Content-Type:Message-Id:From:Subject:Date:To:X-Mailer:X-ELNK-Trace:X-Originating-IP; Received: from [69.209.76.99] (helo=[192.168.1.65]) by elasmtp-mealy.atl.sa.earthlink.net with esmtpa (Exim 4.67) (envelope-from ) id 1KGg4z-0000Mz-AJ for java-dev@lucene.apache.org; Wed, 09 Jul 2008 16:15:58 -0400 Mime-Version: 1.0 (Apple Message framework v753.1) In-Reply-To: <4875138A.3000004@gmail.com> References: <48728954.3030903@gmail.com> <71B2141B-BE78-4DFA-A6C4-559274C53929@mikemccandless.com> <04264C3B-CA65-429A-AC98-AC9FEF4CA6A6@ix.netcom.com> <4875138A.3000004@gmail.com> Content-Type: multipart/alternative; boundary=Apple-Mail-12--1043328389 Message-Id: <18B26EA1-44CB-42B0-B46B-7748C717C914@ix.netcom.com> From: robert engels Subject: Re: ThreadLocal in SegmentReader Date: Wed, 9 Jul 2008 15:15:56 -0500 To: java-dev@lucene.apache.org X-Mailer: Apple Mail (2.753.1) X-ELNK-Trace: 33cbdd8ed9881ca8776432462e451d7b7f19f0d9c038d9aa9511cb6b2861d8a87eb1269c7ca668c6350badd9bab72f9c350badd9bab72f9c350badd9bab72f9c X-Originating-IP: 69.209.76.99 X-Virus-Checked: Checked by ClamAV on apache.org --Apple-Mail-12--1043328389 Content-Transfer-Encoding: 7bit Content-Type: text/plain; charset=US-ASCII; delsp=yes; format=flowed Aside from the pre-1.5 thread local "perceived leak", there are no issues with ThreadLocals if used properly. There is no need for try/finally blocks, unless you MUST release resources immediately, usually this is not the case, which is why a ThreadLocal is used in the first place. From the ThreadLocalMap javadoc... /** * ThreadLocalMap is a customized hash map suitable only for * maintaining thread local values. No operations are exported * outside of the ThreadLocal class. The class is package private to * allow declaration of fields in class Thread. To help deal with * very large and long-lived usages, the hash table entries use * WeakReferences for keys. However, since reference queues are not * used, stale entries are guaranteed to be removed only when * the table starts running out of space. */ /** * Heuristically scan some cells looking for stale entries. * This is invoked when either a new element is added, or * another stale one has been expunged. It performs a * logarithmic number of scans, as a balance between no * scanning (fast but retains garbage) and a number of scans * proportional to number of elements, that would find all * garbage but would cause some insertions to take O(n) time. * * @param i a position known NOT to hold a stale entry. The * scan starts at the element after i. * * @param n scan control: log2(n) cells are scanned, * unless a stale entry one is found, in which case * log2(table.length)-1 additional cells are scanned. * When called from insertions, this parameter is the number * of elements, but when from replaceStaleEntry, it is the * table length. (Note: all this could be changed to be either * more or less aggressive by weighting n instead of just * using straight log n. But this version is simple, fast, and * seems to work well.) * * @return true if any stale entries have been removed. */ The instance ThreadLocals (and what the refer to) will be GC'd when the containing Object is GC'd. There IS NO MEMORY LEAK in ThreadLocal. If the ThreadLocal refers to an object that has native resources (e.g. file handles), it may not be released until other thread locals are created by the thread (or the thread terminates). You can avoid this "delay" by calling remove(), but in most applications it should never be necessary - unless a very strange usage... On Jul 9, 2008, at 2:37 PM, Adrian Tarau wrote: > From what I know, storing objects in ThreadLocal is safe as long as > you release the object within a try {} finall {} block or store > objects which are independent of the rest of the code(no > dependencies).Otherwise it can get pretty tricky(memory leaks, > classloader problems) after awhile. > > It is pretty convenient to pass HTTP request information with a > ThreadLocal in a servlet(but you should cleanup the variable before > leaving the servlet) but I'm not sure how safe it is in this case. > > robert engels wrote: >> Using synchronization is a poor/invalid substitute for thread >> locals in many cases. >> >> The point of the thread local in these referenced cases is too >> allow streaming reads on a file descriptor. if you use a shared >> file descriptor/buffer you are going to continually invalidate the >> buffer. >> >> On Jul 8, 2008, at 5:12 AM, Michael McCandless wrote: >> >>> >>> Well ... SegmentReader uses ThreadLocal to hold a thread-private >>> instance of TermVectorsReader, to avoid synchronizing per- >>> document when loading term vectors. >>> >>> Clearing this ThreadLocal value per call to SegmentReader's >>> methods that load term vectors would defeat its purpose. >>> >>> Though, of course, we then synchronize on the underlying file >>> (when using FSDirectory), so perhaps we are really not saving >>> much by using ThreadLocal here. But we are looking to relax that >>> low level synchronization with LUCENE-753. >>> >>> Maybe we could make our own ThreadLocal that just uses a HashMap, >>> which we'd have to synchronize on when getting the per-thread >>> instances. Or, go back to sharing a single TermVectorsReader and >>> synchronize per-document. >>> >>> Jason has suggested moving to a model where you ask the >>> IndexReader for an object that can return term vectors / stored >>> fields / etc, and then you interact with that many times to >>> retrieve each doc. We could then synchronize only on retrieving >>> that object, and provide a thread-private instance. >>> >>> It seems like we should move away from using ThreadLocal in >>> Lucene and do "normal" synchronization instead. >>> >>> Mike >>> >>> Adrian Tarau wrote: >>> >>>> Usually ThreadLocal.remove() should be called at the end(in a >>>> finally block), before the current call leaves your code. >>>> >>>> Ex : if during searching ThreadLocal is used, every search(..) >>>> method should cleanup any ThreadLocal variables, or even deeper >>>> in the implementation. When the call leaves Lucene any used >>>> ThreadLocal should be cleaned up. >>>> >>>> Michael McCandless wrote: >>>>> >>>>> ThreadLocal, which we use in several places in Lucene, causes a >>>>> leak in app servers because the classloader never fully >>>>> deallocates Lucene's classes because the ThreadLocal is holding >>>>> strong references. >>>>> >>>>> Yet, ThreadLocal is very convenient for avoiding synchronization. >>>>> >>>>> Does anyone have any ideas on how to solve this w/o falling >>>>> back to "normal" synchronization? >>>>> >>>>> Mike >>>>> >>>>> Begin forwarded message: >>>>> >>>>>> From: "Yonik Seeley" >>>>>> Date: July 7, 2008 3:30:28 PM EDT >>>>>> To: java-user@lucene.apache.org >>>>>> Subject: Re: ThreadLocal in SegmentReader >>>>>> Reply-To: java-user@lucene.apache.org >>>>>> >>>>>> On Mon, Jul 7, 2008 at 2:43 PM, Michael McCandless >>>>>> wrote: >>>>>>> So now I'm confused: the SegmentReader itself should no >>>>>>> longer be reachable, >>>>>>> assuming you are not holding any references to your IndexReader. >>>>>>> >>>>>>> Which means the ThreadLocal instance should no longer be >>>>>>> reachable. >>>>>> >>>>>> It will still be referenced from the Thread(s) ThreadLocalMap >>>>>> The key (the ThreadLocal) will be weakly referenced, but the >>>>>> values >>>>>> (now stale) are strongly referenced and won't be actually removed >>>>>> until the table is resized (under the Java6 impl at least). >>>>>> Nice huh? >>>>>> >>>>>> -Yonik >>>>>> >>>>>> ----------------------------------------------------------------- >>>>>> ---- >>>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org >>>>>> For additional commands, e-mail: java-user-help@lucene.apache.org >>>>>> >>>>> >>>>> >>>>> ------------------------------------------------------------------ >>>>> --- >>>>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org >>>>> For additional commands, e-mail: java-dev-help@lucene.apache.org >>>>> >>>> >>>> >>>> ------------------------------------------------------------------- >>>> -- >>>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org >>>> For additional commands, e-mail: java-dev-help@lucene.apache.org >>>> >>> >>> >>> -------------------------------------------------------------------- >>> - >>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org >>> For additional commands, e-mail: java-dev-help@lucene.apache.org >>> >> >> >> --------------------------------------------------------------------- >> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org >> For additional commands, e-mail: java-dev-help@lucene.apache.org >> > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org > For additional commands, e-mail: java-dev-help@lucene.apache.org > --Apple-Mail-12--1043328389 Content-Transfer-Encoding: quoted-printable Content-Type: text/html; charset=ISO-8859-1 Aside from the pre-1.5 thread local "perceived leak", there are no = issues with ThreadLocals if used properly.

There is = no need for try/finally blocks, unless you MUST release resources = immediately, usually this is not the case, which is why a ThreadLocal is = used in the first place.

=46rom the = ThreadLocalMap javadoc...

=A0/**
=A0=A0 =A0 * ThreadLocalMap is a customized hash map suitable only for
=A0=A0 =A0 * maintaining thread local values. No operations are exported
=A0=A0 =A0 = * outside of the ThreadLocal class. The class is package private to
=A0=A0 =A0 * allow declaration of fields in class Thread.=A0 = To help deal with
=A0=A0 =A0 = * very large and long-lived usages, the hash table entries use
=A0=A0 =A0 = * WeakReferences for keys. However, since reference queues are not
=A0=A0 =A0 * used, stale entries are guaranteed to be removed only when
=A0=A0 =A0 * the table starts running out of space.
=A0=A0 =A0 */

/**
=A0=A0 =A0 =A0 =A0 * Heuristically scan some cells looking for stale entries.
=A0=A0 =A0 =A0 = =A0 * This is invoked when either a new element is added, or
=A0=A0 =A0 =A0 =A0 * another stale one has been expunged. It performs a
=A0=A0 =A0 =A0 = =A0 * logarithmic number of scans, as a balance between no
=A0=A0 =A0 =A0 =A0 * scanning (fast but retains garbage) and a number of scans
=A0=A0 =A0 =A0 =A0 * proportional to number of elements, that would find all
=A0=A0 =A0 =A0 =A0 * garbage but would cause some insertions to take O(n) time.
=A0=A0 =A0 =A0 =A0 *
=A0=A0 =A0 =A0 = =A0 * @param i a position known NOT to hold a stale entry. The
=A0=A0 =A0 =A0 =A0 * scan starts at the element after i.
=A0=A0 =A0 =A0 =A0 *
=A0=A0 =A0 =A0 =A0 * @param n scan control: <tt>log2(n)</tt> cells are scanned,
=A0=A0 =A0 =A0 = =A0 * unless a stale entry one is found, in which case
=A0=A0 =A0 =A0 = =A0 * <tt>log2(table.length)-1</tt> additional cells are scanned.
=A0=A0 =A0 =A0 = =A0 * When called from insertions, this parameter is the number
=A0=A0 =A0 =A0 =A0 * of elements, but when from replaceStaleEntry, it is the
=A0=A0 =A0 =A0 = =A0 * table length. (Note: all this could be changed to be either
=A0=A0 =A0 =A0 =A0 * more or less aggressive by weighting n instead of just
=A0=A0 =A0 =A0 =A0 * using straight log n. But this version is simple, fast, and
=A0=A0 =A0 =A0 =A0 * seems to work well.)
=A0=A0 =A0 =A0 =A0 *
=A0=A0 =A0 =A0 =A0 * @return true if any stale entries have been removed.
=A0=A0 =A0 =A0 = =A0 */


=
The instance ThreadLocals (and what the refer to) will be = GC'd when the containing Object is GC'd.

There = IS NO MEMORY LEAK in ThreadLocal. If the ThreadLocal refers to an object = that has native resources (e.g. file handles), it may not be released = until other thread locals are created by the thread (or the thread = terminates).

You can avoid this "delay" by = calling remove(), but in most applications it should never be necessary = - unless a very strange = usage...

On Jul 9, 2008, at = 2:37 PM, Adrian Tarau wrote:

=46rom what I know, storing objects in ThreadLocal = is safe as long as you release the object within a try {} finall {} = block or store objects which are independent of the rest of the code(no = dependencies).Otherwise it can get pretty tricky(memory leaks, = classloader problems) after awhile.

It is pretty convenient to pass = HTTP request information with a ThreadLocal in a servlet(but you should = cleanup the variable before leaving the servlet) but I'm not sure how = safe it is in this case.

robert engels wrote:
=
Using synchronization is a = poor/invalid substitute for thread locals in many cases.

The = point of the thread local in these referenced cases is too allow = streaming reads on a file descriptor. if you use a shared file = descriptor/buffer you are going to continually invalidate the = buffer.

On Jul 8, 2008, at 5:12 AM, Michael McCandless = wrote:

=
Well ... SegmentReader uses = ThreadLocal to hold a thread-private instance of TermVectorsReader, to = avoid synchronizing per-document when loading term vectors.

Clearing = this ThreadLocal value per call to SegmentReader's methods that load = term vectors would defeat its purpose.

Though, of = course, we then synchronize on the underlying file (when using = FSDirectory), so perhaps we are really not saving much by using = ThreadLocal here.=A0 But we = are looking to relax that low level synchronization with = LUCENE-753.

Maybe we could make our own ThreadLocal that just = uses a HashMap, which we'd have to synchronize on when getting the = per-thread instances.=A0 = Or, go back to sharing a single TermVectorsReader and synchronize = per-document.

Jason has suggested moving to a model where you ask = the IndexReader for an object that can return term vectors / stored = fields / etc, and then you interact with that many times to retrieve = each doc.=A0 We could then = synchronize only on retrieving that object, and provide a thread-private = instance.

It seems like we should move away from using = ThreadLocal in Lucene and do "normal" synchronization instead.


Adrian Tarau wrote:

Usually ThreadLocal.remove() should be called at the = end(in a finally block), before the current call leaves your = code.

Ex : if during searching ThreadLocal is used, every = search(..) method should cleanup any ThreadLocal variables, or even = deeper in the implementation. When the call leaves Lucene any used = ThreadLocal should be cleaned up.

Michael McCandless wrote:
=
ThreadLocal, which we use in = several places in Lucene, causes a leak in app servers because the = classloader never fully deallocates Lucene's classes because the = ThreadLocal is holding strong references.

Yet, = ThreadLocal is very convenient for avoiding synchronization.

Does = anyone have any ideas on how to solve this w/o falling back to "normal" = synchronization?

Mike

Begin forwarded = message:

=
From: "Yonik Seeley" <yonik@apache.org>
Date: July 7, 2008 3:30:28 PM EDT

So now I'm = confused: the SegmentReader itself should no longer be = reachable,
assuming you are not holding any = references to your IndexReader.

Which means the ThreadLocal = instance should no longer be reachable.

It will = still be referenced from the Thread(s) ThreadLocalMap
The key (the ThreadLocal) will be weakly referenced, = but the values
(now stale) are strongly = referenced and won't be actually removed
until = the table is resized (under the Java6 impl at least).
Nice huh?

-Yonik

For = additional commands, e-mail: java-user-help@lucene.apa= che.org

=


For = additional commands, e-mail: java-dev-help@lucene.apach= e.org

=


For = additional commands, e-mail: java-dev-help@lucene.apach= e.org

=


For = additional commands, e-mail: java-dev-help@lucene.apach= e.org

=


For = additional commands, e-mail: java-dev-help@lucene.apach= e.org

=


For = additional commands, e-mail: java-dev-help@lucene.apach= e.org

=

= --Apple-Mail-12--1043328389--