Return-Path: Delivered-To: apmail-harmony-dev-archive@www.apache.org Received: (qmail 87434 invoked from network); 12 Dec 2009 17:04:39 -0000 Received: from hermes.apache.org (HELO mail.apache.org) (140.211.11.3) by minotaur.apache.org with SMTP; 12 Dec 2009 17:04:39 -0000 Received: (qmail 64929 invoked by uid 500); 12 Dec 2009 17:04:39 -0000 Delivered-To: apmail-harmony-dev-archive@harmony.apache.org Received: (qmail 64851 invoked by uid 500); 12 Dec 2009 17:04:39 -0000 Mailing-List: contact dev-help@harmony.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@harmony.apache.org Delivered-To: mailing list dev@harmony.apache.org Received: (qmail 64840 invoked by uid 99); 12 Dec 2009 17:04:39 -0000 Received: from nike.apache.org (HELO nike.apache.org) (192.87.106.230) by apache.org (qpsmtpd/0.29) with ESMTP; Sat, 12 Dec 2009 17:04:39 +0000 X-ASF-Spam-Status: No, hits=-0.0 required=10.0 tests=SPF_PASS X-Spam-Check-By: apache.org Received-SPF: pass (nike.apache.org: domain of sebbaz@gmail.com designates 209.85.220.218 as permitted sender) Received: from [209.85.220.218] (HELO mail-fx0-f218.google.com) (209.85.220.218) by apache.org (qpsmtpd/0.29) with ESMTP; Sat, 12 Dec 2009 17:04:30 +0000 Received: by fxm10 with SMTP id 10so2051859fxm.14 for ; Sat, 12 Dec 2009 09:04:10 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:mime-version:received:in-reply-to:references :date:message-id:subject:from:to:content-type; bh=bGLEw2VZtxVq4VOssO3zXpqRaNBk2i6BbqDRvsj44hY=; b=I9lZgNThjcoRZsl8mh5TG75tChn0nAXny0gMUHC7W4JJCm+juBiLJvXrv6qA2GXrZW 3vYYQWKLFFn09Xg9lx4mGpeNqC/vvH0ZBUViMUQNaY1PTDBDnGpD5a0k4N+IRE9jTDiD +oDQrO5kVNbRIDK43mNkwPQTB2xpT2kSe5zQA= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :content-type; b=qHq8MK9ELR+4cpylx8zH0S7f7iBC8rsCQM4rHyy/CNyj3CSDqmjbNQScgJlZS3ocnr 1KcvcHuZnx5njZRUeDf5FRP2kttWv2y0Ck7C9gcLmNuogAIZsgVGLxnCjhDtNHLRT3yK +EgrBRge/sIQSpjc7QOkb8ALAsUd0qLJKCgt4= MIME-Version: 1.0 Received: by 10.239.139.221 with SMTP id u29mr280869hbu.35.1260637449929; Sat, 12 Dec 2009 09:04:09 -0800 (PST) In-Reply-To: <6554f93c0912120753h784f1c21qd4d14c732faa7425@mail.gmail.com> References: <981540465.1260477018418.JavaMail.jira@brutus> <4B21BE3A.4030202@gmail.com> <6554f93c0912102009n45e4981btfc072f0c6feacca7@mail.gmail.com> <4B221ABA.4040809@gmail.com> <87638doa25.fsf@gmail.com> <4B226D80.4040309@gmail.com> <3b3f27c60912112000h7c52613ajbbc9692150a3a44e@mail.gmail.com> <25aac9fc0912120734n23cc3392k2c46f81674a1d679@mail.gmail.com> <6554f93c0912120753h784f1c21qd4d14c732faa7425@mail.gmail.com> Date: Sat, 12 Dec 2009 17:04:09 +0000 Message-ID: <25aac9fc0912120904t5673f3e5s988b50ec78750730@mail.gmail.com> Subject: Re: [jira] Created: (HARMONY-6404) possible data-reordering in some hashCode-methods (e.g. String or URL) From: sebb To: dev@harmony.apache.org Content-Type: text/plain; charset=ISO-8859-1 X-Virus-Checked: Checked by ClamAV on apache.org On 12/12/2009, Vijay Menon wrote: > On Sat, Dec 12, 2009 at 7:34 AM, sebb wrote: > > > On 12/12/2009, Nathan Beyer wrote: > > > On Fri, Dec 11, 2009 at 10:04 AM, Tim Ellison > > wrote: > > > > On 11/Dec/2009 14:32, Egor Pasko wrote: > > > >> On the 0x684 day of Apache Harmony Tim Ellison wrote: > > > >>> On 11/Dec/2009 04:09, Vijay Menon wrote: > > > >>>> Perhaps I'm missing some context, but I don't see any problem. I > > don't > > > >>>> believe that this: > > > >>>> > > > >>>> if (hashCode == 0) { > > > >>>> // calculate hash > > > >>>> hashCode = hash; > > > >>>> } > > > >>>> return hashCode; > > > >>>> > > > >>>> can ever return 0 (assuming hash is non-zero), regardless of memory > > fences. > > > >>>> The JMM won't allow visible reordering in a single threaded > > program. > > > >>> I agree. In the multi-threaded case there can be a data race on the > > > >>> hashCode, with the effect that the same hashCode value is > > unnecessarily, > > > >>> but harmlessly, recalculated. > > > >> > > > >> Vijay, Tim, you are not 100% correct here. > > > >> > > > >> 1. there should be an extra restriction that the part "calculate > > hash" > > > >> does not touch the hashCode field. Without that restriction more > > > >> trivial races can happen as discussed in LANG-481. > > > >> > > > >> So we should assume this code: > > > >> > > > >> if (this.hashCode == 0) { > > > >> int hash; > > > >> if (this.hashCode == 0) { > > > >> // Calculate using 'hash' only, not this.hashCode. > > > >> this.hashCode = hash; > > > >> } > > > >> return this.hashCode; > > > >> } > > > > > > > > Yes, I guess I figured that was assumed :-) > > > > > > > > Of course, there are lots of things you could put into the > > > > "// Calculate..." section that would be unsafe. We should stick with > > > > showing the non-abbreviated implementation to avoid ambiguity: > > > > > > > > public int hashCode() { > > > > if (hashCode == 0) { > > > > if (count == 0) { > > > > return 0; > > > > } > > > > int hash = 0; > > > > for (int i = offset; i < count + offset; i++) { > > > > hash = value[i] + ((hash << 5) - hash); > > > > } > > > > hashCode = hash; > > > > } > > > > return hashCode; > > > > } > > > > > > > > > > > > > I think I understand the concern, after some additional reading. The > > > issue seems to be that 'hashCode' is read twice and the field is not > > > protected by any memory barriers (synchronized, volatile, etc). As > > > such, it would be okay for the second read to be done using a cached > > > value, which means that both reads could return 0 in the same thread > > > of execution. Another way to look at it is that the write to > > > 'hashCode' may or may not affect subsequent reads of 'hashCode'. This > > > is how I understand it. > > > > > > Will that happen in practice? I have no idea. It does seem possible. > > > > The Java MM guarantees that a single thread behaves as if the code is > > processed sequentially. > > So if the thread writes non-zero to this.hashCode it cannot then > > return zero for the value of this.hashCode if no other threads > > intervene. The thread cannot ignore updates to values it has itself > > cached! > > > > If another thread writes to this.hashCode concurrently, then this > > thread may or may not see the value stored by that thread. In this > > case, it's not a problem, as another thread can only write a fixed > > value. So the worst that can happen is that this.hashCode is written > > more than once, and the current thread may fetch the value written by > > another thread. But this is the same value it wrote anyway. > > > > > In a multithreaded setting, this code *can* break and return 0 if hashCode > is read twice. This is not just a performance optimization - it is a > correctness issue. The compiler / runtime / hardware is allowed to reorder > read operations. The following racy execution is allowable under the JMM: > > 1. Thread 1 reads 0 from hashCode and stores 0 into a local (t1). > 2. Thread 2 write 42 into hashCode. > 3. Thread 1 reads 42 from hashCode and stores 42 into a local (t2). > 4. Thread 1 compares t2 (42) with 0 and does not execute the if clause. > 5. Thread 1 returns t1 (0). > But why would Thread 1 read hashCode twice before the comparison? Seems to me that would break the "as if serial" guarantee for a single thread. In the code sequence, the comparison is before the return, and therefore "happens-before" the return. I.e. step 3 "happens-before" step 1+5. I'm not saying Harmony should keep the current code - the suggested temp variable version seems better anyway - just trying to understand what (if anything) is currently broken. > - Vijay > > > > > > > > In any case, it does seem a pinch more efficient to only do one read > > > of hashCode ... switch up the code to be something like this. > > > > Agreed. > > > > > public int hashCode() { > > > final int hash = hashCode; > > > if (hash == 0) { > > > > > > if (count == 0) { > > > return 0; > > > } > > > > > > for (int i = offset; i < count + offset; i++) { > > > hash = value[i] + ((hash << 5) - hash); > > > } > > > hashCode = hash; > > > } > > > > > > return hash; > > > } > > > > > > Thoughts? > > > > > > > > > >> where 'this.*' is always a memory reference while 'hash' is a thread > > private var. > > > >> > > > >> 2. DRLVM JIT indeed does not privatize field access to threads. It > > > >> always loads fields from memory in original order. Hence this > > > >> potential bug does not affect DRLVM .. now. But potentially it can > > > >> start optimizing things this way because current behavior prevents > > > >> a bunch of high level optimizations. > > > >> > > > >> 3. Jeremy Manson, being an expert in Java Memory Model claims [1] > > that > > > >> such reordering is allowed theoretically. I.e. like this: > > > >> > > > >> int hash = this.hashCode; > > > >> if (this.hashCode == 0) { > > > >> hash = this.hashCode = // calculate hash > > > >> } > > > >> return hash; > > > >> > > > >> This is a correct single-threaded code. What happened here is a > > > >> lengthy thread privatization of this.hashCode (again, does not happen > > > >> in DRLVM). That is incorrect in multithreaded environment and needs > > > >> extra synchronization/volatile/etc. > > > >> > > > >> 4. I do not know why a JIT would want to do that, i.e. two sequential > > > >> reads from the same memory location. Sounds like a bit synthetic > > example. > > > > > > > > ...at which point a bunch of code probably would go wrong! So > > hopefully > > > > it remains only a theoretical possibility. > > > > > > > > Regards, > > > > Tim > > > > > > > >> [1] > > http://jeremymanson.blogspot.com/2008/12/benign-data-races-in-java.html > > > >> > > > > > > > > > >