harmony-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Vijay Menon <...@google.com>
Subject Re: [jira] Created: (HARMONY-6404) possible data-reordering in some hashCode-methods (e.g. String or URL)
Date Sat, 12 Dec 2009 19:37:42 GMT
On Sat, Dec 12, 2009 at 9:04 AM, sebb <sebbaz@gmail.com> wrote:

> On 12/12/2009, Vijay Menon <vsm@google.com> wrote:
> > On Sat, Dec 12, 2009 at 7:34 AM, sebb <sebbaz@gmail.com> wrote:
> >
> >  > On 12/12/2009, Nathan Beyer <ndbeyer@apache.org> wrote:
> >  > > On Fri, Dec 11, 2009 at 10:04 AM, Tim Ellison <
> t.p.ellison@gmail.com>
> >  > wrote:
> >  > >  > On 11/Dec/2009 14:32, Egor Pasko wrote:
> >  > >  >> On the 0x684 day of Apache Harmony Tim Ellison wrote:
> >  > >  >>> On 11/Dec/2009 04:09, Vijay Menon wrote:
> >  > >  >>>> Perhaps I'm missing some context, but I don't see any
problem.
>  I
> >  > don't
> >  > >  >>>> believe that this:
> >  > >  >>>>
> >  > >  >>>>         if (hashCode == 0) {
> >  > >  >>>>             // calculate hash
> >  > >  >>>>             hashCode = hash;
> >  > >  >>>>         }
> >  > >  >>>>         return hashCode;
> >  > >  >>>>
> >  > >  >>>> can ever return 0 (assuming hash is non-zero), regardless
of
> memory
> >  > fences.
> >  > >  >>>>  The JMM won't allow visible reordering in a single
threaded
> >  > program.
> >  > >  >>> I agree.  In the multi-threaded case there can be a data
race
> on the
> >  > >  >>> hashCode, with the effect that the same hashCode value
is
> >  > unnecessarily,
> >  > >  >>> but harmlessly, recalculated.
> >  > >  >>
> >  > >  >> Vijay, Tim, you are not 100% correct here.
> >  > >  >>
> >  > >  >> 1. there should be an extra restriction that the part "calculate
> >  > hash"
> >  > >  >>    does not touch the hashCode field. Without that restriction
> more
> >  > >  >>    trivial races can happen as discussed in LANG-481.
> >  > >  >>
> >  > >  >> So we should assume this code:
> >  > >  >>
> >  > >  >> if (this.hashCode == 0) {
> >  > >  >>   int hash;
> >  > >  >>   if (this.hashCode == 0) {
> >  > >  >>     // Calculate using 'hash' only, not this.hashCode.
> >  > >  >>     this.hashCode = hash;
> >  > >  >>   }
> >  > >  >>   return this.hashCode;
> >  > >  >> }
> >  > >  >
> >  > >  > Yes, I guess I figured that was assumed :-)
> >  > >  >
> >  > >  > Of course, there are lots of things you could put into the
> >  > >  > "// Calculate..." section that would be unsafe.  We should stick
> with
> >  > >  > showing the non-abbreviated implementation to avoid ambiguity:
> >  > >  >
> >  > >  >    public int hashCode() {
> >  > >  >        if (hashCode == 0) {
> >  > >  >            if (count == 0) {
> >  > >  >                return 0;
> >  > >  >            }
> >  > >  >            int hash = 0;
> >  > >  >            for (int i = offset; i < count + offset; i++) {
> >  > >  >                hash = value[i] + ((hash << 5) - hash);
> >  > >  >            }
> >  > >  >            hashCode = hash;
> >  > >  >        }
> >  > >  >        return hashCode;
> >  > >  >    }
> >  > >  >
> >  > >
> >  > >
> >  > > I think I understand the concern, after some additional reading. The
> >  > >  issue seems to be that 'hashCode' is read twice and the field is
> not
> >  > >  protected by any memory barriers (synchronized, volatile, etc). As
> >  > >  such, it would be okay for the second read to be done using a
> cached
> >  > >  value, which means that both reads could return 0 in the same
> thread
> >  > >  of execution. Another way to look at it is that the write to
> >  > >  'hashCode' may or may not affect subsequent reads of 'hashCode'.
> This
> >  > >  is how I understand it.
> >  > >
> >  > >  Will that happen in practice? I have no idea. It does seem
> possible.
> >  >
> >  > The Java MM guarantees that a single thread behaves as if the code is
> >  > processed sequentially.
> >  > So if the thread writes non-zero to this.hashCode it cannot then
> >  > return zero for the value of this.hashCode if no other threads
> >  > intervene. The thread cannot ignore updates to values it has itself
> >  > cached!
> >  >
> >  > If another thread writes to this.hashCode concurrently, then this
> >  > thread may or may not see the value stored by that thread. In this
> >  > case, it's not a problem, as another thread can only write a fixed
> >  > value. So the worst that can happen is that this.hashCode is written
> >  > more than once, and the current thread may fetch the value written by
> >  > another thread. But this is the same value it wrote anyway.
> >  >
> >
> >
> > In a multithreaded setting, this code *can* break and return 0 if
> hashCode
> >  is read twice.  This is not just a performance optimization - it is a
> >  correctness issue.  The compiler / runtime / hardware is allowed to
> reorder
> >  read operations.  The following racy execution is allowable under the
> JMM:
> >
> >  1. Thread 1 reads 0 from hashCode and stores 0 into a local (t1).
> >  2. Thread 2 write 42 into hashCode.
> >  3. Thread 1 reads 42 from hashCode and stores 42 into a local (t2).
> >  4. Thread 1 compares t2 (42) with 0 and does not execute the if clause.
> >  5. Thread 1 returns t1 (0).
> >
>
> But why would Thread 1 read hashCode twice before the comparison?
>

Yeah, this confused me.  The blog Egor linked has more detail.  The key is
that the original code reads hashCode twice.  It's legal to hoist the second
read above the comparison and above the first read - as long as you add a
compensating read in the if-clause to keep the single threaded case correct.
 The compiler can legally convert the original code to this:

  t1 = this.hashCode;
  t2 = this.hashCode;
  if (t2 == 0) {
    // compute hash
    this.hashCode = hash;
    t1 = this.hashCode;
  }
  return t1;


> Seems to me that would break the "as if serial" guarantee for a single
> thread.
> In the code sequence, the comparison is before the return, and
> therefore "happens-before" the return. I.e. step 3 "happens-before"
> step 1+5.
>

The above code will always be correct in the single thread case - so this
guarantee is basically kept.  The problem is that the write from the other
thread has no happens-before relation with either read.  Essentially, the
JMM allows each read to (independently) decide whether they see the result
of the other thread's write.

- Vijay


>
> I'm not saying Harmony should keep the current code - the suggested
> temp variable version seems better anyway - just trying to understand
> what (if anything) is currently broken.
>
> >  - Vijay
> >
> >
> >
> >  >
> >  > >  In any case, it does seem a pinch more efficient to only do one
> read
> >  > >  of hashCode ... switch up the code to be something like this.
> >  >
> >  > Agreed.
> >  >
> >  > >  public int hashCode() {
> >  > >     final int hash = hashCode;
> >  > >     if (hash == 0) {
> >  > >
> >  > >         if (count == 0) {
> >  > >             return 0;
> >  > >         }
> >  > >
> >  > >         for (int i = offset; i < count + offset; i++) {
> >  > >             hash = value[i] + ((hash << 5) - hash);
> >  > >         }
> >  > >         hashCode = hash;
> >  > >     }
> >  > >
> >  > >     return hash;
> >  > >  }
> >  > >
> >  > >  Thoughts?
> >  > >
> >  > >
> >  > >  >> where 'this.*' is always a memory reference while 'hash' is
a
> thread
> >  > private var.
> >  > >  >>
> >  > >  >> 2. DRLVM JIT indeed does not privatize field access to threads.
> It
> >  > >  >>    always loads fields from memory in original order. Hence
this
> >  > >  >>    potential bug does not affect DRLVM .. now. But potentially
> it can
> >  > >  >>    start optimizing things this way because current behavior
> prevents
> >  > >  >>    a bunch of high level optimizations.
> >  > >  >>
> >  > >  >> 3. Jeremy Manson, being an expert in Java Memory Model claims
> [1]
> >  > that
> >  > >  >>    such reordering is allowed theoretically. I.e. like this:
> >  > >  >>
> >  > >  >> int hash = this.hashCode;
> >  > >  >> if (this.hashCode == 0) {
> >  > >  >>   hash = this.hashCode = // calculate hash
> >  > >  >> }
> >  > >  >> return hash;
> >  > >  >>
> >  > >  >> This is a correct single-threaded code. What happened here
is a
> >  > >  >> lengthy thread privatization of this.hashCode (again, does
not
> happen
> >  > >  >> in DRLVM). That is incorrect in multithreaded environment and
> needs
> >  > >  >> extra synchronization/volatile/etc.
> >  > >  >>
> >  > >  >> 4. I do not know why a JIT would want to do that, i.e. two
> sequential
> >  > >  >>    reads from the same memory location. Sounds like a bit
> synthetic
> >  > example.
> >  > >  >
> >  > >  > ...at which point a bunch of code probably would go wrong!  So
> >  > hopefully
> >  > >  > it remains only a theoretical possibility.
> >  > >  >
> >  > >  > Regards,
> >  > >  > Tim
> >  > >  >
> >  > >  >> [1]
> >  >
> http://jeremymanson.blogspot.com/2008/12/benign-data-races-in-java.html
> >  > >  >>
> >  > >  >
> >  > >
> >  >
> >
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message