harmony-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Vijay Menon <...@google.com>
Subject Re: [jira] Created: (HARMONY-6404) possible data-reordering in some hashCode-methods (e.g. String or URL)
Date Sat, 12 Dec 2009 15:53:10 GMT
On Sat, Dec 12, 2009 at 7:34 AM, sebb <sebbaz@gmail.com> wrote:

> On 12/12/2009, Nathan Beyer <ndbeyer@apache.org> wrote:
> > On Fri, Dec 11, 2009 at 10:04 AM, Tim Ellison <t.p.ellison@gmail.com>
> wrote:
> >  > On 11/Dec/2009 14:32, Egor Pasko wrote:
> >  >> On the 0x684 day of Apache Harmony Tim Ellison wrote:
> >  >>> On 11/Dec/2009 04:09, Vijay Menon wrote:
> >  >>>> Perhaps I'm missing some context, but I don't see any problem.
 I
> don't
> >  >>>> believe that this:
> >  >>>>
> >  >>>>         if (hashCode == 0) {
> >  >>>>             // calculate hash
> >  >>>>             hashCode = hash;
> >  >>>>         }
> >  >>>>         return hashCode;
> >  >>>>
> >  >>>> can ever return 0 (assuming hash is non-zero), regardless of memory
> fences.
> >  >>>>  The JMM won't allow visible reordering in a single threaded
> program.
> >  >>> I agree.  In the multi-threaded case there can be a data race on the
> >  >>> hashCode, with the effect that the same hashCode value is
> unnecessarily,
> >  >>> but harmlessly, recalculated.
> >  >>
> >  >> Vijay, Tim, you are not 100% correct here.
> >  >>
> >  >> 1. there should be an extra restriction that the part "calculate
> hash"
> >  >>    does not touch the hashCode field. Without that restriction more
> >  >>    trivial races can happen as discussed in LANG-481.
> >  >>
> >  >> So we should assume this code:
> >  >>
> >  >> if (this.hashCode == 0) {
> >  >>   int hash;
> >  >>   if (this.hashCode == 0) {
> >  >>     // Calculate using 'hash' only, not this.hashCode.
> >  >>     this.hashCode = hash;
> >  >>   }
> >  >>   return this.hashCode;
> >  >> }
> >  >
> >  > Yes, I guess I figured that was assumed :-)
> >  >
> >  > Of course, there are lots of things you could put into the
> >  > "// Calculate..." section that would be unsafe.  We should stick with
> >  > showing the non-abbreviated implementation to avoid ambiguity:
> >  >
> >  >    public int hashCode() {
> >  >        if (hashCode == 0) {
> >  >            if (count == 0) {
> >  >                return 0;
> >  >            }
> >  >            int hash = 0;
> >  >            for (int i = offset; i < count + offset; i++) {
> >  >                hash = value[i] + ((hash << 5) - hash);
> >  >            }
> >  >            hashCode = hash;
> >  >        }
> >  >        return hashCode;
> >  >    }
> >  >
> >
> >
> > I think I understand the concern, after some additional reading. The
> >  issue seems to be that 'hashCode' is read twice and the field is not
> >  protected by any memory barriers (synchronized, volatile, etc). As
> >  such, it would be okay for the second read to be done using a cached
> >  value, which means that both reads could return 0 in the same thread
> >  of execution. Another way to look at it is that the write to
> >  'hashCode' may or may not affect subsequent reads of 'hashCode'. This
> >  is how I understand it.
> >
> >  Will that happen in practice? I have no idea. It does seem possible.
>
> The Java MM guarantees that a single thread behaves as if the code is
> processed sequentially.
> So if the thread writes non-zero to this.hashCode it cannot then
> return zero for the value of this.hashCode if no other threads
> intervene. The thread cannot ignore updates to values it has itself
> cached!
>
> If another thread writes to this.hashCode concurrently, then this
> thread may or may not see the value stored by that thread. In this
> case, it's not a problem, as another thread can only write a fixed
> value. So the worst that can happen is that this.hashCode is written
> more than once, and the current thread may fetch the value written by
> another thread. But this is the same value it wrote anyway.
>

In a multithreaded setting, this code *can* break and return 0 if hashCode
is read twice.  This is not just a performance optimization - it is a
correctness issue.  The compiler / runtime / hardware is allowed to reorder
read operations.  The following racy execution is allowable under the JMM:

1. Thread 1 reads 0 from hashCode and stores 0 into a local (t1).
2. Thread 2 write 42 into hashCode.
3. Thread 1 reads 42 from hashCode and stores 42 into a local (t2).
4. Thread 1 compares t2 (42) with 0 and does not execute the if clause.
5. Thread 1 returns t1 (0).

- Vijay


>
> >  In any case, it does seem a pinch more efficient to only do one read
> >  of hashCode ... switch up the code to be something like this.
>
> Agreed.
>
> >  public int hashCode() {
> >     final int hash = hashCode;
> >     if (hash == 0) {
> >
> >         if (count == 0) {
> >             return 0;
> >         }
> >
> >         for (int i = offset; i < count + offset; i++) {
> >             hash = value[i] + ((hash << 5) - hash);
> >         }
> >         hashCode = hash;
> >     }
> >
> >     return hash;
> >  }
> >
> >  Thoughts?
> >
> >
> >  >> where 'this.*' is always a memory reference while 'hash' is a thread
> private var.
> >  >>
> >  >> 2. DRLVM JIT indeed does not privatize field access to threads. It
> >  >>    always loads fields from memory in original order. Hence this
> >  >>    potential bug does not affect DRLVM .. now. But potentially it can
> >  >>    start optimizing things this way because current behavior prevents
> >  >>    a bunch of high level optimizations.
> >  >>
> >  >> 3. Jeremy Manson, being an expert in Java Memory Model claims [1]
> that
> >  >>    such reordering is allowed theoretically. I.e. like this:
> >  >>
> >  >> int hash = this.hashCode;
> >  >> if (this.hashCode == 0) {
> >  >>   hash = this.hashCode = // calculate hash
> >  >> }
> >  >> return hash;
> >  >>
> >  >> This is a correct single-threaded code. What happened here is a
> >  >> lengthy thread privatization of this.hashCode (again, does not happen
> >  >> in DRLVM). That is incorrect in multithreaded environment and needs
> >  >> extra synchronization/volatile/etc.
> >  >>
> >  >> 4. I do not know why a JIT would want to do that, i.e. two sequential
> >  >>    reads from the same memory location. Sounds like a bit synthetic
> example.
> >  >
> >  > ...at which point a bunch of code probably would go wrong!  So
> hopefully
> >  > it remains only a theoretical possibility.
> >  >
> >  > Regards,
> >  > Tim
> >  >
> >  >> [1]
> http://jeremymanson.blogspot.com/2008/12/benign-data-races-in-java.html
> >  >>
> >  >
> >
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message