harmony-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From sebb <seb...@gmail.com>
Subject Re: [jira] Created: (HARMONY-6404) possible data-reordering in some hashCode-methods (e.g. String or URL)
Date Sat, 12 Dec 2009 15:34:54 GMT
On 12/12/2009, Nathan Beyer <ndbeyer@apache.org> wrote:
> On Fri, Dec 11, 2009 at 10:04 AM, Tim Ellison <t.p.ellison@gmail.com> wrote:
>  > On 11/Dec/2009 14:32, Egor Pasko wrote:
>  >> On the 0x684 day of Apache Harmony Tim Ellison wrote:
>  >>> On 11/Dec/2009 04:09, Vijay Menon wrote:
>  >>>> Perhaps I'm missing some context, but I don't see any problem.  I don't
>  >>>> believe that this:
>  >>>>
>  >>>>         if (hashCode == 0) {
>  >>>>             // calculate hash
>  >>>>             hashCode = hash;
>  >>>>         }
>  >>>>         return hashCode;
>  >>>>
>  >>>> can ever return 0 (assuming hash is non-zero), regardless of memory
fences.
>  >>>>  The JMM won't allow visible reordering in a single threaded program.
>  >>> I agree.  In the multi-threaded case there can be a data race on the
>  >>> hashCode, with the effect that the same hashCode value is unnecessarily,
>  >>> but harmlessly, recalculated.
>  >>
>  >> Vijay, Tim, you are not 100% correct here.
>  >>
>  >> 1. there should be an extra restriction that the part "calculate hash"
>  >>    does not touch the hashCode field. Without that restriction more
>  >>    trivial races can happen as discussed in LANG-481.
>  >>
>  >> So we should assume this code:
>  >>
>  >> if (this.hashCode == 0) {
>  >>   int hash;
>  >>   if (this.hashCode == 0) {
>  >>     // Calculate using 'hash' only, not this.hashCode.
>  >>     this.hashCode = hash;
>  >>   }
>  >>   return this.hashCode;
>  >> }
>  >
>  > Yes, I guess I figured that was assumed :-)
>  >
>  > Of course, there are lots of things you could put into the
>  > "// Calculate..." section that would be unsafe.  We should stick with
>  > showing the non-abbreviated implementation to avoid ambiguity:
>  >
>  >    public int hashCode() {
>  >        if (hashCode == 0) {
>  >            if (count == 0) {
>  >                return 0;
>  >            }
>  >            int hash = 0;
>  >            for (int i = offset; i < count + offset; i++) {
>  >                hash = value[i] + ((hash << 5) - hash);
>  >            }
>  >            hashCode = hash;
>  >        }
>  >        return hashCode;
>  >    }
>  >
>
>
> I think I understand the concern, after some additional reading. The
>  issue seems to be that 'hashCode' is read twice and the field is not
>  protected by any memory barriers (synchronized, volatile, etc). As
>  such, it would be okay for the second read to be done using a cached
>  value, which means that both reads could return 0 in the same thread
>  of execution. Another way to look at it is that the write to
>  'hashCode' may or may not affect subsequent reads of 'hashCode'. This
>  is how I understand it.
>
>  Will that happen in practice? I have no idea. It does seem possible.

The Java MM guarantees that a single thread behaves as if the code is
processed sequentially.
So if the thread writes non-zero to this.hashCode it cannot then
return zero for the value of this.hashCode if no other threads
intervene. The thread cannot ignore updates to values it has itself
cached!

If another thread writes to this.hashCode concurrently, then this
thread may or may not see the value stored by that thread. In this
case, it's not a problem, as another thread can only write a fixed
value. So the worst that can happen is that this.hashCode is written
more than once, and the current thread may fetch the value written by
another thread. But this is the same value it wrote anyway.

>  In any case, it does seem a pinch more efficient to only do one read
>  of hashCode ... switch up the code to be something like this.

Agreed.

>  public int hashCode() {
>     final int hash = hashCode;
>     if (hash == 0) {
>
>         if (count == 0) {
>             return 0;
>         }
>
>         for (int i = offset; i < count + offset; i++) {
>             hash = value[i] + ((hash << 5) - hash);
>         }
>         hashCode = hash;
>     }
>
>     return hash;
>  }
>
>  Thoughts?
>
>
>  >> where 'this.*' is always a memory reference while 'hash' is a thread private
var.
>  >>
>  >> 2. DRLVM JIT indeed does not privatize field access to threads. It
>  >>    always loads fields from memory in original order. Hence this
>  >>    potential bug does not affect DRLVM .. now. But potentially it can
>  >>    start optimizing things this way because current behavior prevents
>  >>    a bunch of high level optimizations.
>  >>
>  >> 3. Jeremy Manson, being an expert in Java Memory Model claims [1] that
>  >>    such reordering is allowed theoretically. I.e. like this:
>  >>
>  >> int hash = this.hashCode;
>  >> if (this.hashCode == 0) {
>  >>   hash = this.hashCode = // calculate hash
>  >> }
>  >> return hash;
>  >>
>  >> This is a correct single-threaded code. What happened here is a
>  >> lengthy thread privatization of this.hashCode (again, does not happen
>  >> in DRLVM). That is incorrect in multithreaded environment and needs
>  >> extra synchronization/volatile/etc.
>  >>
>  >> 4. I do not know why a JIT would want to do that, i.e. two sequential
>  >>    reads from the same memory location. Sounds like a bit synthetic example.
>  >
>  > ...at which point a bunch of code probably would go wrong!  So hopefully
>  > it remains only a theoretical possibility.
>  >
>  > Regards,
>  > Tim
>  >
>  >> [1] http://jeremymanson.blogspot.com/2008/12/benign-data-races-in-java.html
>  >>
>  >
>

Mime
View raw message