harmony-dev mailing list archives

From Egor Pasko <egor.pa...@gmail.com>
Subject Re: [jira] Created: (HARMONY-6404) possible data-reordering in some hashCode-methods (e.g. String or URL)
Date Mon, 14 Dec 2009 07:39:16 GMT
On the 0x686 day of Apache Harmony Nathan Beyer wrote:
> On Sat, Dec 12, 2009 at 11:04 AM, sebb <sebbaz@gmail.com> wrote:
>> On 12/12/2009, Vijay Menon <vsm@google.com> wrote:
>>> On Sat, Dec 12, 2009 at 7:34 AM, sebb <sebbaz@gmail.com> wrote:
>>>
>>>  > On 12/12/2009, Nathan Beyer <ndbeyer@apache.org> wrote:
>>>  > > On Fri, Dec 11, 2009 at 10:04 AM, Tim Ellison <t.p.ellison@gmail.com> wrote:
>>>  > >  > On 11/Dec/2009 14:32, Egor Pasko wrote:
>>>  > >  >> On the 0x684 day of Apache Harmony Tim Ellison wrote:
>>>  > >  >>> On 11/Dec/2009 04:09, Vijay Menon wrote:
>>>  > >  >>>> Perhaps I'm missing some context, but I don't see any problem.  I don't
>>>  > >  >>>> believe that this:
>>>  > >  >>>>
>>>  > >  >>>>         if (hashCode == 0) {
>>>  > >  >>>>             // calculate hash
>>>  > >  >>>>             hashCode = hash;
>>>  > >  >>>>         }
>>>  > >  >>>>         return hashCode;
>>>  > >  >>>>
>>>  > >  >>>> can ever return 0 (assuming hash is non-zero), regardless of memory fences.
>>>  > >  >>>>  The JMM won't allow visible reordering in a single threaded program.
>>>  > >  >>> I agree.  In the multi-threaded case there can be a data race on the
>>>  > >  >>> hashCode, with the effect that the same hashCode value is unnecessarily,
>>>  > >  >>> but harmlessly, recalculated.
>>>  > >  >>
>>>  > >  >> Vijay, Tim, you are not 100% correct here.
>>>  > >  >>
>>>  > >  >> 1. there should be an extra restriction that the part "calculate hash"
>>>  > >  >>    does not touch the hashCode field. Without that restriction more
>>>  > >  >>    trivial races can happen as discussed in LANG-481.
>>>  > >  >>
>>>  > >  >> So we should assume this code:
>>>  > >  >>
>>>  > >  >> if (this.hashCode == 0) {
>>>  > >  >>   int hash;
>>>  > >  >>   if (this.hashCode == 0) {
>>>  > >  >>     // Calculate using 'hash' only, not this.hashCode.
>>>  > >  >>     this.hashCode = hash;
>>>  > >  >>   }
>>>  > >  >>   return this.hashCode;
>>>  > >  >> }
>>>  > >  >
>>>  > >  > Yes, I guess I figured that was assumed :-)
>>>  > >  >
>>>  > >  > Of course, there are lots of things you could put into the
>>>  > >  > "// Calculate..." section that would be unsafe.  We should stick with
>>>  > >  > showing the non-abbreviated implementation to avoid ambiguity:
>>>  > >  >
>>>  > >  >    public int hashCode() {
>>>  > >  >        if (hashCode == 0) {
>>>  > >  >            if (count == 0) {
>>>  > >  >                return 0;
>>>  > >  >            }
>>>  > >  >            int hash = 0;
>>>  > >  >            for (int i = offset; i < count + offset; i++) {
>>>  > >  >                hash = value[i] + ((hash << 5) - hash);
>>>  > >  >            }
>>>  > >  >            hashCode = hash;
>>>  > >  >        }
>>>  > >  >        return hashCode;
>>>  > >  >    }
>>>  > >  >
>>>  > >
>>>  > >
>>>  > > I think I understand the concern, after some additional reading. The
>>>  > >  issue seems to be that 'hashCode' is read twice and the field is not
>>>  > >  protected by any memory barriers (synchronized, volatile, etc). As
>>>  > >  such, it would be okay for the second read to be done using a cached
>>>  > >  value, which means that both reads could return 0 in the same thread
>>>  > >  of execution. Another way to look at it is that the write to
>>>  > >  'hashCode' may or may not affect subsequent reads of 'hashCode'. This
>>>  > >  is how I understand it.
>>>  > >
>>>  > >  Will that happen in practice? I have no idea. It does seem possible.
>>>  >
>>>  > The Java MM guarantees that a single thread behaves as if the code is
>>>  > processed sequentially.
>>>  > So if the thread writes non-zero to this.hashCode it cannot then
>>>  > return zero for the value of this.hashCode if no other threads
>>>  > intervene. The thread cannot ignore updates to values it has itself
>>>  > cached!
>>>  >
>>>  > If another thread writes to this.hashCode concurrently, then this
>>>  > thread may or may not see the value stored by that thread. In this
>>>  > case, it's not a problem, as another thread can only write a fixed
>>>  > value. So the worst that can happen is that this.hashCode is written
>>>  > more than once, and the current thread may fetch the value written by
>>>  > another thread. But this is the same value it wrote anyway.
>>>  >
>>>
>>>
>>> In a multithreaded setting, this code *can* break and return 0 if hashCode
>>>  is read twice.  This is not just a performance optimization - it is a
>>>  correctness issue.  The compiler / runtime / hardware is allowed to reorder
>>>  read operations.  The following racy execution is allowable under the JMM:
>>>
>>>  1. Thread 1 reads 0 from hashCode and stores 0 into a local (t1).
>>>  2. Thread 2 writes 42 into hashCode.
>>>  3. Thread 1 reads 42 from hashCode and stores 42 into a local (t2).
>>>  4. Thread 1 compares t2 (42) with 0 and does not execute the if clause.
>>>  5. Thread 1 returns t1 (0).
>>>
>>
>> But why would Thread 1 read hashCode twice before the comparison?
>>
>> Seems to me that would break the "as if serial" guarantee for a single thread.
>> In the code sequence, the comparison is before the return, and
>> therefore "happens-before" the return. I.e. step 3 "happens-before"
>> step 1+5.
>>
>> I'm not saying Harmony should keep the current code - the suggested
>> temp variable version seems better anyway - just trying to understand
>> what (if anything) is currently broken.
>
> I'm still open to counter arguments as it still seems weird. I keep
> focusing on the bit that 'this.hashCode' is referenced twice as a
> 'read' - "if (hashCode == 0)" and "return hashCode". Since 'hashCode'
> isn't final, volatile or in a synchronized region, the read into the
> stack could be cached. As I understand it, it's not the reorder of the
> Java code, it's the reorder of the generated code that just reads the
> value from the heap to the stack.

Nathan,

you are right, the value can be cached, and if it is, correctness
does not suffer. In fact the whole case of two consecutive reads is
rather synthetic: it is just too easy for the JIT to eliminate this
double load of the same memory location. So, most likely, the JIT
would not let this reordering happen.

Anyway, it is legal for the JIT to perform such optimizations
(i.e. privatize some fields to the stack, not privatize others,
reorder loads, add stores in a way that is correct for a single
thread) as long as there is no synchronization, no volatile memory
location, no virtual call, etc. in between.
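
To make the temp variable version discussed above concrete, here is a
minimal sketch of the single-read form of the idiom. The class name
CachedHashText and its constructor are hypothetical, made up for
illustration only; this is not Harmony's String source. The hash loop
is the one quoted earlier in the thread.

```java
// Hypothetical class for illustration; not Harmony's actual String code.
final class CachedHashText {
    private final char[] value;

    // Deliberately not volatile. 0 means "not computed yet".
    private int hashCode;

    CachedHashText(String s) {
        this.value = s.toCharArray();
    }

    @Override
    public int hashCode() {
        int h = hashCode;               // the only read of the shared field
        if (h == 0) {
            for (char c : value) {
                h = c + ((h << 5) - h); // same arithmetic as 31 * h + c
            }
            hashCode = h;               // racy write, but every thread writes the same value
        }
        return h;                       // return the local, never re-read the field
    }

    public static void main(String[] args) {
        CachedHashText t = new CachedHashText("hello");
        // Compare against java.lang.String, which uses the same formula.
        System.out.println(t.hashCode() == "hello".hashCode());
    }
}
```

Since the field is loaded exactly once, there is no second load that
could be reordered before the first, so the method cannot return a
stale 0 after having observed a non-zero value. The worst case that
remains is the harmless recalculation by several threads.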

> I'm basing this off of the 'racy single-check' idiom that's mentioned
> in Effective Java. I'd like to get a complete answer to this though.
>
> -Nathan

-- 
Egor Pasko

