harmony-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Nathan Beyer <ndbe...@apache.org>
Subject Re: [jira] Created: (HARMONY-6404) possible data-reordering in some hashCode-methods (e.g. String or URL)
Date Sat, 12 Dec 2009 04:00:21 GMT
On Fri, Dec 11, 2009 at 10:04 AM, Tim Ellison <t.p.ellison@gmail.com> wrote:
> On 11/Dec/2009 14:32, Egor Pasko wrote:
>> On the 0x684 day of Apache Harmony Tim Ellison wrote:
>>> On 11/Dec/2009 04:09, Vijay Menon wrote:
>>>> Perhaps I'm missing some context, but I don't see any problem.  I don't
>>>> believe that this:
>>>>
>>>>         if (hashCode == 0) {
>>>>             // calculate hash
>>>>             hashCode = hash;
>>>>         }
>>>>         return hashCode;
>>>>
>>>> can ever return 0 (assuming hash is non-zero), regardless of memory fences.
>>>>  The JMM won't allow visible reordering in a single threaded program.
>>> I agree.  In the multi-threaded case there can be a data race on the
>>> hashCode, with the effect that the same hashCode value is unnecessarily,
>>> but harmlessly, recalculated.
>>
>> Vijay, Tim, you are not 100% correct here.
>>
>> 1. there should be an extra restriction that the part "calculate hash"
>>    does not touch the hashCode field. Without that restriction more
>>    trivial races can happen as discussed in LANG-481.
>>
>> So we should assume this code:
>>
>> if (this.hashCode == 0) {
>>   int hash;
>>   if (this.hashCode == 0) {
>>     // Calculate using 'hash' only, not this.hashCode.
>>     this.hashCode = hash;
>>   }
>>   return this.hashCode;
>> }
>
> Yes, I guess I figured that was assumed :-)
>
> Of course, there are lots of things you could put into the
> "// Calculate..." section that would be unsafe.  We should stick with
> showing the non-abbreviated implementation to avoid ambiguity:
>
>    public int hashCode() {
>        if (hashCode == 0) {
>            if (count == 0) {
>                return 0;
>            }
>            int hash = 0;
>            for (int i = offset; i < count + offset; i++) {
>                hash = value[i] + ((hash << 5) - hash);
>            }
>            hashCode = hash;
>        }
>        return hashCode;
>    }
>

I think I understand the concern, after some additional reading. The
issue seems to be that 'hashCode' is read twice and the field is not
protected by any memory barriers (synchronized, volatile, etc). As
such, it would be okay for the second read to be done using a cached
value, which means that both reads could return 0 in the same thread
of execution. Another way to look at it is that the write to
'hashCode' may or may not affect subsequent reads of 'hashCode'. This
is how I understand it.

Will that happen in practice? I have no idea. It does seem possible.

In any case, it does seem a pinch more efficient to only do one read
of hashCode ... switch up the code to be something like this.

public int hashCode() {
    final int hash = hashCode;
    if (hash == 0) {
        if (count == 0) {
            return 0;
        }
        for (int i = offset; i < count + offset; i++) {
            hash = value[i] + ((hash << 5) - hash);
        }
        hashCode = hash;
    }
    return hash;
}

Thoughts?

>> where 'this.*' is always a memory reference while 'hash' is a thread private var.
>>
>> 2. DRLVM JIT indeed does not privatize field access to threads. It
>>    always loads fields from memory in original order. Hence this
>>    potential bug does not affect DRLVM .. now. But potentially it can
>>    start optimizing things this way because current behavior prevents
>>    a bunch of high level optimizations.
>>
>> 3. Jeremy Manson, being an expert in Java Memory Model claims [1] that
>>    such reordering is allowed theoretically. I.e. like this:
>>
>> int hash = this.hashCode;
>> if (this.hashCode == 0) {
>>   hash = this.hashCode = // calculate hash
>> }
>> return hash;
>>
>> This is a correct single-threaded code. What happened here is a
>> lengthy thread privatization of this.hashCode (again, does not happen
>> in DRLVM). That is incorrect in multithreaded environment and needs
>> extra synchronization/volatile/etc.
>>
>> 4. I do not know why a JIT would want to do that, i.e. two sequential
>>    reads from the same memory location. Sounds like a bit synthetic example.
>
> ...at which point a bunch of code probably would go wrong!  So hopefully
> it remains only a theoretical possibility.
>
> Regards,
> Tim
>
>> [1] http://jeremymanson.blogspot.com/2008/12/benign-data-races-in-java.html
>>
>

Mime
View raw message