harmony-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Egor Pasko <egor.pa...@gmail.com>
Subject Re: [jira] Created: (HARMONY-6404) possible data-reordering in some hashCode-methods (e.g. String or URL)
Date Sat, 12 Dec 2009 10:54:25 GMT
On the 0x685 day of Apache Harmony Nathan Beyer wrote:
> On Fri, Dec 11, 2009 at 10:04 AM, Tim Ellison <t.p.ellison@gmail.com> wrote:
>> On 11/Dec/2009 14:32, Egor Pasko wrote:
>>> On the 0x684 day of Apache Harmony Tim Ellison wrote:
>>>> On 11/Dec/2009 04:09, Vijay Menon wrote:
>>>>> Perhaps I'm missing some context, but I don't see any problem.  I don't
>>>>> believe that this:
>>>>>
>>>>>         if (hashCode == 0) {
>>>>>             // calculate hash
>>>>>             hashCode = hash;
>>>>>         }
>>>>>         return hashCode;
>>>>>
>>>>> can ever return 0 (assuming hash is non-zero), regardless of memory fences.
>>>>>  The JMM won't allow visible reordering in a single threaded program.
>>>> I agree.  In the multi-threaded case there can be a data race on the
>>>> hashCode, with the effect that the same hashCode value is unnecessarily,
>>>> but harmlessly, recalculated.
>>>
>>> Vijay, Tim, you are not 100% correct here.
>>>
>>> 1. there should be an extra restriction that the part "calculate hash"
>>>    does not touch the hashCode field. Without that restriction more
>>>    trivial races can happen as discussed in LANG-481.
>>>
>>> So we should assume this code:
>>>
>>> if (this.hashCode == 0) {
>>>   int hash;
>>>   if (this.hashCode == 0) {
>>>     // Calculate using 'hash' only, not this.hashCode.
>>>     this.hashCode = hash;
>>>   }
>>>   return this.hashCode;
>>> }
>>
>> Yes, I guess I figured that was assumed :-)
>>
>> Of course, there are lots of things you could put into the
>> "// Calculate..." section that would be unsafe.  We should stick with
>> showing the non-abbreviated implementation to avoid ambiguity:
>>
>>    public int hashCode() {
>>        if (hashCode == 0) {
>>            if (count == 0) {
>>                return 0;
>>            }
>>            int hash = 0;
>>            for (int i = offset; i < count + offset; i++) {
>>                hash = value[i] + ((hash << 5) - hash);
>>            }
>>            hashCode = hash;
>>        }
>>        return hashCode;
>>    }
>>
>
> I think I understand the concern, after some additional reading. The
> issue seems to be that 'hashCode' is read twice and the field is not
> protected by any memory barriers (synchronized, volatile, etc). As
> such, it would be okay for the second read to be done using a cached
> value, which means that both reads could return 0 in the same thread
> of execution. Another way to look at it is that the write to
> 'hashCode' may or may not affect subsequent reads of 'hashCode'. This
> is how I understand it.
>
> Will that happen in practice? I have no idea. It does seem possible.
>
> In any case, it does seem a pinch more efficient to only do one read
> of hashCode ... switch up the code to be something like this.
>
> public int hashCode() {
>     final int hash = hashCode;
>     if (hash == 0) {
>         if (count == 0) {
>             return 0;
>         }
>         for (int i = offset; i < count + offset; i++) {
>             hash = value[i] + ((hash << 5) - hash);
>         }
>         hashCode = hash;

one more 'return hash' here, please :)

>     }
>     return hash;
> }
>
> Thoughts?

-- 
Egor Pasko


Mime
View raw message