harmony-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Tim Ellison <t.p.elli...@gmail.com>
Subject Re: [jira] Created: (HARMONY-6404) possible data-reordering in some hashCode-methods (e.g. String or URL)
Date Fri, 11 Dec 2009 16:04:16 GMT
On 11/Dec/2009 14:32, Egor Pasko wrote:
> On the 0x684 day of Apache Harmony Tim Ellison wrote:
>> On 11/Dec/2009 04:09, Vijay Menon wrote:
>>> Perhaps I'm missing some context, but I don't see any problem.  I don't
>>> believe that this:
>>>
>>>         if (hashCode == 0) {
>>>             // calculate hash
>>>             hashCode = hash;
>>>         }
>>>         return hashCode;
>>>
>>> can ever return 0 (assuming hash is non-zero), regardless of memory fences.
>>>  The JMM won't allow visible reordering in a single threaded program.
>> I agree.  In the multi-threaded case there can be a data race on the
>> hashCode, with the effect that the same hashCode value is unnecessarily,
>> but harmlessly, recalculated.
> 
> Vijay, Tim, you are not 100% correct here.
> 
> 1. there should be an extra restriction that the part "calculate hash"
>    does not touch the hashCode field. Without that restriction more
>    trivial races can happen as discussed in LANG-481.
> 
> So we should assume this code:
> 
> if (this.hashCode == 0) {
>   int hash;
>   if (this.hashCode == 0) {
>     // Calculate using 'hash' only, not this.hashCode.
>     this.hashCode = hash;
>   }
>   return this.hashCode;
> }

Yes, I guess I figured that was assumed :-)

Of course, there are lots of things you could put into the
"// Calculate..." section that would be unsafe.  We should stick with
showing the non-abbreviated implementation to avoid ambiguity:

    public int hashCode() {
        if (hashCode == 0) {
            if (count == 0) {
                return 0;
            }
            int hash = 0;
            for (int i = offset; i < count + offset; i++) {
                hash = value[i] + ((hash << 5) - hash);
            }
            hashCode = hash;
        }
        return hashCode;
    }

> where 'this.*' is always a memory reference while 'hash' is a thread private var.
> 
> 2. DRLVM JIT indeed does not privatize field access to threads. It
>    always loads fields from memory in original order. Hence this
>    potential bug does not affect DRLVM .. now. But potentially it can
>    start optimizing things this way because current behavior prevents
>    a bunch of high level optimizations.
> 
> 3. Jeremy Manson, being an expert in Java Memory Model claims [1] that
>    such reordering is allowed theoretically. I.e. like this:
> 
> int hash = this.hashCode;
> if (this.hashCode == 0) {
>   hash = this.hashCode = // calculate hash
> }
> return hash;
> 
> This is a correct single-threaded code. What happened here is a
> lengthy thread privatization of this.hashCode (again, does not happen
> in DRLVM). That is incorrect in multithreaded environment and needs
> extra synchronization/volatile/etc.
> 
> 4. I do not know why a JIT would want to do that, i.e. two sequential
>    reads from the same memory location. Sounds like a bit synthetic example.

...at which point a bunch of code probably would go wrong!  So hopefully
it remains only a theoretical possibility.

Regards,
Tim

> [1] http://jeremymanson.blogspot.com/2008/12/benign-data-races-in-java.html
> 

Mime
View raw message