harmony-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Vijay Menon <...@google.com>
Subject Re: [jira] Created: (HARMONY-6404) possible data-reordering in some hashCode-methods (e.g. String or URL)
Date Fri, 11 Dec 2009 15:57:03 GMT
BTW, as Jeremy says in the post Egor linked, the best fix is to read
this.hashCode only once:

  int hash = this.hashCode;
  if (hash == 0) {
    // calculate hash without accessing this.hashCode
    hashCode = hash;
  }
  return hash;

In this, there is exactly one read from hashCode.  The compiler is not
allowed to inject another one to, e.g., reread hash.

- Vijay

On Fri, Dec 11, 2009 at 7:50 AM, Vijay Menon <vsm@google.com> wrote:

> Egor,
>
> Thanks for the link!  I think I see the original poster's issue & I retract
> my statement.  The original example is basically identical to:
>
> tmp1 = this.hashCode
> if (tmp1 == 0) {
>   // calculate hash
>   this.hashCode = hash
> }
> tmp2 = this.hashCode
> return tmp2
>
>
> (You can think of tmp1 and tmp2 as internal registers introduced by the
> compiler.)
>
> The compiler is allowed to reorder the above to:
>
> tmp2 = this.hashCode
> tmp1 = this.hashCode
> if (tmp1 == 0) {
>   // calculate hash without accessing this.hashCode
>   this.hashCode = hash
>   tmp2 = this.hashCode
> }
> return tmp2
>
>
> This is just fine in a single-threaded setting.  If tmp1 is 0, tmp2 is
> always assigned the new hash.  If tmp1 is non-zero, then - in the
> single-threaded code - the earlier read of tmp2 would have been non-zero as
> well.
>
> In a multithreaded setting, you can get the following:
>
> tmp2 = this.hashCode  // reads zero
> // <--- another thread writes 42 to this.hashCode
> tmp1 = this.hashCode // reads 42, if clause not executed
> if (tmp1 == 0) {
>   // calculate hash without accessing this.hashCode
>   this.hashCode = hash
>   tmp2 = this.hashCode
> }
> return tmp2 // returns zero!
>
>
> Now, you get zero.  Basically, a compiler is allowed to reorder reads if it
> sees no (potentially aliasing) intervening writes.
>
> Egor: are you sure that DRLVM doesn't reorder reads to the heap?  StarJIT
> did do CSE of reads to the heap (do_memopt) - which is effectively the same
> thing.  Did DRLVM remove that from Jitrino?
>
> Vijay
>
> On Fri, Dec 11, 2009 at 6:32 AM, Egor Pasko <egor.pasko@gmail.com> wrote:
>
>> On the 0x684 day of Apache Harmony Tim Ellison wrote:
>> > On 11/Dec/2009 04:09, Vijay Menon wrote:
>> >> Perhaps I'm missing some context, but I don't see any problem.  I don't
>> >> believe that this:
>> >>
>> >>         if (hashCode == 0) {
>> >>             // calculate hash
>> >>             hashCode = hash;
>> >>         }
>> >>         return hashCode;
>> >>
>> >> can ever return 0 (assuming hash is non-zero), regardless of memory
>> fences.
>> >>  The JMM won't allow visible reordering in a single threaded program.
>> >
>> > I agree.  In the multi-threaded case there can be a data race on the
>> > hashCode, with the effect that the same hashCode value is unnecessarily,
>> > but harmlessly, recalculated.
>>
>> Vijay, Tim, you are not 100% correct here.
>>
>> 1. there should be an extra restriction that the part "calculate hash"
>>   does not touch the hashCode field. Without that restriction more
>>   trivial races can happen as discussed in LANG-481.
>>
>> So we should assume this code:
>>
>> if (this.hashCode == 0) {
>>  int hash;
>>  if (this.hashCode == 0) {
>>    // Calculate using 'hash' only, not this.hashCode.
>>    this.hashCode = hash;
>>  }
>>  return this.hashCode;
>> }
>>
>> where 'this.*' is always a memory reference while 'hash' is a thread
>> private var.
>>
>> 2. DRLVM JIT indeed does not privatize field access to threads. It
>>   always loads fields from memory in original order. Hence this
>>   potential bug does not affect DRLVM .. now. But potentially it can
>>   start optimizing things this way because current behavior prevents
>>   a bunch of high level optimizations.
>>
>> 3. Jeremy Manson, being an expert in Java Memory Model claims [1] that
>>   such reordering is allowed theoretically. I.e. like this:
>>
>> int hash = this.hashCode;
>> if (this.hashCode == 0) {
>>  hash = this.hashCode = // calculate hash
>> }
>> return hash;
>>
>> This is a correct single-threaded code. What happened here is a
>> lengthy thread privatization of this.hashCode (again, does not happen
>> in DRLVM). That is incorrect in multithreaded environment and needs
>> extra synchronization/volatile/etc.
>>
>> 4. I do not know why a JIT would want to do that, i.e. two sequential
>>   reads from the same memory location. Sounds like a bit synthetic
>> example.
>>
>>
>> [1]
>> http://jeremymanson.blogspot.com/2008/12/benign-data-races-in-java.html
>>
>> --
>> Egor Pasko
>>
>>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message