harmony-dev mailing list archives

From "Rana Dasgupta" <rdasg...@gmail.com>
Subject Re: [drlvm][reliability tests] Harmony-2986, Dekker's algorithm -- is this a valid test for modern SMP hardware?
Date Thu, 01 Mar 2007 23:32:27 GMT
What is the purpose of the Dekker test in 2986? Is it intended to test
correct implementation of long volatile? Other than this, I am not sure why
the test exists.
If this test is to work, 64-bit volatile loads/stores will have to be atomic,
and there is no alternative to Weldon's locked implementation of volatile
long on x86. SSE2 supports 64-bit aligned moves, and effectively memory
moves that don't split cache lines should be atomic because they are a
single bus transaction. But x86 does not provide an atomicity guarantee on
the 64-bit moves. Worse, no write ordering guarantees are provided with SSE2
instructions (e.g. movntps, movaps etc.), which is a much bigger problem,
unless we want to start generating all the sfence, lfence etc. instructions
also.
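
To make the locked approach concrete, here is a minimal Java-level sketch of
the idea. It is not the DRLVM code (which would do this in JIT-generated
machine code); LockedLong and LockedLongDemo are hypothetical names used only
for illustration. The 64-bit value is kept as two 32-bit halves and every
read or write takes the same lock, so a reader can never observe one half
from an old write and the other half from a new one.

```java
// Hypothetical illustration, not Harmony/DRLVM code: a 64-bit value stored
// as two 32-bit halves, with every access guarded by the same lock.
final class LockedLong {
    private int lo;   // low 32 bits
    private int hi;   // high 32 bits

    synchronized void set(long v) {
        lo = (int) v;            // first 32-bit store
        hi = (int) (v >>> 32);   // second 32-bit store
    }

    synchronized long get() {
        // Mask the low half so its sign bit does not smear into the high half.
        return ((long) hi << 32) | (lo & 0xFFFFFFFFL);
    }
}

final class LockedLongDemo {
    // One thread alternates between all-zero and all-one bit patterns while
    // another reads. Returns how many reads saw a mixed ("torn") value.
    static int countTornReads(int iterations) {
        final LockedLong cell = new LockedLong();
        final int[] torn = {0};   // written only by the reader thread

        Thread writer = new Thread(() -> {
            for (int i = 0; i < iterations; i++) {
                cell.set((i & 1) == 0 ? -1L : 0L);
            }
        });
        Thread reader = new Thread(() -> {
            for (int i = 0; i < iterations; i++) {
                long v = cell.get();
                if (v != 0L && v != -1L) {
                    torn[0]++;
                }
            }
        });

        writer.start();
        reader.start();
        try {
            writer.join();
            reader.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return torn[0];
    }

    public static void main(String[] args) {
        System.out.println("torn reads: " + countTornReads(1_000_000));
    }
}
```

The cost is the one Weldon pointed out: every access to the value pays for a
lock acquire and release.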

Solving the volatile problem does not eliminate the weakness due to the lack
of fences in Java.

E.g., the following is a perfectly reasonable usage:

class SingleClass {
    volatile static SingleClass singleinst;
    public String val;
    public SingleClass() { val = "initial"; }

    public static SingleClass fetch() {
        if (singleinst == null) {    //  check instead of lock
            synchronized(SingleClass.class) {
                if (singleinst == null)      // another check for the race
                    singleinst = new SingleClass();
            }
        }
        return singleinst;
    }
}

In the common case, one does not need the lock since the singleton will
usually be initialized. But if the assignment to "val" passes the setting of
the singleton under stress test, one would get an uninitialized singleton.
This cannot happen on x86 because of its strong store ordering even on SMP,
but certainly can on IPF, Alpha and other architectures. But on x86, the fact
that the loads are not ordered and can pass stores will create the
Dekker-like problems (using 32-bit volatiles) on SMP.
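
The store/load problem can be written down as a small litmus test. The
sketch below is only an illustration (StoreLoadLitmus is a hypothetical
name, not part of Harmony-2986). Each thread stores to one 32-bit volatile
and then loads the other. If a load is allowed to pass the preceding store,
both threads can read 0. Assuming the JVM implements the JSR-133 memory model
(Java 5 and later), volatile accesses are totally ordered and that outcome
must never appear, so the JIT has to emit whatever fence or locked
instruction the hardware needs.

```java
// Hypothetical litmus test for store->load reordering on volatile ints.
final class StoreLoadLitmus {
    private volatile int x;
    private volatile int y;
    private int r1;   // value of y seen by thread 1, read after join()
    private int r2;   // value of x seen by thread 2, read after join()

    // Runs one trial; returns true if both threads read 0.
    boolean bothReadZero() {
        Thread t1 = new Thread(() -> { x = 1; r1 = y; });
        Thread t2 = new Thread(() -> { y = 1; r2 = x; });
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return r1 == 0 && r2 == 0;
    }

    // Counts trials in which both loads passed the stores.
    static int countForbidden(int trials) {
        int forbidden = 0;
        for (int i = 0; i < trials; i++) {
            if (new StoreLoadLitmus().bothReadZero()) {
                forbidden++;
            }
        }
        return forbidden;
    }

    public static void main(String[] args) {
        System.out.println("both-zero outcomes: " + countForbidden(20_000));
    }
}
```

A count of zero does not prove the JIT is correct. Starting a fresh pair of
threads per trial makes the race window small, so this is a smoke test
rather than a stress test.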

I don't think Dekker/Peterson etc. algorithm implementations make much sense
in Java.  There are better JIT tests for volatiles. The Linux kernel, e.g.,
uses Dekker etc. heavily to implement critsecs, spin locks etc., but that's
a different type of usage, and Linux both uses fences heavily and offers its
own platform-neutral fence calls.
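
For reference, this is roughly what a Dekker lock looks like when written
with 32-bit (or smaller) volatile fields only. It is a sketch, not the
DekkerTest.java from Harmony-2986, and DekkerLock and DekkerDemo are
hypothetical names. It relies entirely on the JVM keeping volatile stores and
loads in order as JSR-133 requires; if the store to a thread's own flag can
be passed by the load of the other thread's flag, both threads enter the
critical section and the counter comes up short.

```java
// Hypothetical two-thread Dekker lock built only from volatile fields.
final class DekkerLock {
    private volatile boolean flag0;   // thread 0 wants to enter
    private volatile boolean flag1;   // thread 1 wants to enter
    private volatile int turn;        // which thread may insist on entering

    private boolean wants(int id) { return id == 0 ? flag0 : flag1; }

    private void setWants(int id, boolean v) {
        if (id == 0) { flag0 = v; } else { flag1 = v; }
    }

    void lock(int me) {
        int other = 1 - me;
        setWants(me, true);                 // store own flag ...
        while (wants(other)) {              // ... then load the other flag
            if (turn != me) {
                setWants(me, false);        // back off while it is not our turn
                while (turn != me) {
                    Thread.yield();
                }
                setWants(me, true);
            } else {
                Thread.yield();
            }
        }
    }

    void unlock(int me) {
        turn = 1 - me;
        setWants(me, false);
    }
}

final class DekkerDemo {
    // Two threads increment a plain int under the Dekker lock.
    // Returns the final count; 2 * perThread if mutual exclusion held.
    static int run(int perThread) {
        final DekkerLock lock = new DekkerLock();
        final int[] counter = {0};          // protected only by the lock
        Thread[] threads = new Thread[2];
        for (int id = 0; id < 2; id++) {
            final int me = id;
            threads[id] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    lock.lock(me);
                    counter[0]++;
                    lock.unlock(me);
                }
            });
        }
        threads[0].start();
        threads[1].start();
        try {
            threads[0].join();
            threads[1].join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return counter[0];
    }

    public static void main(String[] args) {
        System.out.println("count: " + DekkerDemo.run(100_000));
    }
}
```

In practice synchronized or java.util.concurrent locks do the same job, which
is why I see this more as a JIT test than as something applications would
write.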




On 2/28/07, Gregory Shimansky <gshimansky@gmail.com> wrote:
>
> On Wednesday 28 February 2007 23:28 Weldon Washburn wrote:
> > On 2/28/07, Gregory Shimansky <gshimansky@gmail.com> wrote:
> > > Weldon Washburn wrote:
> > > > On second thought, the only way I know to implement volatile long
> > > > (64-bit) Java variables on ia32 is:
> > > >
> > > > grab critical section
> > > > mov [ecx], low32bits;   // to do a write, the code for doing a read is similar
> > > > mov [ecx+4], hi32bits;
> > > > release critical section
> > >
> > > Is it possible for 64-bit atomic load stores to use double load/stores
> >
> > hmm... can you tell us the specific instructions you are suggesting?  I
> > see quad loads/stores but can't find the double load/store version.  I
> > also tried to find the guarantees on bus transactions.  Somewhere I recall
> > it is documented that 4-byte aligned loads/stores are guaranteed to be
> > atomic.  Maybe there are some new guarantees on 64-bit writes.  In any
> > case, we would still have to be compatible with existing Pentium III
> > hardware and probably have to go with some sort of critical section
> > approach.
>
> Yes this is true. I hoped that someone would point out exactly if there
> are any 64-bit atomic operations that work with doubles. It seems like
> there aren't because the patch by Ivan in HARMONY-2092 has comments that it
> is enough to change GC and class loader to align objects on 64-bits
> boundary and that's enough for 64-bit load/stores but only with memory
> fence instructions in interpreter in addition.
>
> > > or SSE4 on the processors that have it?
> >
> > Good point.  I recall old versions were really only focused on
> > multimedia.  And writing multimedia bits to memory is not sensitive to
> > order or atomicity.  In other words, if you are writing to a frame buffer,
> > speed of writes is important but the order the bits hit the buffer is
> > not.  Again, I looked but could not find the latest info SSE4 and
> > atomicity.
>
> Actually it should have been SSE2. I pressed a wrong digit. I just meant
> quad load/stores when I wanted to mention it.
>
> > > > Some observations:
> > > > 1)
> > > > Fixing the "volatile long" bug (Harmony-2092) by using critical
> > > > section as above should, as a side-effect, allow DekkerTest.java to
> > > > run.
> > > > 2)
> > > > Using volatile long sort of, kind of defeats a major reason to use
> > > > Dekker algorithm in the first place.  Why bother if the performance is
> > > > the same as using critical sections?
> > > > 3)
> > > > Using "volatile int" in DekkerTest.java probably still fails because
> > > > reads can pass writes.  One way to fix this might be to make the JIT
> > > > emit r/w memory fence whenever reading/writing the volatile int.
> > > > While memory fences are often cheaper than HW locks, they are not
> > > > free.
> > > > 4)
> > > > My guess is that there are no old legacy Java apps that use Dekker
> > > > algorithm.  In other words, nobody is dependent on Dekker algorithm
> > > > working.  My guess is that they are, however, dependent on volatile
> > > > long and volatile int working properly. (which has the side effect of
> > > > making Dekker algo work.)
> > > >
> > > > On 2/21/07, Weldon Washburn <weldonwjw@gmail.com> wrote:
> > > >> On 2/21/07, Gregory Shimansky <gshimansky@gmail.com> wrote:
> > > >> > On Wednesday 21 February 2007 21:47 Rana Dasgupta wrote:
> > > >> > > Weldon,
> > > >> > >   But I am not sure why the behavior would be different from J9
> > > >> > > on the same hardware. Do we jit volatiles differently?
> > > >>
> > > >> The differences in behavior can be caused by lots of things that are
> > > >> not related to memory model.  For example the JIT might actually emit
> > > >> slightly different code.  Slightly different code can easily
> > > >> open/close race conditions.  The important concept is that both J9
> > > >> and drlvm fail.  And the failure appears to be because modern
> > > >> hardware is most likely not designed to run Dekker's algo without
> > > >> memory fences.
> > > >>
> > > >> > There is a bug on DRLVM about volatile variables HARMONY-2092. It
> > > >> > is about long and double type variables assignments. Is it the
> > > >> > same as in Dekker's algorithm?
> > > >>
> > > >> DekkerTest.java uses "long" variables.  Yes, this could change the
> > > >> rate of failure but not eliminate failures completely.
> > > >>
> > > >> > On 2/20/07, Weldon Washburn <weldonwjw@gmail.com> wrote:
> > > >> > > > It seems Dekker's algorithm is not expected to work on SPARC or
> > > >> > > > IA32 SMP boxes unless memory fences are used.  DekkerTest.java
> > > >> > > > in Harmony-2986 does not contain memory fences.  The volatile
> > > >> > > > keyword guarantees the compiler will write a given variable to
> > > >> > > > memory.  However, the HW may actually have a write buffer and
> > > >> > > > allow reads to pass writes.  As far as I know, the Java
> > > >> > > > language does not provide a means to invoke a memory fence.
> > > >> > > > Thus there is no way to fix up DekkerTest.java.  I may be
> > > >> > > > misunderstanding something here.  Does anyone have comment?
> > > >> > > >
> > > >> > > > An excellent description of the issues involved is in a David
> > > >> > > > Dice presentation at:
> > > >> > > >
> > > >> > > > http://blogs.sun.com/dave/resource/synchronization-public2.pdf
> > > >> > > >
> > > >> > > > --
> > > >> > > > Weldon Washburn
> > > >> > > > Intel Enterprise Solutions Software Division
> > > >> >
> > > >> > --
> > > >> > Gregory
> > >
> > > --
> > > Gregory
>
> --
> Gregory
>
