Mailing-List: contact dev-help@harmony.apache.org; run by ezmlm
Precedence: bulk
Reply-To: dev@harmony.apache.org
Received-SPF: pass (herse.apache.org: domain of mike.fursov@gmail.com
 designates 64.233.166.183 as permitted sender)
DomainKey-Signature: a=rsa-sha1; c=nofws;
        d=gmail.com; s=beta;
        h=received:message-id:date:from:to:subject:in-reply-to:mime-version:content-type:references;
        b=hd1+41oR/HgVWCGpI4/++zoJy7gMTaGM+bD90MWF/LUlJWNbiIaomwtDGDZujlG+5RNNgUKz+PiIIZQ4Wi2vhD47/K+7cUeTOeOP+xj+5sa0DBI+fjbz4TQYQfDNdj7nukyUDkc/Eo+QbNUU5AXg/mPFWJWW5X2ABsF60/mXa5I=
Message-ID: <bc79dd600706010149i6a1c9742h11d6908c2d2eeec5@mail.gmail.com>
Date: Fri, 1 Jun 2007 15:49:28 +0700
From: "Mikhail Fursov" <mike.fursov@gmail.com>
To: dev@harmony.apache.org
Subject: Re: [jira] Updated: (HARMONY-2092) [drlvm][jit] Harmony works with
 volatile variables incorrectly
In-Reply-To: <469bff730705312316p17551849ob1dcd58b6105f519@mail.gmail.com>
MIME-Version: 1.0
Content-Type: multipart/alternative;
	boundary="----=_Part_2620_13173052.1180687768435"
References: <17006127.1179913816525.JavaMail.jira@brutus>
	 <9623c9a50705230421ka79d039s1618eaf48da090a6@mail.gmail.com>
	 <f3itbg$kln$1@sea.gmane.org>
	 <9623c9a50705300120nb8a847bp935f4973e019ee9b@mail.gmail.com>
	 <0vq1wgyhw5r.fsf@gmail.com>
	 <4dd1f3f00705311501k6a502d1dk4719499ba360362d@mail.gmail.com>
	 <469bff730705312316p17551849ob1dcd58b6105f519@mail.gmail.com>

------=_Part_2620_13173052.1180687768435
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Content-Disposition: inline

Do we plan to make objects 8-byte aligned in Q3?
AFAIU this is the only way to avoid both the lock prefix and the performance
degradation, and it does not require big changes in GC: object sizes need to
be a multiple of 8, and every memory area allocated by GC needs to be aligned
on 8 bytes. Am I missing something here?
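
To make the requirement concrete, here is a minimal sketch in C of what I
mean by "size is a multiple of 8" and "every area is aligned by 8". The names
align_up8, is_aligned8, arena_t and arena_alloc are hypothetical, made up for
this mail; this is not the DRLVM GC code, only an illustration of the
invariant:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: round an object size up to the next multiple of 8. */
static size_t align_up8(size_t size) {
    return (size + 7u) & ~(size_t)7u;
}

/* Hypothetical helper: true if the address is 8-byte aligned. */
static int is_aligned8(const void *p) {
    return ((uintptr_t)p & 7u) == 0;
}

/* Hypothetical bump allocator over an arena whose start is 8-byte aligned. */
typedef struct {
    unsigned char *cursor; /* next free byte, always 8-byte aligned */
    unsigned char *end;    /* one past the last byte of the arena   */
} arena_t;

/* Returns an 8-byte-aligned block, or NULL if the arena is exhausted. */
static void *arena_alloc(arena_t *a, size_t size) {
    size_t rounded = align_up8(size);
    if ((size_t)(a->end - a->cursor) < rounded) {
        return NULL;
    }
    void *obj = a->cursor;
    assert(is_aligned8(obj)); /* holds because every size is rounded to 8 */
    a->cursor += rounded;
    return obj;
}
```

Since every size is rounded up to a multiple of 8, the cursor stays aligned
as long as the arena start is aligned. Assuming a 64-bit field sits at an
offset inside the object that is itself a multiple of 8, the field never
straddles an 8-byte boundary.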

It could be less work than making temporary workarounds in JIT instead of
the simple XMM moves we already have.
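
For comparison, the CMPXCHG8B approach discussed in the quoted thread below
boils down to a compare-and-swap retry loop. Here is a rough sketch using C11
<stdatomic.h>. The helper names volatile_store64 and volatile_load64 are
hypothetical, and the idea that a compiler targeting IA-32 lowers these calls
to LOCK CMPXCHG8B is an assumption about the compiler, not a description of
what our JIT emits:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical helper: atomic 64-bit store built from a CAS retry loop. */
static void volatile_store64(_Atomic uint64_t *field, uint64_t value) {
    uint64_t seen = 0; /* initial guess; a failed CAS refreshes it */
    while (!atomic_compare_exchange_weak(field, &seen, value)) {
        /* 'seen' now holds the current contents; try again. */
    }
}

/* Hypothetical helper: atomic 64-bit load built from a single CAS.
 * Comparing against 0 and swapping in 0 never changes memory: on a match
 * it rewrites the same 0, and on a mismatch 'expected' receives the
 * current contents. */
static uint64_t volatile_load64(_Atomic uint64_t *field) {
    uint64_t expected = 0;
    atomic_compare_exchange_strong(field, &expected, 0);
    return expected;
}
```

Both helpers go through a full compare-and-swap even for a plain read, which
is where the extra cost compared to a single aligned 64-bit move comes from.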

On 6/1/07, Pavel Ozhdikhin <pavel.ozhdikhin@gmail.com> wrote:
>
> On 6/1/07, Weldon Washburn <weldonwjw@gmail.com> wrote:
> > On 31 May 2007 00:52:00 +0400, Egor Pasko <egor.pasko@gmail.com> wrote:
> > >
> > > On the 0x2E6 day of Apache Harmony Xiao-Feng Li wrote:
> > > > On 5/30/07, George Timoshenko <george.timoshenko@gmail.com> wrote:
> > > > >
> > > > > > I had a question in the JIRA about this issue: why don't we use
> > > > > > "lock" prefix for the atomic access?
> > > > >
> > > > > well...
> > > > >
> > > > > Originally we split every 64-bit memory access into two 32-bit
> > > > > ones. It does not make sense to set the #LOCK prefix on them
> > > > > (there is a gap between them).
> > > > >
> > > > > We can only set #LOCK on an instruction that reads/writes the whole
> > > > > 64 bits.
> > > > >
> > > > > The bad thing is that the only such instruction (according to the
> > > > > IA32 spec) we can set #LOCK on is CMPXCHG8B (MOVQ, MOVSD and any
> > > > > others cannot be used with #LOCK).
> > > > >
> > > > > This monster (CMPXCHG8B) requires 4 registers:
> > > > >
> > > > > EAX
> > > > > EBX
> > > > > ECX
> > > > > EDX
> > > > >
> > > > > and (FLAGS) also.
> > > > >
> > > > > I am not sure CMPXCHG8B usage will be faster than making volatile
> > > > > fields always synchronized (artificially)
> > > >
> > > > George, I believe it should be much faster than a synchronized block,
> > > > since it is non-blocking with contended locks. To use cmpxchg, you
> > > > need a loop that checks the returned result until it succeeds. With
> > > > a synchronized block, the thread will go to sleep until it is woken
> > > > up by the releasing thread.
> > >
> > > hm, if I am not mistaken most of the time that would be a spin lock
> > > with the current thread manager. So, I cannot bet which way is
> > > faster. Maybe some expert in TM can tell for sure?
> >
> >
> > This kind of stuff is always empirical. The task is to build, measure,
> > post the results.  The wild cards are the workload and the hardware.
> > Different combos will lead to different conclusions.
> >
> > Having said the above, my hunch is to go with CMPXCHG8B for right now.
> > The main motivation is that this decouples register assignment from the
> > jvm thread subsystem and thus makes things easier to debug.  This is
> > goodness.  Also, running exhaustive studies of different workloads and
> > different platforms is not something of high value for a JVM at such an
> > early stage of development.  In other words, do this analysis once we
> > get real workloads like specjappserver running.  As already noted, it
> > should be easy to re-implement when the time is right.
> >
> > Interesting background material --- From Jeremy Manson's "The Java
> > Memory Model", POPL 2005, section 2.3 it says, "In order to allow for
> > non-blocking techniques that communicate between threads, we also want
> > to allow the use of _volatile_ variables to synchronize information
> > between threads.  The properties of volatile variables arose from the
> > need to provide a way to communicate between threads without the
> > overhead of ensuring mutual exclusion."  While this does not dictate a
> > solution, it sort of suggests using opcodes (lockxxx) instead of
> > bytecodes (monenter/exit).
>
> Adding a monenter/monexit pair in a place where the author of the code
> did not intend to put one may lead to deadlock. So, I'm +1 for
> prototyping with CMPXCHG8B first.
>
> Thanks,
> Pavel
>
> >
> >
> > > Anyway, both implementations do not seem to be very hard, we could try
> > > both ways...
> > >
> > > --
> > > Egor Pasko
> > >
> > >
> >
> >
> > --
> > Weldon Washburn
> >
>


-- 
Mikhail Fursov

------=_Part_2620_13173052.1180687768435--