harmony-dev mailing list archives

From "Rana Dasgupta" <rdasg...@gmail.com>
Subject Re: [general] What platforms do we support?
Date Fri, 06 Apr 2007 23:38:36 GMT
Gregory,
  First, the experiments are really useful and increase confidence
more than any amount of discussion can. Thanks.
  Here is my understanding of some processor basics, which is not a
whole lot. The x86 memory model is actually quite similar across P3,
P4 and Xeon processors for write-back caches (most) and
non-write-combining memory (most).
   Some things always hold true. Writes are committed in program
order; they are not done speculatively. So if a thread/processor does
3 updates in the program stream, they will be in order, except for
streaming writes like those in SSE2 instructions and some rare string
operations, which are unordered. Reads, however, can be in any order.
Reads can pass buffered writes, but it is almost certainly true that
this will not happen on the same location. Reads/writes cannot pass
instructions with a lock prefix, etc.
   This is true of a single processor/thread, but for SMPs the
guarantees are weaker. The above is true for each processor, but not
for all the processors together. Writes from one processor can be
unordered with respect to writes from another processor. This is OK
because when we have a true contention between writes to the same
memory location across threads, we always explicitly use critical
sections and locks. We never rely on the processor ordering. Any VM
code that does not do this is possibly wrong, and if we find it, we
will need to change it.
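
To make the "always take a lock for contended writes" rule concrete,
here is a minimal sketch in C11. All the names (sketch_counter_t,
sketch_lock and so on) are hypothetical and are not taken from the
Harmony sources. The test-and-set typically compiles to a locked
exchange on x86, which is the kind of instruction that reads/writes
cannot pass.

```c
#include <stdatomic.h>

/* Hypothetical shared state: a counter guarded by a tiny spinlock. */
typedef struct {
    atomic_flag lock;   /* set while some thread holds the lock */
    long value;         /* only touched while the lock is held */
} sketch_counter_t;

static void sketch_counter_init(sketch_counter_t *c)
{
    atomic_flag_clear(&c->lock);
    c->value = 0;
}

/* Returns 1 if the lock was acquired, 0 if it was already held. */
static int sketch_try_lock(atomic_flag *l)
{
    return !atomic_flag_test_and_set_explicit(l, memory_order_acquire);
}

/* Spin until the lock is acquired. */
static void sketch_lock(atomic_flag *l)
{
    while (!sketch_try_lock(l)) {
        /* busy-wait; a real VM would back off or block here */
    }
}

/* Release the lock; earlier writes become visible to the next owner. */
static void sketch_unlock(atomic_flag *l)
{
    atomic_flag_clear_explicit(l, memory_order_release);
}

/* The contended write happens only inside the critical section. */
static void sketch_counter_add(sketch_counter_t *c, long delta)
{
    sketch_lock(&c->lock);
    c->value += delta;
    sketch_unlock(&c->lock);
}
```

Code written this way does not depend on how writes from different
processors happen to be ordered, which is the point made above.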
   The fence instructions (sfence and mfence) force all the pending
and queued-up store (sfence) and load/store (mfence) instructions to
finish before the next instruction after the fence follows. They are
not true lock instructions and are much cheaper. They can only prevent
the following instructions from being surprised by earlier
instructions that have not yet been committed because of some complex
cache/buffer/speculation behaviour. For example, they enforce volatile
behaviour in the concurrent.atomics classes etc. On PIII, if we don't
use the SSE type instructions, given the simpler cache and write
buffer architecture on the older PIII machines, there is a good chance
that we will be OK. This is unlikely to be true on P4, HT and
multicore systems.
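
As an illustration only, the barrier wrappers could be selected at
build time roughly like this. It is a sketch assuming GCC-style inline
assembly; SKETCH_TARGET_P3 and the sketch_* names are hypothetical and
are not what atomics.h or apr_thread_ext.c actually contain. The PIII
branch uses a lock-prefixed no-op add in place of mfence, relying on
the rule that reads/writes cannot pass a locked instruction. Whether
we want that, or no barrier at all, is exactly what is under
discussion here.

```c
#include <stdatomic.h>

/* Hypothetical switch: define SKETCH_TARGET_P3 for CPUs without SSE2. */

/* Store barrier: sfence is available on PIII, since it came with SSE. */
static inline void sketch_write_barrier(void)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    __asm__ __volatile__("sfence" : : : "memory");
#else
    /* Non-x86 fallback, an approximation using a C11 fence. */
    atomic_thread_fence(memory_order_release);
#endif
}

/* Load/store barrier: mfence needs SSE2, so PIII gets a substitute. */
static inline void sketch_rw_barrier(void)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
#  if defined(SKETCH_TARGET_P3)
    /* Locked add of zero to a local: changes nothing, but it is a
       lock-prefixed instruction that earlier accesses cannot pass. */
    int dummy = 0;
    __asm__ __volatile__("lock; addl $0, %0" : "+m"(dummy) : : "memory");
#  else
    __asm__ __volatile__("mfence" : : : "memory");
#  endif
#else
    /* Non-x86 fallback: full C11 fence. */
    atomic_thread_fence(memory_order_seq_cst);
#endif
}
```

The "memory" clobber also stops the compiler from reordering accesses
around the barrier, which matters even where the hardware fence turns
out to be unnecessary.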
   So we should just try operating without them on the PIII only (not
sfence, which exists on PIII, but mfence, which is used for the
read/write barriers), and if Nathan or we find concurrency-related
failures in some tests down the line, we will need to put locks in
that part of the code. Locks are a really expensive way to do this
type of serialization, but that's the only option.

Thanks,
Rana



On 4/6/07, Gregory Shimansky <gshimansky@gmail.com> wrote:
> On Friday 06 April 2007 02:39 Rana Dasgupta wrote:
> > On 4/5/07, Gregory Shimansky <gshimansky@gmail.com> wrote:
> > > On Thursday 05 April 2007 00:48 Rana Dasgupta wrote:
> > > > On 4/4/07, Gregory Shimansky <gshimansky@gmail.com> wrote:
> > > > > On Wednesday 04 April 2007 23:33 Rana Dasgupta wrote:
> > > > > > On 4/4/07, Mikhail Fursov <mike.fursov@gmail.com> wrote:
> > > > > > > > On 4/4/07, Alexey Petrenko <alexey.a.petrenko@gmail.com> wrote:
> > > > > > > > > 2007/4/4, Gregory Shimansky <gshimansky@gmail.com>:
> > > > > > > > > > > > I would like to see these modifications. I wonder what
> > > > > > > > > > > > you've done in
> > > > > > > > > >
> > > > > > > > > > port/src/thread/linux/apr_thread_ext.c and
> > > > > > > > > > vmcore/include/atomics.h. They contain mfence and sfence
> > > > > > > > > > instructions in inline assembly which have to be changed to
> > > > > > > > > > something else on P3.
> > > > > >
> > > > > > MemoryWriteBarrier() etc. should be no-ops on PIII. x86 is already
> > > > > > strongly ordered for writes ?
> > > > >
> > > > > What about MemoryReadWriteBarrier()? If you know, what kind of code
> > > > > should be used for this in P3?
> > > >
> > > > One of the compiler guys can confirm this. But I don't believe that
> > > > you need to worry about any of the fence instructions on any of
> > > > the PIII, PIV genuine intel procs unless you are using streaming mode
> > > > ( SIMD ) instructions which are weakly ordered.
> > >
> > > I actually grepped the use for MemoryReadWriteBarrier, MemoryWriteBarrier
> > > and apr_memory_rw_barrier functions which are wrappers to mfence/sfence
> > > instructions. They aren't used in the code which uses SSE2 in any way.
> > >
> > > - The apr_memory_rw_barrier (executes mfence) function is used in thin
> > > locks implementation in threading code.
> > >
> > > - MemoryReadWriteBarrier (executes mfence) is used in
> > > org.apache.harmony.util.concurrent natives implementation after
> > > writing/reading int/long/object fields via JNI.
> > >
> > > - MemoryWriteBarrier (executes sfence) is used in classloader for fast
> > > management of classes collection and in strings pool for the same reason.
> > >
> > > In all three cases SSE2 is not involved in any way, simply loads and
> > > stores are done with the memory. According to you in all of those cases
> > > memory barriers are not needed. I am just confused then why were they
> > > inserted in those places?
> >
> > I don't know the answer to this question ...unless it was intended to
> > cover clones etc. that don't fully support the writeback model...
>
> I should have put the question in a different way. I didn't actually mean that
> you should know why some code is written in VM. I don't know why some code is
> written in many places including those I mentioned.
>
> The question should actually be like, should we actually remove mfence and
> sfence assembly instructions from the VM sources for x86/x86_64 platforms? I
> commented mfence in port/src/thread/linux/apr_thread_ext.c and mfence/sfence
> in vmcore/include/atomics.h and ran VM tests on 5 different SMP boxes with no
> less than 4 logical CPUs on each of them (2 win32, linux32, windows64 and
> linux64). Tests seem to work just fine without mfence and sfence in VM code.
>
> With these instructions removed from the code there shall be no problem with
> P3 port on VM side. It seems they are actually unnecessary and were inserted
> for a reason that they help on SMP to synchronize caches. After your
> explanation that they are actually needed only when SSE2 is involved, it
> seems (and my tests show this) that they are just not needed.
>
> --
> Gregory
>
