harmony-dev mailing list archives

From "Nathan Beyer" <ndbe...@apache.org>
Subject Re: [general] What platforms do we support?
Date Sat, 07 Apr 2007 20:53:22 GMT
Well, all of the mfence operations seem to be wrapped in helper
functions, so it should be a fairly targeted extraction that can
easily be tweaked as we go forward.

In the 'atomics.h' file, the helper functions on EM64T/Win64 use the
intrinsic functions "_ReadWriteBarrier" and "_WriteBarrier". Could we
just use those same functions on all platforms? They seem to be
available everywhere.
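
As a rough illustration of the wrapper idea, here is a minimal sketch. It
is not the actual contents of atomics.h: the macro names, the void
signatures, and the publish() example are assumptions made for
illustration. It also assumes that compilers without the MSVC intrinsics
need a fallback, shown here as a GCC-style empty asm with a "memory"
clobber. Note that "_ReadWriteBarrier" and "_WriteBarrier" only restrict
compiler reordering; they do not emit a fence instruction.

```c
/* Hypothetical sketch, not the real vmcore/include/atomics.h. */
#if defined(_MSC_VER)
#include <intrin.h>
/* MSVC intrinsics: limit compiler reordering only, no fence instruction. */
#define COMPILER_RW_BARRIER() _ReadWriteBarrier()
#define COMPILER_W_BARRIER()  _WriteBarrier()
#else
/* Assumed GCC/Clang fallback: empty asm with a "memory" clobber. */
#define COMPILER_RW_BARRIER() __asm__ __volatile__("" ::: "memory")
#define COMPILER_W_BARRIER()  __asm__ __volatile__("" ::: "memory")
#endif

/* Helper names follow the thread; the signatures are assumptions. */
static inline void MemoryReadWriteBarrier(void) { COMPILER_RW_BARRIER(); }
static inline void MemoryWriteBarrier(void)     { COMPILER_W_BARRIER(); }

/* Illustrative use: write the data first, then the flag that announces it. */
static int payload;
static int ready;

static void publish(int value)
{
    payload = value;
    MemoryWriteBarrier();   /* keep the compiler from sinking the payload store */
    ready = 1;
}
```

Because every call site goes through the two helpers, switching between a
compiler-only barrier and a real fence would only touch this one place.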

-Nathan

On 4/6/07, Rana Dasgupta <rdasgupt@gmail.com> wrote:
> Gregory,
>   First, the experiments are really useful and increase confidence
> more than any amount of discussion can. Thanks.
>   Here is my understanding of some processor basics, which is not a
> whole lot. The x86 memory model is actually quite similar across P3, P4
> and Xeon processors for write-back caches (most) and non-write-combining
> memory (most).
>    Some things always hold true. Writes are committed in program
> order: they are not done speculatively, so if a thread/processor does
> 3 updates in the program stream, they will be in order, except for
> streaming writes like in SSE2 instructions and some rare string
> operations, which are unordered. Reads, however, can be in any order.
> Reads can pass buffered writes, but it is almost certainly true that
> this will not happen on the same location. Reads/writes cannot pass
> instructions with a lock prefix, etc.
>    This is true of a single processor/thread, but for SMPs the
> guarantees are weaker. The above is true for each processor, but not
> for all the processors together. Writes from one processor can be
> unordered with respect to writes from another processor. This is OK
> because when we have a true contention between writes to the same
> memory location across threads, we always explicitly use critical
> sections and locks. We never rely on the processor ordering. Any VM
> code that does not do this is possibly wrong, and if we find it, we
> will need to change it.
>    The fence instructions (sfence and mfence) force all the pending
> and queued store (sfence) or load/store (mfence) instructions to
> finish before the next instruction after the fence follows. They are
> not true lock instructions and are much cheaper. They can only prevent
> the following instructions from being surprised by earlier instructions
> that have not yet been committed because of some complex
> cache/buffer/speculation behaviour. For example, they enforce volatile
> behaviour in the concurrent.atomics classes etc. On PIII, if we don't
> use the SSE type instructions, given the simpler cache and write
> buffer architecture on the older PIII machines, there is a good chance
> that we will be OK. This is unlikely to be true on P4, HT and
> multicore systems.
>    So we should just try operating without them on the PIII only (not
> sfence, which exists on PIII, but mfence, which is used for
> read/write barriers), and if Nathan or we find concurrency-related
> failures in some tests down the line, we will need to put locks in
> that part of the code. Locks are a really expensive way to do this
> type of serialization, but that's the only option.
>
> Thanks,
> Rana
>
>
>
> On 4/6/07, Gregory Shimansky <gshimansky@gmail.com> wrote:
> > On Friday 06 April 2007 02:39 Rana Dasgupta wrote:
> > > On 4/5/07, Gregory Shimansky <gshimansky@gmail.com> wrote:
> > > > On Thursday 05 April 2007 00:48 Rana Dasgupta wrote:
> > > > > On 4/4/07, Gregory Shimansky <gshimansky@gmail.com> wrote:
> > > > > > On Wednesday 04 April 2007 23:33 Rana Dasgupta wrote:
> > > > > > > > On 4/4/07, Mikhail Fursov <mike.fursov@gmail.com> wrote:
> > > > > > > > > On 4/4/07, Alexey Petrenko <alexey.a.petrenko@gmail.com> wrote:
> > > > > > > > > 2007/4/4, Gregory Shimansky <gshimansky@gmail.com>:
> > > > > > > > > > > > > I would like to see these modifications. I wonder what
> > > > > > > > > > > > > you've done in
> > > > > > > > > >
> > > > > > > > > > port/src/thread/linux/apr_thread_ext.c and
> > > > > > > > > > vmcore/include/atomics.h. They contain mfence and sfence
> > > > > > > > > > instructions in inline assembly which have to be changed to
> > > > > > > > > > something else on P3.
> > > > > > >
> > > > > > > > MemoryWriteBarrier() etc. should be no-ops on PIII. x86 is already
> > > > > > > > strongly ordered for writes?
> > > > > >
> > > > > > > What about MemoryReadWriteBarrier()? If you know, what kind of code
> > > > > > > should be used for this in P3?
> > > > >
> > > > > > One of the compiler guys can confirm this. But I don't believe that
> > > > > > you need to worry about any of the fence instructions on any of
> > > > > > the PIII, PIV genuine Intel procs unless you are using streaming mode
> > > > > > ( SIMD ) instructions which are weakly ordered.
> > > >
> > > > I actually grepped for uses of the MemoryReadWriteBarrier, MemoryWriteBarrier
> > > > and apr_memory_rw_barrier functions, which are wrappers around the mfence/sfence
> > > > instructions. They aren't used in code which uses SSE2 in any way.
> > > >
> > > > - The apr_memory_rw_barrier (executes mfence) function is used in thin
> > > > locks implementation in threading code.
> > > >
> > > > - MemoryReadWriteBarrier (executes mfence) is used in
> > > > org.apache.harmony.util.concurrent natives implementation after
> > > > writing/reading int/long/object fields via JNI.
> > > >
> > > > - MemoryWriteBarrier (executes sfence) is used in classloader for fast
> > > > management of classes collection and in strings pool for the same reason.
> > > >
> > > > In all three cases SSE2 is not involved in any way; plain loads and
> > > > stores are done with the memory. According to you, memory barriers are
> > > > not needed in any of those cases. I am just confused, then, about why
> > > > they were inserted in those places.
> > >
> > > I don't know the answer to this question ...unless it was intended to
> > > cover clones etc. that don't fully support the writeback model...
> >
> > I should have put the question in a different way. I didn't actually mean that
> > you should know why some code is written in VM. I don't know why some code is
> > written in many places including those I mentioned.
> >
> > The real question is: should we remove the mfence and sfence assembly
> > instructions from the VM sources for x86/x86_64 platforms? I commented
> > out mfence in port/src/thread/linux/apr_thread_ext.c and mfence/sfence
> > in vmcore/include/atomics.h and ran VM tests on 5 different SMP boxes with no
> > less than 4 logical CPUs on each of them (2 win32, linux32, windows64 and
> > linux64). Tests seem to work just fine without mfence and sfence in VM code.
> >
> > With these instructions removed from the code there should be no problem with
> > the P3 port on the VM side. It seems they are actually unnecessary and were
> > inserted on the assumption that they help synchronize caches on SMP. After
> > your explanation that they are needed only when SSE2 is involved, it
> > seems (and my tests show this) that they are just not needed.
> >
> > --
> > Gregory
> >
>

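Rana's point above that reads and writes cannot pass a lock-prefixed
instruction suggests one possible stand-in for mfence on a PIII, short of
full locks: a locked read-modify-write that changes nothing. The sketch
below is only an illustration under stated assumptions. It assumes a
GCC-compatible compiler with AT&T-syntax inline asm, and the names
full_barrier_without_mfence, flag_a, flag_b and try_enter_a are
hypothetical, not taken from the Harmony sources.

```c
/* Hypothetical helper; the name and shape are assumptions, not Harmony code. */
static inline void full_barrier_without_mfence(void)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    /* Locked add of zero to a stack slot: the value is unchanged, but
       loads and stores are not reordered across the locked instruction. */
    int dummy = 0;
    __asm__ __volatile__("lock; addl $0, %0" : "+m"(dummy) : : "memory", "cc");
#else
    /* Assumed fallback for other GCC-compatible targets. */
    __sync_synchronize();
#endif
}

/* Illustrative store-then-load pattern, the case where a read could
   otherwise pass a buffered write to a different location. */
static volatile int flag_a;
static volatile int flag_b;

static int try_enter_a(void)
{
    flag_a = 1;                     /* announce intent */
    full_barrier_without_mfence();  /* store must be visible before the load */
    return flag_b == 0;             /* 1 if the other side has not announced */
}
```

Whether this is cheaper than the critical sections discussed in the thread
would have to be measured on the target hardware.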