harmony-dev mailing list archives

From "Nathan Beyer" <ndbe...@apache.org>
Subject Re: [general] What platforms do we support?
Date Sun, 08 Apr 2007 20:10:03 GMT
I found this little article about using memory barriers in Windows [1]
(and XBox?). There's a mechanism for using these barriers in MSVC 2003
(7.1).

As for Linux, I think we should use the kernel's built-in memory barrier
primitives [2]. Specifically, smp_mb() (a full read/write barrier) and
smp_wmb() (a write barrier).

I believe this would give us the right level of abstraction for the
static code we need to run on the various architectures and OSes, while
still getting the best execution on each.
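
Roughly what I have in mind, as an untested sketch (the macro names are
mine, not what is in atomics.h today):

#if defined(_MSC_VER)
#include <intrin.h>
#pragma intrinsic(_ReadWriteBarrier, _WriteBarrier)
#define HY_MEMORY_RW_BARRIER() _ReadWriteBarrier()  /* compiler-only */
#define HY_MEMORY_W_BARRIER()  _WriteBarrier()      /* compiler-only */
#elif defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
/* Compiler barrier, after the pattern documented in [2]; a real
   mfence/sfence can be added per architecture where the hardware
   needs it. */
#define HY_MEMORY_RW_BARRIER() __asm__ __volatile__ ("" ::: "memory")
#define HY_MEMORY_W_BARRIER()  __asm__ __volatile__ ("" ::: "memory")
#endif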

I'm working on testing this theory out right now.

-Nathan

[1] http://msdn2.microsoft.com/en-us/library/bb310595.aspx
[2] http://lxr.linux.no/source/Documentation/memory-barriers.txt

On 4/8/07, Rana Dasgupta <rdasgupt@gmail.com> wrote:
> They are not completely no-ops. They are directives to the compiler
> not to reorder during its optimizations, though they don't generate
> instructions. If available, I think it is a better idea to use them
> instead of removing the fences. I agree, they at least allow us to
> tweak them in a platform-specific way if we need them. I am not sure
> if they are both available on the older tools, e.g. VS7.1... they are
> in VS8.
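>
> To be concrete, something like this hypothetical guard (untested on
> the older toolchains; the macro name is mine):
>
> #if defined(_MSC_VER) && _MSC_VER >= 1400  /* VS8 / MSVC 2005 */
> #include <intrin.h>
> #pragma intrinsic(_ReadWriteBarrier)
> #define COMPILER_BARRIER() _ReadWriteBarrier()  /* emits no instruction */
> #else
> /* Older MSVC does not optimize across inline asm, so even a trivial
>    block serves as a compiler-level barrier. */
> #define COMPILER_BARRIER() __asm { nop }
> #endif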
>
> While the tests are great, there is no guarantee that they exercise
> all situations where we may need the fences. So some caution is a
> good idea.
>
> On 4/8/07, Gregory Shimansky <gshimansky@gmail.com> wrote:
> > On Sunday 08 April 2007 06:36 Xiao-Feng Li wrote:
> > > On 4/8/07, Nathan Beyer <ndbeyer@apache.org> wrote:
> > > > Well, all of the mfence operations seem to be wrapped in helper
> > > > functions, so it should be a fairly targeted extraction that can
> > > > easily be tweaked as we go forward.
> > > >
> > > > In the 'atomics.h' file, the helper functions on EM64/Win64 use the
> > > > intrinsic functions "_ReadWriteBarrier" and "_WriteBarrier". Could we
> > > > just use those same functions on all platforms? They seem to be
> > > > available everywhere.
> > >
> > > Good idea. Those fence instructions have their respective usage but
> > > could be useless on an architecture with a stronger memory model. To
> > > put them into a macro or intrinsic would be a better approach than to
> > > simply remove them. (In the case of architectures that use a different
> > > atomic mechanism, like LL/SC vs. CAS, we may have to rewrite some code
> > > sequences, but that's another issue.)
> >
> > My recent discovery after reading MSDN is that the _[Read]WriteBarrier
> > intrinsics, which were introduced in MSVC 2005, are a no-op in the code.
> > These intrinsics are only directives to the compiler optimizer and are
> > somewhat similar to the volatile keyword, but less powerful and
> > intrusive. On reaching a _[Read]WriteBarrier intrinsic, the compiler
> > makes sure that all necessary reads and writes are made to memory, so
> > variables are not cached in registers at the point where the intrinsic
> > is placed. No code is generated for them.
> >
> > To actually insert mfence into the code, MSVC has another intrinsic,
> > _mm_mfence(), which generates a true mfence instruction no matter which
> > architecture is chosen for the compilation.
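> >
> > To illustrate the difference, a minimal sketch (function and variable
> > names are mine; assumes <intrin.h> and <emmintrin.h> are available, as
> > on MSVC 2005 with SSE2 enabled):
> >
> > #include <intrin.h>     /* _ReadWriteBarrier */
> > #include <emmintrin.h>  /* _mm_mfence */
> >
> > void publish(volatile int *data, volatile int *flag)
> > {
> >     *data = 42;
> >     _ReadWriteBarrier(); /* compiler may not reorder; emits nothing */
> >     _mm_mfence();        /* emits a real mfence; CPU ordering point */
> >     *flag = 1;
> > }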
> >
> > Answering Nathan's question about my experiment: yes, I commented out
> > the [Read]WriteBarrier implementation for 64-bit systems when I did it,
> > including the implementation for windows64 which used those no-op
> > intrinsics _[Read]WriteBarrier.
> >
> > This is actually another sign that mfence and sfence are probably not
> > needed in atomics.h: ever since the windows64 port was enabled it has
> > used these no-op compiler directives for [Read]WriteBarrier, and no
> > race conditions were noticed compared to linux64.
> >
> > > > -Nathan
> > > >
> > > > On 4/6/07, Rana Dasgupta <rdasgupt@gmail.com> wrote:
> > > > > Gregory,
> > > > >   First, the experiments are really useful and increase confidence
> > > > > more than any amount of discussion can. Thanks.
> > > > >   Here is my understanding of some processor basics, which is not a
> > > > > whole lot. The x86 memory model is actually quite similar for P3, P4,
> > > > > and Xeon processors for write-back caches (most) and non-write-combining
> > > > > memory (most).
> > > > >    Some things always hold true... writes are committed in program
> > > > > order (they are not done speculatively, so if a thread/processor does
> > > > > 3 updates in the program stream, they will be in order, except for
> > > > > streaming writes like in SSE2 instructions and some rare string
> > > > > operations, which are unordered), but reads can be in any order.
> > > > > Reads can pass buffered writes, but it is almost certainly true that
> > > > > this will not happen on the same location. Reads/writes cannot pass
> > > > > instructions with a lock prefix, etc.
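> > > > >    To make the one allowed reordering concrete (a tiny sketch; the
> > > > > variable names are made up): with x == y == 0 initially,
> > > > >
> > > > >    /* Thread 1 */          /* Thread 2 */
> > > > >    x = 1;                  y = 1;
> > > > >    r1 = y;                 r2 = x;
> > > > >
> > > > > each CPU may let its read pass its own buffered write to the other
> > > > > location, so r1 == 0 and r2 == 0 is a legal outcome; an mfence between
> > > > > the store and the load on each side rules it out.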
> > > > >    This is true of a single processor/thread, but for SMPs the
> > > > > guarantees are weaker. The above is true for each processor, but not
> > > > > for all the processors together. Writes from one processor can be
> > > > > unordered with respect to writes from another processor. This is OK
> > > > > because when we have true contention between writes to the same
> > > > > memory location across threads, we always explicitly use critical
> > > > > sections and locks. We never rely on the processor ordering. Any VM
> > > > > code that does not do this is possibly wrong, and if we find it, we
> > > > > will need to change it.
> > > > >    The fence instructions (sfence and mfence) force all the pending
> > > > > and queued-up store and load/store instructions to finish before the
> > > > > next instruction (after the fence) executes. They are not true lock
> > > > > instructions and are much cheaper... they can only prevent the
> > > > > following instructions from being overtaken by earlier instructions
> > > > > that have not yet been committed because of some complex
> > > > > cache/buffer/speculation behaviour. For example, they enforce volatile
> > > > > behaviour in the concurrent atomics classes etc. On PIII, if we don't
> > > > > use the SSE-type instructions, given the simpler cache and write
> > > > > buffer architecture of the older PIII machines, there is a good chance
> > > > > that we will be OK. This is unlikely to be true on P4, HT and
> > > > > multicore systems.
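> > > > >    For reference, the wrappers under discussion look roughly like
> > > > > this (my sketch; the real atomics.h code may differ):
> > > > >
> > > > >    static void MemoryWriteBarrier(void) {
> > > > >        __asm__ __volatile__ ("sfence" ::: "memory");  /* SSE, PIII+ */
> > > > >    }
> > > > >    static void MemoryReadWriteBarrier(void) {
> > > > >        __asm__ __volatile__ ("mfence" ::: "memory");  /* SSE2, P4+ */
> > > > >    }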
> > > > >    So we should just try operating without them on the PIII only
> > > > > (not sfence, which exists on PIII, but lfence, which is used for
> > > > > read/write barriers), and if Nathan or we find concurrency-related
> > > > > failures in some tests down the line, we will need to put locks in
> > > > > that part of the code. Locks are a really expensive way to do this
> > > > > type of serialization, but that's the only option.
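> > > > >    (A cheaper full-barrier substitute on processors without mfence
> > > > > is a locked no-op, hypothetically something like
> > > > >
> > > > >    __asm__ __volatile__ ("lock; addl $0,0(%%esp)" ::: "memory", "cc");
> > > > >
> > > > > which serializes like mfence on any IA-32 part and costs far less
> > > > > than a full critical section.)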
> > > > >
> > > > > Thanks,
> > > > > Rana
> > > > >
> > > > > On 4/6/07, Gregory Shimansky <gshimansky@gmail.com> wrote:
> > > > > > On Friday 06 April 2007 02:39 Rana Dasgupta wrote:
> > > > > > > On 4/5/07, Gregory Shimansky <gshimansky@gmail.com> wrote:
> > > > > > > > On Thursday 05 April 2007 00:48 Rana Dasgupta wrote:
> > > > > > > > > On 4/4/07, Gregory Shimansky <gshimansky@gmail.com> wrote:
> > > > > > > > > > On Wednesday 04 April 2007 23:33 Rana Dasgupta wrote:
> > > > > > > > > > > On 4/4/07, Mikhail Fursov <mike.fursov@gmail.com> wrote:
> > > > > > > > > > > > On 4/4/07, Alexey Petrenko <alexey.a.petrenko@gmail.com> wrote:
> > > > > > > > > > > > > 2007/4/4, Gregory Shimansky <gshimansky@gmail.com>:
> > > > > > > > > > > > > > > > I would like to see these modifications. I wonder
> > > > > > > > > > > > > > > > what you've done in
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > port/src/thread/linux/apr_thread_ext.c and
> > > > > > > > > > > > > > vmcore/include/atomics.h. They contain mfence and
> > > > > > > > > > > > > > sfence instructions in inline assembly which have to
> > > > > > > > > > > > > > be changed to something else on P3.
> > > > > > > > > > >
> > > > > > > > > > > MemoryWriteBarrier() etc. should be no-ops on PIII. x86
> > > > > > > > > > > is already strongly ordered for writes?
> > > > > > > > > >
> > > > > > > > > > What about MemoryReadWriteBarrier()? If you know, what kind
> > > > > > > > > > of code should be used for this on P3?
> > > > > > > > >
> > > > > > > > > One of the compiler guys can confirm this. But I don't
> > > > > > > > > believe that you need to worry about any of the fence
> > > > > > > > > instructions on any of the PIII, PIV genuine Intel procs
> > > > > > > > > unless you are using streaming-mode (SIMD) instructions,
> > > > > > > > > which are weakly ordered.
> > > > > > > >
> > > > > > > > I actually grepped for the use of the MemoryReadWriteBarrier,
> > > > > > > > MemoryWriteBarrier and apr_memory_rw_barrier functions, which
> > > > > > > > are wrappers around the mfence/sfence instructions. They
> > > > > > > > aren't used in code which uses SSE2 in any way.
> > > > > > > >
> > > > > > > > - The apr_memory_rw_barrier (executes mfence) function is used
> > > > > > > > in the thin locks implementation in the threading code.
> > > > > > > >
> > > > > > > > - MemoryReadWriteBarrier (executes mfence) is used in the
> > > > > > > > org.apache.harmony.util.concurrent natives implementation
> > > > > > > > after writing/reading int/long/object fields via JNI.
> > > > > > > >
> > > > > > > > - MemoryWriteBarrier (executes sfence) is used in the
> > > > > > > > classloader for fast management of the classes collection and
> > > > > > > > in the strings pool for the same reason.
> > > > > > > >
> > > > > > > > In all three cases SSE2 is not involved in any way; simple
> > > > > > > > loads and stores are done to memory. According to you, in all
> > > > > > > > of those cases memory barriers are not needed. I am just
> > > > > > > > confused, then: why were they inserted in those places?
> > > > > > >
> > > > > > > I don't know the answer to this question... unless it was
> > > > > > > intended to cover clones etc. that don't fully support the
> > > > > > > writeback model...
> > > > > >
> > > > > > I should have put the question in a different way. I didn't
> > > > > > actually mean that you should know why some code is written in the
> > > > > > VM. I don't know why some code is written in many places, including
> > > > > > those I mentioned.
> > > > > >
> > > > > > The question should actually be: should we remove the mfence and
> > > > > > sfence assembly instructions from the VM sources for x86/x86_64
> > > > > > platforms? I commented out mfence in
> > > > > > port/src/thread/linux/apr_thread_ext.c and mfence/sfence in
> > > > > > vmcore/include/atomics.h and ran VM tests on 5 different SMP boxes
> > > > > > with no fewer than 4 logical CPUs on each of them (2 win32, linux32,
> > > > > > windows64 and linux64). Tests seem to work just fine without mfence
> > > > > > and sfence in the VM code.
> > > > > >
> > > > > > With these instructions removed from the code there should be no
> > > > > > problem with the P3 port on the VM side. It seems they are actually
> > > > > > unnecessary and were inserted in the belief that they help
> > > > > > synchronize caches on SMP. After your explanation that they are
> > > > > > actually needed only when SSE2 is involved, it seems (and my tests
> > > > > > show this) that they are just not needed.
> > > > > >
> > > > > > --
> > > > > > Gregory
> >
> > --
> > Gregory
> >
>
