stdcxx-dev mailing list archives

From Liviu Nicoara <>
Subject Re: Fwd: Re: STDCXX-1071 numpunct facet defect
Date Tue, 02 Oct 2012 12:22:31 GMT
On 09/30/12 18:18, Martin Sebor wrote:
> I see you did a 64-bit build while I did a 32-bit one, so
> I tried 64 bits. The cached version (i.e., the one compiled
> with -UNO_USE_NUMPUNCT_CACHE) is still about twice as fast
> as the non-cached one (compiled with -DNO_USE_NUMPUNCT_CACHE).
> I had made one change to the test program that I thought might
> account for the difference: I removed the call to abort from
> the thread function since it was causing the process to exit
> prematurely in some of my tests. But since you used the
> modified program for your latest measurements that couldn't
> be it.
> I can't explain the differences. They just don't make sense
> to me. Your results should be the other way around. Can you
> post the disassembly of function f() for each of the two
> configurations of the test?

The first thing that struck me in the cached `f' was that the 
__string_ref class uses a mutex to synchronize access to the ref 
counter. It turns out that on Linux/AMD64 we explicitly use a mutex 
instead of atomic ops on the ref counter, via a block in rw/_config.h:

#  if _RWSTD_VER_MAJOR < 5
#    ifdef _RWSTD_OS_LINUX
        // on Linux/AMD64, unless explicitly requested, disable the use
        // of atomic operations in string for binary compatibility with
        // stdcxx 4.1.x
#      ifndef _RWSTD_USE_STRING_ATOMIC_OPS
#      endif   // _RWSTD_USE_STRING_ATOMIC_OPS
#    endif   // _WIN32
#  endif   // stdcxx < 5.0

That is not the cause of the performance difference, though. Even after 
building with _RWSTD_USE_STRING_ATOMIC_OPS defined, the non-cached 
version still performs better.
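
For anyone who wants to see the two ref-counting strategies side by side, 
here is a minimal sketch. It is not stdcxx code: it assumes a C++11 
compiler, and the names MutexRefCount, AtomicRefCount and hammer are 
hypothetical, made up for this illustration. One counter takes a lock 
around every update, the other uses an atomic read-modify-write, and 
hammer() drives either one from several threads.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical ref counter that serializes every update with a mutex.
class MutexRefCount {
    std::mutex mutex_;
    long       count_;
public:
    MutexRefCount() : count_(1) {}
    long add_ref() { std::lock_guard<std::mutex> lock(mutex_); return ++count_; }
    long release() { std::lock_guard<std::mutex> lock(mutex_); return --count_; }
    long get()     { std::lock_guard<std::mutex> lock(mutex_); return count_; }
};

// Hypothetical ref counter that relies on atomic read-modify-write ops.
class AtomicRefCount {
    std::atomic<long> count_;
public:
    AtomicRefCount() : count_(1) {}
    long add_ref() { return count_.fetch_add(1, std::memory_order_relaxed) + 1; }
    long release() { return count_.fetch_sub(1, std::memory_order_acq_rel) - 1; }
    long get() const { return count_.load(std::memory_order_acquire); }
};

// Runs balanced add_ref/release pairs from nthreads threads and returns
// the final count, which is the initial value of 1 if no update was lost.
template <class Counter>
long hammer(unsigned nthreads, long iterations) {
    assert(nthreads > 0 && iterations >= 0);
    Counter counter;
    std::vector<std::thread> threads;
    for (unsigned i = 0; i != nthreads; ++i) {
        threads.emplace_back([&counter, iterations] {
            for (long n = 0; n != iterations; ++n) {
                counter.add_ref();
                counter.release();
            }
        });
    }
    for (auto& t : threads)
        t.join();
    return counter.get();
}
```

Calling hammer<MutexRefCount>(4, 100000) and hammer<AtomicRefCount>(4, 100000) 
under a timer (and linking with the platform's thread library) gives a 
rough idea of the cost of each strategy on a given machine. It says 
nothing about the numpunct cache itself.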

