stdcxx-dev mailing list archives

From Martin Sebor <>
Subject Re: Fwd: Re: STDCXX-1071 numpunct facet defect
Date Tue, 02 Oct 2012 14:41:07 GMT
I haven't had time to look at this since my last email on
Sunday. I also forgot about the string mutex. I don't think
I'll have time to spend on this until later in the week.
Unless the disassembly reveals the smoking gun, I think we
might need to simplify the test to get to the bottom of the
differences in our measurements. (I.e., eliminate the library
and measure the runtime of a simple thread loop, with and
without locking.) We should also look at the GLIBC and
kernel versions on our systems, on the off chance that
there has been a change that could explain the discrepancy
between my numbers and yours. I suspect my system (RHEL 4.8)
is much older than yours (I don't remember now if you posted
your details).
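
A minimal sketch of what such a stripped-down test might look like,
with the library taken out of the picture. Everything here is
hypothetical and not taken from the actual test program: the names
run_loop, Sync and Result, the use of C++11 threads instead of
whatever the real test uses, and the choice of a shared counter as
the only work done in the loop. It times the same loop three ways:
under a mutex, with an atomic increment, and with no shared state
at all.

```cpp
#include <atomic>
#include <chrono>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical names; none of this comes from the stdcxx test suite.
enum class Sync { Mutex, Atomic, None };

struct Result {
    long   count;    // total number of increments observed
    double seconds;  // wall-clock time for all threads to finish
};

// Runs `nthreads` threads, each performing `iters` increments using the
// synchronization strategy selected by `mode`, and times the whole run.
Result run_loop(Sync mode, unsigned nthreads, long iters)
{
    long              locked_count = 0;       // protected by `m`
    std::mutex        m;
    std::atomic<long> atomic_count(0);
    std::vector<long> per_thread(nthreads, 0); // one slot per thread

    const auto start = std::chrono::steady_clock::now();

    std::vector<std::thread> threads;
    for (unsigned t = 0; t < nthreads; ++t) {
        threads.emplace_back([&, t] {
            // volatile keeps the compiler from collapsing the
            // unsynchronized loop into a single addition
            volatile long local = 0;
            for (long i = 0; i < iters; ++i) {
                switch (mode) {
                case Sync::Mutex: {
                    std::lock_guard<std::mutex> guard(m);
                    ++locked_count;
                    break;
                }
                case Sync::Atomic:
                    atomic_count.fetch_add(1, std::memory_order_relaxed);
                    break;
                case Sync::None:
                    local = local + 1;  // no shared state touched
                    break;
                }
            }
            per_thread[t] = local;  // each thread writes only its own slot
        });
    }

    for (std::size_t i = 0; i < threads.size(); ++i)
        threads[i].join();

    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;

    long total = locked_count + atomic_count.load();
    for (std::size_t i = 0; i < per_thread.size(); ++i)
        total += per_thread[i];

    return Result{ total, elapsed.count() };
}
```

Calling run_loop with the same thread and iteration counts for each
of the three modes and comparing the seconds field on both systems
would show whether the cost of locking alone differs between them.
On Linux the program will most likely need to be built with -pthread.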


On 10/02/2012 06:22 AM, Liviu Nicoara wrote:
> On 09/30/12 18:18, Martin Sebor wrote:
>> I see you did a 64-bit build while I did a 32-bit one, so
>> I tried 64 bits. The cached version (i.e., the one compiled
>> with -UNO_USE_NUMPUNCT_CACHE) is still about twice as fast
>> as the non-cached one (compiled with -DNO_USE_NUMPUNCT_CACHE).
>> I had made one change to the test program that I thought might
>> account for the difference: I removed the call to abort from
>> the thread function since it was causing the process to exit
>> prematurely in some of my tests. But since you used the
>> modified program for your latest measurements that couldn't
>> be it.
>> I can't explain the differences. They just don't make sense
>> to me. Your results should be the other way around. Can you
>> post the disassembly of function f() for each of the two
>> configurations of the test?
> The first thing that struck me in the cached `f' was that the
> __string_ref class uses a mutex to synchronize access to the ref
> counter. It turns out that for Linux on AMD64 we explicitly use a mutex
> instead of atomic ops on the ref counter, via a block in rw/_config.h:
> # if _RWSTD_VER_MAJOR < 5
> # ifdef _RWSTD_OS_LINUX
> // on Linux/AMD64, unless explicitly requested, disable the use
> // of atomic operations in string for binary compatibility with
> // stdcxx 4.1.x
> # endif // _WIN32
> # endif // stdcxx < 5.0
> That is not the cause of the performance difference, though. Even after
> building with _RWSTD_USE_STRING_ATOMIC_OPS I still get better
> performance with the non-cached version.
> Liviu
