stdcxx-dev mailing list archives

From Liviu Nicoara <nikko...@hates.ms>
Subject Re: STDCXX-1056 [was: Re: STDCXX forks]
Date Wed, 12 Sep 2012 02:18:54 GMT
On 9/11/12 9:40 PM, Martin Sebor wrote:
> On 09/11/2012 04:15 PM, Stefan Teleman wrote:
>> On Mon, Sep 10, 2012 at 4:24 PM, Stefan Teleman wrote:
>>> I think I have something which doesn't break BC - stay tuned because
>>> I'm testing it now.
>>
>> OK.
>>
>> So, here's a possible implementation of __rw_get_numpunct() with
>> minimal locking, which passes the MT tests and does not break ABI:
>>
>> http://s247136804.onlinehome.us/stdcxx-1056-20120911/punct.cpp
>>
>> And the same for include/loc/_numpunct.h:
>>
>> http://s247136804.onlinehome.us/stdcxx-1056-20120911/_numpunct.h
>>
>> In _numpunct.h, none of the functions performs any checks or lazy
>> initialization. They simply pass through to __rw_get_numpunct().
>> std::numpunct<T>'s data members are now dead variables.
>>
>> The bad: performance is no better than with locking the mutex inside
>> each of the std::numpunct<T>::*() functions, and with lazy
>> instantiation.
>
> I wouldn't expect this to be faster than the original. In fact,
> I would expect it to be slower because each call to one of the
> public, non-virtual members results in a call to the out-of-line
> virtual functions (and another to __rw_get_numpunct). Avoiding
> the overhead of such calls is the only reason the caching
> exists.
>
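
For reference, here is the shape of the two designs under discussion, in a
deliberately simplified sketch (hypothetical names, C++11 for brevity; the
real code in include/loc/_numpunct.h differs in detail):

    #include <string>

    struct numpunct_sketch {
        // cached design: the first call stores the result, later calls
        // return the member without paying for the virtual dispatch
        std::string grouping () const {
            if (!_C_valid) {               // unsynchronized check
                _C_grouping = do_grouping ();
                _C_valid    = true;
            }
            return _C_grouping;
        }

        // pass-through design: every call pays for the virtual call
        // (and, in the patch, for a call into __rw_get_numpunct)
        std::string grouping_forwarding () const {
            return do_grouping ();
        }

    private:
        virtual std::string do_grouping () const { return "\003"; }

        mutable std::string _C_grouping;
        mutable bool        _C_valid = false;
    };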

AFAICT, there are two cases to consider:

1. Using the STDCXX locale database initializes the __rw_punct_t data in the 
first, properly synchronized pass through __rw_get_numpunct. All subsequent 
calls use the __rw_punct_t data to construct the returned objects.
2. Using the C library locales does the same in the first pass, via setlocale 
and localeconv, except that setlocale is synchronized via a per-process lock. 
The facet data, once initialized, is used just as in the first case (a sketch 
of such a first pass follows this list).
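
In pseudo-code, the properly synchronized first pass I have in mind looks
roughly like this (a sketch only, using C++11 std::mutex for brevity; the
actual __rw_get_numpunct uses the library's internal guards and is
considerably more involved):

    #include <mutex>
    #include <string>

    struct punct_data_sketch {        // stand-in for __rw_punct_t
        std::string grouping;
        char        decimal_point;
        bool        initialized;
    };

    static punct_data_sketch* get_punct_data ()
    {
        static punct_data_sketch data = { "", '.', false };
        static std::mutex        guard;

        std::lock_guard<std::mutex> lock (guard);

        if (!data.initialized) {
            // first pass: read the locale database, or go through
            // setlocale/localeconv for the C library locales
            data.grouping      = "\003";
            data.decimal_point = '.';
            data.initialized   = true;
        }

        return &data;    // later passes only read the initialized data
    }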

I probably missed this in the previous conversation, but did you detect a race 
condition in the tests when the facets simply forward to the protected virtual 
interface? I.e., did you detect that the facet initialization code itself is 
unsafe? I think the facet __rw_punct_t data is safely initialized in both 
cases; it is the caching that is done incorrectly.
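
To illustrate what I mean by incorrect caching, here is a self-contained toy
(hypothetical, C++11 threads for brevity, not the stdcxx facet or test); run
under a race detector it reports concurrent unsynchronized access to the
cached members:

    #include <string>
    #include <thread>

    struct cached_facet {
        std::string grouping () const {
            if (!_C_valid) {             // unsynchronized read of the flag
                _C_grouping = "\003";    // unsynchronized write of the string
                _C_valid    = true;
            }
            return _C_grouping;          // unsynchronized read of the string
        }
        mutable std::string _C_grouping;
        mutable bool        _C_valid = false;
    };

    int main ()
    {
        cached_facet f;
        std::thread t1 ([&f]{ for (int i = 0; i != 100000; ++i) f.grouping (); });
        std::thread t2 ([&f]{ for (int i = 0; i != 100000; ++i) f.grouping (); });
        t1.join ();
        t2.join ();
    }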

> I'm afraid unoptimized timings don't tell us much. Neither does
> a comparison between two compilers, even on the same OS.
>
> I looked at Liviu's timings today. I was puzzled by the difference
> between (1), which, IIUC, is the current implementation (presumably
> an optimized, thread-safe build with the same compiler and OS) and
> (4), which, again IIUC, is the equivalent of your latest patch here
> (again, presumably optimized, thread-safe, same compiler/OS). I'm
> having trouble envisioning how calling a virtual function to
> retrieve the value of grouping can possibly be faster than not
> calling it (and simply returning the value cached in the data
> member of the facet).
>

The new results I attached to the issue come from somewhat clearer tests, and 
they focus on just two cases: the current implementation vs. a non-caching 
one. The latter just forwards the grouping calls to the protected do_grouping, 
with _no_ other changes to the implementation.
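
The timing loop itself has roughly this shape (a sketch; the tests attached
to the issue exercise the real facet via std::use_facet, and the thread and
iteration counts here are illustrative):

    #include <chrono>
    #include <cstdio>
    #include <locale>
    #include <thread>
    #include <vector>

    static void worker (std::locale loc, long iters)
    {
        const std::numpunct<char> &np =
            std::use_facet<std::numpunct<char> > (loc);
        for (long i = 0; i != iters; ++i)
            np.grouping ();    // the call whose cost is being compared
    }

    int main ()
    {
        const std::locale loc ("C");
        const long iters = 1000000;

        std::vector<std::thread> pool;

        const std::chrono::steady_clock::time_point start =
            std::chrono::steady_clock::now ();

        for (int i = 0; i != 4; ++i)
            pool.push_back (std::thread (worker, loc, iters));
        for (std::thread &t : pool)
            t.join ();

        const std::chrono::duration<double> elapsed =
            std::chrono::steady_clock::now () - start;
        std::printf ("%f seconds total\n", elapsed.count ());
    }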

The timing numbers seem to show that MT builds fare far worse with the caching 
than without. Stefan, if you have the time, could you please infirm :) my 
conclusions by timing it on one of your machines?

Thanks,

Liviu

