incubator-stdcxx-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Martin Sebor <mse...@gmail.com>
Subject Re: Fwd: Re: STDCXX-1071 numpunct facet defect
Date Fri, 26 Oct 2012 22:50:26 GMT
On 10/26/2012 06:50 AM, Liviu Nicoara wrote:
> On 10/03/12 11:10, Martin Sebor wrote:
>> [...]
>> I was just thinking of a few simple loops along the lines of:
>>
>> void* thread_func (void*) {
>> for (int i = 0; i < N; ++)
>> test 1: do some simple stuff inline
>> test 2: call a virtual function to do the same stuff
>> test 3: lock and unlock a mutex and do the same stuff
>> }
>>
>> Test 1 should be the fastest and test 3 the slowest. This should
>> hold regardless of what "simple stuff" is (eventually, even when
>> it's getting numpunct::grouping() data).
>
> tl;dr: removing the facet data cache is a priority. All else can be put
> on the back-burner.
>
> Conflicting test results aside, there still is the case of the incorrect
> handling of the cached data in the facet. I don't think there is a
> disagreement on that. Considering that the std::string is moving in the
> direction of dropping the handle-body implementation, simply getting rid
> of the cache is a step in the same direction.
>
> I think that we should preserve the lock-free reading of the facet data,
> as a benign race, but making it benign is perhaps more complicated than
> previously suggested.
>
> As a reminder, the core of the facet access and initialization code
> essentially looks like this (pseudocode-ish):
>
>
> // facet data accessor
> ...
> if (0 == _C_impsize) { // 1
> mutex_lock ();
> if (_C_impsize)
> return _C_data;
> _C_data = get_facet_data (); // 2
> ?? // 3
> _C_impsize = 1; // 4
> mutex_unlock ();
> }
> ?? // 5
> return _C_data; // 6
> ...
>
>
> with question marks for missing, necessary fixes. The compiler needs to
> be prevented from re-ordering both 2-4 and 1-6. Just for the sake of
> argument I can imagine an optimization that reorders the reads in 1-6:
>
> register x = _C_data;
> if (_C_impsize)
> return x;
>
> and if the loads are executed in this order, the caller will see a stale
> _C_data.
>
> First, the 2-4 writes need to be executed in the program order. This
> needs both a compiler barrier and a store-store memory barrier that will
> keep the writes ordered.
>
> Then, the reads in 1-6 need to be ordered such that _C_data is read
> after _C_impsize, via a compiler barrier and a load-load memory barrier
> that will preserve the program order of the loads.
>
> Various compilers provide these features in various forms, but at the
> moment we don't have a unified STDCXX API to implement this.
>
> Of course, I might be wrong. Input is appreciated.

You're right (we don't have an API for it, and it is the double
checked locking problem mentioned earlier on in this thread).
Implementing it here would bloat the function (just in terms
of source code, not necessarily object code), and potentially
introduce a binary compatibility dependency on the new code
(i.e., we wouldn't be able to remove the fix in the future
w/o braking it).

I suggested moving the body of the outer if from the header
into a .cpp file in the library where we could implement
ugly, bloated locking without the risk of breaking things
if we removed/replaced it in the future. That's when we ran
into questions about how exactly to do this cleanly, etc.
It didn't seem to be doable very cleanly but I still think
it's a viable approach.

Martin

>
> Thanks,
> Liviu
>


Mime
View raw message