incubator-stdcxx-dev mailing list archives

From Liviu Nicoara <nikko...@hates.ms>
Subject Re: Fwd: Re: STDCXX-1071 numpunct facet defect
Date Fri, 26 Oct 2012 12:50:42 GMT
On 10/03/12 11:10, Martin Sebor wrote:
> [...]
> I was just thinking of a few simple loops along the lines of:
>
>    void* thread_func (void*) {
>        for (int i = 0; i < N; ++)
>            test 1: do some simple stuff inline
>            test 2: call a virtual function to do the same stuff
>            test 3: lock and unlock a mutex and do the same stuff
>    }
>
> Test 1 should be the fastest and test 3 the slowest. This should
> hold regardless of what "simple stuff" is (eventually, even when
> it's getting numpunct::grouping() data).

tl;dr: removing the facet data cache is a priority. All else can be put 
on the back-burner.

Conflicting test results aside, there is still the incorrect handling
of the cached data in the facet. I don't think anyone disagrees on
that. Considering that std::string is moving away from the handle-body
implementation, simply getting rid of the cache is a step in the same
direction.

I think that we should preserve the lock-free reading of the facet data, 
as a benign race, but making it benign is perhaps more complicated than 
previously suggested.

As a reminder, the core of the facet access and initialization code 
essentially looks like this (pseudocode-ish):


// facet data accessor
...
     if (0 == _C_impsize) {                  // 1
         mutex_lock ();
         if (0 == _C_impsize) {
             _C_data    = get_facet_data (); // 2
             ??                              // 3
             _C_impsize = 1;                 // 4
         }
         mutex_unlock ();                    // released on every path
     }
     ??                                      // 5
     return _C_data;                         // 6
...


with question marks for missing, necessary fixes. The compiler needs to 
be prevented from re-ordering both 2-4 and 1-6. Just for the sake of 
argument I can imagine an optimization that reorders the reads in 1-6:

     register x = _C_data;
     if (_C_impsize)
         return x;

and if the loads are executed in this order, the caller may see a stale
_C_data: another thread can complete the initialization between the
two loads.

First, the writes in 2-4 need to become visible to other threads in
program order. This needs both a compiler barrier and a store-store
memory barrier that will keep the writes ordered.

Then, the reads in 1-6 need to be ordered such that _C_data is read 
after _C_impsize, via a compiler barrier and a load-load memory barrier 
that will preserve the program order of the loads.
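
A minimal sketch of the two orderings described above, assuming C++11
<atomic> and <mutex> are available (which stdcxx may not be able to
assume on every compiler it supports). Facet, FacetData and
load_facet_data are hypothetical stand-ins, not the stdcxx names; the
release store covers the 2-4 ordering and the acquire load covers the
1-6 ordering.

```cpp
#include <atomic>
#include <mutex>

// Hypothetical stand-in for the facet's locale data; not the stdcxx type.
struct FacetData { int grouping; };

// Hypothetical stand-in for get_facet_data(): pretends to load the data.
inline const FacetData* load_facet_data () {
    static const FacetData data = { 3 };
    return &data;
}

class Facet {
    std::atomic<bool> ready_;   // plays the role of _C_impsize
    const FacetData*  data_;    // plays the role of _C_data
    std::mutex        mutex_;
public:
    Facet (): ready_ (false), data_ (0) { }

    const FacetData* data () {
        // 1: acquire load; the read of data_ below cannot move above it
        if (!ready_.load (std::memory_order_acquire)) {
            std::lock_guard<std::mutex> guard (mutex_);
            // second check is ordered by the mutex, so relaxed is enough
            if (!ready_.load (std::memory_order_relaxed)) {
                data_ = load_facet_data ();                      // 2
                // 3 and 4: release store; the write to data_ cannot
                // move below it
                ready_.store (true, std::memory_order_release);
            }
        }
        return data_;                                            // 6
    }
};
```

The fast path stays lock-free: once ready_ is set, readers perform one
acquire load and one plain load, and never touch the mutex.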

Various compilers provide these features in various forms, but at the 
moment we don't have a unified STDCXX API to implement this.
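
For the sake of discussion, here is a rough sketch of what such a
unified API could look like if it were layered over C++11 fences. The
names store_store_barrier, load_load_barrier, publish and try_read are
hypothetical, and a real STDCXX version would presumably need
per-compiler implementations instead of <atomic>. Note that the fences
used here are somewhat stronger than pure store-store and load-load
barriers.

```cpp
#include <atomic>

// Hypothetical API names; nothing like this exists in stdcxx today.
inline void store_store_barrier () {
    // release fence: earlier writes stay before later atomic stores
    std::atomic_thread_fence (std::memory_order_release);
}

inline void load_load_barrier () {
    // acquire fence: later reads stay after earlier atomic loads
    std::atomic_thread_fence (std::memory_order_acquire);
}

static int              facet_data;          // stands in for _C_data
static std::atomic<int> facet_impsize (0);   // stands in for _C_impsize

// Writer side (steps 2-4); assumed to be called with the mutex held.
inline void publish (int value) {
    facet_data = value;                                   // 2
    store_store_barrier ();                               // 3
    facet_impsize.store (1, std::memory_order_relaxed);   // 4
}

// Reader side (steps 1, 5, 6); returns false if not yet initialized.
inline bool try_read (int& out) {
    if (0 == facet_impsize.load (std::memory_order_relaxed))   // 1
        return false;
    load_load_barrier ();                                      // 5
    out = facet_data;                                          // 6
    return true;
}
```

The flag has to be an atomic object even with the fences in place;
under the C++11 memory model, plain reads and writes of it from
different threads would still be a data race.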

Of course, I might be wrong. Input is appreciated.

Thanks,
Liviu

