stdcxx-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Martin Sebor <mse...@gmail.com>
Subject Re: STDCXX-1056 [was: Re: STDCXX forks]
Date Wed, 05 Sep 2012 02:49:37 GMT
I recall discussing this when you opened the defect but I'm not
sure what the outcome of the discussion was.

I looked at the code some more just now and I agree with you that
(at least the numpunct) facet isn't thread safe. The premise was
that we didn't need any locking because the facet never changes
after it's initialized, and the same data member can safely be
assigned multiple times.

The problem is that the string data members must be initialized
before they can be assigned, and assignment to the same member
must be guarded against concurrent accesses, there seems to be
no mechanism that guarantees that this will be true when the
same facet is used from within multiple threads.

For example, the function below tries to avoid re-initialization
of _C_grouping but the |= expression can be reordered either by
the compiler or by the hardware to complete before the assignment
to the member. The function also fails to protect the assignment
to the member from taking place simultaneously in multiple threads.

   template <class _CharT>
   inline string numpunct<_CharT>::grouping () const
   {
       if (!(_C_flags & _RW::__rw_gr)) {

           numpunct* const __self = _RWSTD_CONST_CAST (numpunct*, this);

           // [try to] get the grouping first (may throw)
           // then set a flag to avoid future initializations
           __self->_C_grouping  = do_grouping ();
           __self->_C_flags    |= _RW::__rw_gr;
       }

       return _C_grouping;
   }

We need a fix but I don't think the one in the patch attached
to the issue is a good solution. Locking all objects of the
template specialization is far too coarse. Even having a lock
per object would kill iostream performance. Creating a deep
copy on return from each of the functions would also slow
things down noticeably.

The efficient solution must avoid locking in the common case
(i.e., after the facet has been initialized). It can avoid
locking around the POD data members altogether since those can
safely be assigned multiple times (on common hardware), but it
needs to ensure that the string data members are initialized
before they are assigned to and are assigned at most once
simultaneously. If we introduce a lock, it must be per object
and not per type.

With that, I would expect the following function to fix the
problem in the one above. Let me know what you think.

   template <class _CharT>
   inline string numpunct<_CharT>::grouping () const
   {
       if (!(_C_flags & _RW::__rw_gr)) {

           numpunct* const __self = _RWSTD_CONST_CAST (numpunct*, this);

          _RWSTD_MT_GUARD (_C_mutex);

          // [try to] get the grouping first (may throw)
           // then set a flag to avoid future initializations
           __self->_C_grouping  = do_grouping ();
           __self->_C_flags    |= _RW::__rw_gr;
       }

       return _C_grouping;
   }

Martin

On 09/04/2012 07:40 PM, Stefan Teleman wrote:
> On Tue, Sep 4, 2012 at 9:15 PM, Liviu Nicoara<nikkoara@hates.ms>  wrote:
>
>> That is good, right?
>
> Yes, it is good. :-)
>
>
>> Could not get an Intel build off the ground on the account of the LIMITS.cpp test
not running to completion because of a possible compiler bug. I posted earlier an account
of it. Do you have a support account that allows posting bug reports for Intel's C++ compiler?
>
> Oh, yes I remember that now, it happened to me too. I worked around it
> by copying the LIMITS executable from a gcc build.
>
> I don't have a support contract with Intel - I'm using the free
> compilers at home. I'm guessing Intel must have an open forum for
> Intel developers using their compilers, so that might be a way to
> report the bug. But I haven't looked for it.
>
>
>
>> I did not have access to a Solaris box since I left Rogue Wave. Does Sun/Oracle have
any program that would allow one to test drive their compiler on a shared box somewhere? I
vaguely remember that HP had something like that a while ago.
>
> I know Sun had such a program. Oracle I don't really know but my
> educated guess is "not a chance". ;-)
>
> One way of testing with SunPro is to download the free version of
> SunPro from Oracle, and use it on Linux (yes, SunPro exists for Linux
> as well). It works very well on OpenSUSE and Ubuntu (tested and used
> on both myself, debugged stdcxx on OpenSUSE with SunPro 12.2. Doesn't
> work at all on Fedora 17 because of the TLS glibc bug (however it used
> to on earlier Fedora releases).
>
> And now for the latest about 1056:
>
> I managed (after many, MANY tries) to get a run of
> 22.locale.numpunct.mt on Solaris SPARC 32-bit build (at work), and
> without the thread-safety patches applied, and I hit the jackpot:
>
> # INFO (S1) (10 lines):
> # TEXT:
> # COMPILER: SunPro, __SUNPRO_CC = 0x5100
> # ENVIRONMENT: sparc-v8 running sunos-5.11
> # FILE: 22.locale.numpunct.mt.cpp
> # COMPILED: Sep  4 2012, 16:10:03
> # COMMENT: thread safety
> ############################################################
>
> # CLAUSE: lib.locale.numpunct
>
> # NOTE (S2) (5 lines):
> # TEXT: executing "/usr/bin/locale -a>  /var/tmp/tmpfile-MWa4bQ"
> # CLAUSE: lib.locale.numpunct
> # FILE: process.cpp
> # LINE: 276
>
> # INFO (S1) (3 lines):
> # TEXT: testing std::numpunct<charT>  with 4 threads, 1000 iterations
> each, in 32 locales { "C" "POSIX" "af_ZA.UTF-8" "ar_AE.UTF-8"
> "ar_BH.UTF-8" "ar_DZ.UTF-8" "ar_EG.ISO8859-6" "ar_EG.UTF-8"
> "ar_IQ.UTF-8" "ar_JO.UTF-8" "ar_KW.UTF-8" "ar_LY.UTF-8" "ar_MA.UTF-8"
> "ar_OM.UTF-8" "ar_QA.UTF-8" "ar_SA.UTF-8" "ar_TN.UTF-8" "ar_YE.UTF-8"
> "as_IN.UTF-8" "az_AZ.UTF-8" "be_BY.UTF-8" "bg_BG.ISO8859-5"
> "bg_BG.UTF-8" "bn_IN.UTF-8" "bs_BA.ISO8859-2" "bs_BA.UTF-8"
> "ca_ES.ISO8859-1" "ca_ES.ISO8859-15" "ca_ES.UTF-8" "cs_CZ.ISO8859-2"
> "cs_CZ.UTF-8" "cs_CZ.UTF-8@euro" }
> # CLAUSE: lib.locale.numpunct
>
> [ ... snip ... ]
>
> # INFO (S1) (4 lines):
> # TEXT: creating a thread pool with 4 threads
> # CLAUSE: lib.locale.numpunct
> # LINE: 548
>
> Abort (core dump)
>
> /builds/steleman/userland-stdcxx-upstream-thread-safety/components/stdcxx/stdcxx-4.2.1/tests/localization/22.locale.numpunct.mt.cpp:140:
> Assertion '0 == rw_strncmp (grp.c_str (), data.grouping_.c_str ())'
> failed.
> /builds/steleman/userland-stdcxx-upstream-thread-safety/components/stdcxx/stdcxx-4.2.1/build/lib/libstdcxx4.so.4.2.1'__1cE__rwQ__rw_assert_fail6Fpkc2i2_v_+0x78
> [0xff249af8]
> /builds/steleman/userland-stdcxx-upstream-thread-safety/components/stdcxx/stdcxx-4.2.1/build/tests/22.locale.numpunct.mt'thread_func+0x500
> [0x1b050]
> /lib/libc.so.1'_lwp_start+0x0 [0xf7b553f0]
>
> Bingo!
>
> At line 140 in 22.locale.numpunct.mt.cpp, we see:
>
>   RW_ASSERT (0 == rw_strncmp (grp.c_str (),
>                                          data.grouping_.c_str ()));
>
> What this means:
>
> Between the time std::string grp was initialized at line 133:
>
> const std::string grp = np.grouping ();
>
> and the time we hit line 140 (the RW_ASSERT), another thread modified
> the std::string _C_grouping member of class
>
> template<class _CharT>  class numpunct : public _RW::__rw_facet.
>
> And because this is the non-patched version of _numpunct.h, the
> variable std::string grp did not really return a deep copy of
> _C_grouping, but a shared handle to its underlying _CharT buffer. Nor
> was the initialization of _C_grouping mutex-protected inside the
> member function
>
> template<class _CharT>
> inline string numpunct<_CharT>::grouping () const.
>
> This shared buffer was subsequently modified by another thread, in
> another locale, while this thread was convinced, all along, that it
> was holding a deep copy of the variable std::string grp, in the locale
> it wanted. Hence, the assertion on the strcmp(3C) mismatch.
>
> --Stefan
>


Mime
View raw message