From dev-return-8614-apmail-stdcxx-dev-archive=stdcxx.apache.org@stdcxx.apache.org Wed Sep 5 02:50:09 2012 Return-Path: X-Original-To: apmail-stdcxx-dev-archive@www.apache.org Delivered-To: apmail-stdcxx-dev-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 712E9D1C5 for ; Wed, 5 Sep 2012 02:50:09 +0000 (UTC) Received: (qmail 29055 invoked by uid 500); 5 Sep 2012 02:50:09 -0000 Delivered-To: apmail-stdcxx-dev-archive@stdcxx.apache.org Received: (qmail 28991 invoked by uid 500); 5 Sep 2012 02:50:09 -0000 Mailing-List: contact dev-help@stdcxx.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@stdcxx.apache.org Delivered-To: mailing list dev@stdcxx.apache.org Received: (qmail 28983 invoked by uid 99); 5 Sep 2012 02:50:09 -0000 Received: from nike.apache.org (HELO nike.apache.org) (192.87.106.230) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 05 Sep 2012 02:50:09 +0000 X-ASF-Spam-Status: No, hits=-0.7 required=5.0 tests=RCVD_IN_DNSWL_LOW,SPF_PASS X-Spam-Check-By: apache.org Received-SPF: pass (nike.apache.org: domain of msebor@gmail.com designates 209.85.160.54 as permitted sender) Received: from [209.85.160.54] (HELO mail-pb0-f54.google.com) (209.85.160.54) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 05 Sep 2012 02:50:01 +0000 Received: by pbbrp2 with SMTP id rp2so159212pbb.41 for ; Tue, 04 Sep 2012 19:49:40 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=message-id:date:from:user-agent:mime-version:to:cc:subject :references:in-reply-to:content-type:content-transfer-encoding; bh=j1DC96KZyQ7FukjwGbtsetlT0GSnClkZL+kZnBSl9pU=; b=vl5b/jp8RIBWIuUN1JXwRvAIiF4Wn62jGHI2fbfwsUJl7NT4VnQInceVGJF5RQByAf 6XaCDs+sagUQq5Rni7U2yuhtqEhD8tcua3ybUrlT4KcnsPYIVKdCb1qppCrP0ghh5bxw DvmRBy+vOxJd03nQssPDDQZ7nMjtbyrFdbLYZOymGwVxrlHy3qMsCSkIFoWJkj8uWblU JcFzGI2khVZ4zPT216HEsGT0WwT5ZMv4gaz/Q+jJZGp1qYPsPC1VXeKxGvLRfBMYmbEd fyoJYAOXimneDo/U1LtWrf5izja7jMOxTs4XDhBNDQtTbHlfvFsm4PY/hhqoJKRoX+5A oEHQ== Received: by 10.68.227.233 with SMTP id sd9mr50381351pbc.48.1346813379836; Tue, 04 Sep 2012 19:49:39 -0700 (PDT) Received: from localhost.localdomain (72-163-0-129.cisco.com. [72.163.0.129]) by mx.google.com with ESMTPS id gf3sm340919pbc.74.2012.09.04.19.49.38 (version=TLSv1/SSLv3 cipher=OTHER); Tue, 04 Sep 2012 19:49:39 -0700 (PDT) Message-ID: <5046BDC1.3020400@gmail.com> Date: Tue, 04 Sep 2012 20:49:37 -0600 From: Martin Sebor User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:11.0) Gecko/20120329 Thunderbird/11.0.1 MIME-Version: 1.0 To: dev@stdcxx.apache.org CC: Stefan Teleman Subject: Re: STDCXX-1056 [was: Re: STDCXX forks] References: <40394653-8FCC-4D04-A108-2C650AF8F95B@hates.ms> <5045E764.9090607@hates.ms> <595887D2-6E42-4BC4-AF69-085AE4BA8A7D@hates.ms> In-Reply-To: Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit I recall discussing this when you opened the defect but I'm not sure what the outcome of the discussion was. I looked at the code some more just now and I agree with you that (at least the numpunct) facet isn't thread safe. The premise was that we didn't need any locking because the facet never changes after it's initialized, and the same data member can safely be assigned multiple times. The problem is that the string data members must be initialized before they can be assigned, and assignment to the same member must be guarded against concurrent accesses, there seems to be no mechanism that guarantees that this will be true when the same facet is used from within multiple threads. For example, the function below tries to avoid re-initialization of _C_grouping but the |= expression can be reordered either by the compiler or by the hardware to complete before the assignment to the member. The function also fails to protect the assignment to the member from taking place simultaneously in multiple threads. template inline string numpunct<_CharT>::grouping () const { if (!(_C_flags & _RW::__rw_gr)) { numpunct* const __self = _RWSTD_CONST_CAST (numpunct*, this); // [try to] get the grouping first (may throw) // then set a flag to avoid future initializations __self->_C_grouping = do_grouping (); __self->_C_flags |= _RW::__rw_gr; } return _C_grouping; } We need a fix but I don't think the one in the patch attached to the issue is a good solution. Locking all objects of the template specialization is far too coarse. Even having a lock per object would kill iostream performance. Creating a deep copy on return from each of the functions would also slow things down noticeably. The efficient solution must avoid locking in the common case (i.e., after the facet has been initialized). It can avoid locking around the POD data members altogether since those can safely be assigned multiple times (on common hardware), but it needs to ensure that the string data members are initialized before they are assigned to and are assigned at most once simultaneously. If we introduce a lock, it must be per object and not per type. With that, I would expect the following function to fix the problem in the one above. Let me know what you think. template inline string numpunct<_CharT>::grouping () const { if (!(_C_flags & _RW::__rw_gr)) { numpunct* const __self = _RWSTD_CONST_CAST (numpunct*, this); _RWSTD_MT_GUARD (_C_mutex); // [try to] get the grouping first (may throw) // then set a flag to avoid future initializations __self->_C_grouping = do_grouping (); __self->_C_flags |= _RW::__rw_gr; } return _C_grouping; } Martin On 09/04/2012 07:40 PM, Stefan Teleman wrote: > On Tue, Sep 4, 2012 at 9:15 PM, Liviu Nicoara wrote: > >> That is good, right? > > Yes, it is good. :-) > > >> Could not get an Intel build off the ground on the account of the LIMITS.cpp test not running to completion because of a possible compiler bug. I posted earlier an account of it. Do you have a support account that allows posting bug reports for Intel's C++ compiler? > > Oh, yes I remember that now, it happened to me too. I worked around it > by copying the LIMITS executable from a gcc build. > > I don't have a support contract with Intel - I'm using the free > compilers at home. I'm guessing Intel must have an open forum for > Intel developers using their compilers, so that might be a way to > report the bug. But I haven't looked for it. > > > >> I did not have access to a Solaris box since I left Rogue Wave. Does Sun/Oracle have any program that would allow one to test drive their compiler on a shared box somewhere? I vaguely remember that HP had something like that a while ago. > > I know Sun had such a program. Oracle I don't really know but my > educated guess is "not a chance". ;-) > > One way of testing with SunPro is to download the free version of > SunPro from Oracle, and use it on Linux (yes, SunPro exists for Linux > as well). It works very well on OpenSUSE and Ubuntu (tested and used > on both myself, debugged stdcxx on OpenSUSE with SunPro 12.2. Doesn't > work at all on Fedora 17 because of the TLS glibc bug (however it used > to on earlier Fedora releases). > > And now for the latest about 1056: > > I managed (after many, MANY tries) to get a run of > 22.locale.numpunct.mt on Solaris SPARC 32-bit build (at work), and > without the thread-safety patches applied, and I hit the jackpot: > > # INFO (S1) (10 lines): > # TEXT: > # COMPILER: SunPro, __SUNPRO_CC = 0x5100 > # ENVIRONMENT: sparc-v8 running sunos-5.11 > # FILE: 22.locale.numpunct.mt.cpp > # COMPILED: Sep 4 2012, 16:10:03 > # COMMENT: thread safety > ############################################################ > > # CLAUSE: lib.locale.numpunct > > # NOTE (S2) (5 lines): > # TEXT: executing "/usr/bin/locale -a> /var/tmp/tmpfile-MWa4bQ" > # CLAUSE: lib.locale.numpunct > # FILE: process.cpp > # LINE: 276 > > # INFO (S1) (3 lines): > # TEXT: testing std::numpunct with 4 threads, 1000 iterations > each, in 32 locales { "C" "POSIX" "af_ZA.UTF-8" "ar_AE.UTF-8" > "ar_BH.UTF-8" "ar_DZ.UTF-8" "ar_EG.ISO8859-6" "ar_EG.UTF-8" > "ar_IQ.UTF-8" "ar_JO.UTF-8" "ar_KW.UTF-8" "ar_LY.UTF-8" "ar_MA.UTF-8" > "ar_OM.UTF-8" "ar_QA.UTF-8" "ar_SA.UTF-8" "ar_TN.UTF-8" "ar_YE.UTF-8" > "as_IN.UTF-8" "az_AZ.UTF-8" "be_BY.UTF-8" "bg_BG.ISO8859-5" > "bg_BG.UTF-8" "bn_IN.UTF-8" "bs_BA.ISO8859-2" "bs_BA.UTF-8" > "ca_ES.ISO8859-1" "ca_ES.ISO8859-15" "ca_ES.UTF-8" "cs_CZ.ISO8859-2" > "cs_CZ.UTF-8" "cs_CZ.UTF-8@euro" } > # CLAUSE: lib.locale.numpunct > > [ ... snip ... ] > > # INFO (S1) (4 lines): > # TEXT: creating a thread pool with 4 threads > # CLAUSE: lib.locale.numpunct > # LINE: 548 > > Abort (core dump) > > /builds/steleman/userland-stdcxx-upstream-thread-safety/components/stdcxx/stdcxx-4.2.1/tests/localization/22.locale.numpunct.mt.cpp:140: > Assertion '0 == rw_strncmp (grp.c_str (), data.grouping_.c_str ())' > failed. > /builds/steleman/userland-stdcxx-upstream-thread-safety/components/stdcxx/stdcxx-4.2.1/build/lib/libstdcxx4.so.4.2.1'__1cE__rwQ__rw_assert_fail6Fpkc2i2_v_+0x78 > [0xff249af8] > /builds/steleman/userland-stdcxx-upstream-thread-safety/components/stdcxx/stdcxx-4.2.1/build/tests/22.locale.numpunct.mt'thread_func+0x500 > [0x1b050] > /lib/libc.so.1'_lwp_start+0x0 [0xf7b553f0] > > Bingo! > > At line 140 in 22.locale.numpunct.mt.cpp, we see: > > RW_ASSERT (0 == rw_strncmp (grp.c_str (), > data.grouping_.c_str ())); > > What this means: > > Between the time std::string grp was initialized at line 133: > > const std::string grp = np.grouping (); > > and the time we hit line 140 (the RW_ASSERT), another thread modified > the std::string _C_grouping member of class > > template class numpunct : public _RW::__rw_facet. > > And because this is the non-patched version of _numpunct.h, the > variable std::string grp did not really return a deep copy of > _C_grouping, but a shared handle to its underlying _CharT buffer. Nor > was the initialization of _C_grouping mutex-protected inside the > member function > > template > inline string numpunct<_CharT>::grouping () const. > > This shared buffer was subsequently modified by another thread, in > another locale, while this thread was convinced, all along, that it > was holding a deep copy of the variable std::string grp, in the locale > it wanted. Hence, the assertion on the strcmp(3C) mismatch. > > --Stefan >