From: Liviu Nicoara
Date: Sun, 16 Sep 2012 19:44:33 -0400
To: dev@stdcxx.apache.org
Subject: Re: STDCXX-1056 [was: Re: STDCXX forks]

On 9/16/12 3:20 AM, Stefan Teleman wrote:
> On Sat, Sep 15, 2012 at 4:53 PM, Liviu Nicoara wrote:
>
>> Now, to clear the confusion I created: the timing numbers I posted in
>> the attachment stdcxx-1056-timings.tgz to STDCXX-1066 (09/11/2012)
>> showed that a perfectly forwarding, no caching public interface
>> (exemplified by a changed grouping) performs better than the current
>> implementation. It was that test case that I hoped you could time,
>> perhaps on SPARC, in both MT and ST builds. The t.cpp program is for
>> MT, s.cpp for ST.
>
> I got your patch, and have tested it.
>
> I have created two Experiments (that's what they are called) with the
> SunPro Performance Analyzer. Both experiments are targeting race
> conditions and deadlocks in the instrumented program, and both
> experiments are running the 22.locale.numpunct.mt program from the
> stdcxx test harness. One experiment is with your patch applied. The
> other experiment is with our (Solaris) patch applied.
>
> Here are the results:

I looked at the analysis more closely.

> 1. with your patch applied:
>
> http://s247136804.onlinehome.us/22.locale.numpunct.mt.1.er.nts/

I see here (http://tinyurl.com/94pbmzc) that the implementation of the
facet public interface is forwarding, with no caching (see the sketch
below).

> 2. with our (Solaris) patch applied:
>
> http://s247136804.onlinehome.us/22.locale.numpunct.mt.1.er.ts/

Unfortunately, I can't do the same here.
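Just to make sure we mean the same thing by "forwarding, no caching,"
here is a minimal sketch of the two shapes of the public interface,
using a hypothetical numpunct-like facet (simplified, not the actual
stdcxx declarations):

    #include <string>

    struct punct_base {
        virtual ~punct_base () { }

        // in the real facet this is a protected virtual member; it is
        // public here only to keep the sketch short
        virtual std::string do_grouping () const { return ""; }
    };

    // forwarding, no caching: the public member simply calls through
    // to the virtual function on every call; nothing in the facet
    // object is written after construction, so readers have nothing
    // to race on
    struct forwarding_punct: punct_base {
        std::string grouping () const { return do_grouping (); }
    };

    // caching: the first call through the public member stores the
    // result in the facet object and later calls return the stored
    // copy; that lazy write is the kind of access the analyzer flags
    struct caching_punct: punct_base {
        mutable std::string cached_grouping_;
        mutable bool        cached_;

        caching_punct (): cached_ (false) { }

        std::string grouping () const {
            if (!cached_) {
                cached_grouping_ = do_grouping ();
                cached_          = true;
            }
            return cached_grouping_;
        }
    };

The changed grouping () in the patch is meant in the sense of the first
shape.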
Could you please refresh my memory: what does this patch contain? It is
not part of the patch set you published here earlier
(http://tinyurl.com/8pyql4g), is it?

AFAICT, the race accesses that the analyzer points out are writes to
shared locations which occur along the thread execution path. They do
not necessarily mean that a race condition exists, and in fact we know
that no race condition exists if the public facet interface forwards to
the protected virtual interface. That is what was tested in the first
analysis, looking at _numpunct.h: http://tinyurl.com/94pbmzc

Looking elsewhere in the first analysis, at the __rw_get_numpunct
function (the src link points here: http://tinyurl.com/8ez85e2), all
the highlighted lines, each performing a write to a shared location,
are potential race points, but they do not lead to race conditions
because of the proper synchronization we know occurs in the
__rw_setlocale class.

The number of race accesses in __rw_get_numpunct sums up to ~3400 with
the forwarding patch, as you pointed out in a later email. That number
was a bit puzzling, but looking at the thread function I see the test
uses the numpunct test suite code, which creates a locale and extracts
the facet from it in each iteration. That means that, ideally, for 4
threads iterating 10,000 times each, I would expect locales to be
created 40,000 times, and likewise for the facets and for the
__rw_get_numpunct calls, etc. The number of race accesses collected,
far less than that, could perhaps be explained by a lesser degree of
thread overlap: some threads start earlier, others later, and they only
partially overlap. If that is the case I would not ascribe much
importance to these numbers.

As I think was pointed out earlier, a numpunct facet is initialized on
the first trip through __rw_get_numpunct, and only that trip is
properly synchronized. All subsequent trips through __rw_get_numpunct
find the facet data already there; they just read it, with no
synchronization needed, and return it. Therefore, the cost of
initialization/synchronization is paid only once (see the sketch in the
P.S. below).

Thanks.
Liviu
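P.S. For clarity, here is roughly the shape of the initialize-once,
read-thereafter pattern I am describing above. The names are
hypothetical, and the sketch uses C++11 atomics so that it stands on
its own; it is not the actual __rw_get_numpunct code, which relies on
the synchronization in the __rw_setlocale class mentioned above.

    #include <atomic>
    #include <mutex>
    #include <string>

    // stand-in for the lazily initialized facet data
    struct numpunct_data {
        std::string grouping;
        char        decimal_point;
    };

    static numpunct_data     punct_data;           // shared data
    static std::atomic<bool> punct_ready (false);  // set once
    static std::mutex        punct_lock;

    // The first caller pays for the initialization under the lock;
    // every later caller sees the flag already set and only reads
    // the data.
    const numpunct_data& get_numpunct_data ()
    {
        if (!punct_ready.load (std::memory_order_acquire)) {
            std::lock_guard<std::mutex> guard (punct_lock);
            if (!punct_ready.load (std::memory_order_relaxed)) {
                punct_data.grouping      = "\3";   // e.g., groups of 3
                punct_data.decimal_point = '.';
                punct_ready.store (true, std::memory_order_release);
            }
        }
        return punct_data;
    }

The point here is only that the locking cost is confined to the first
trip through the function; after that, callers only read.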