stdcxx-issues mailing list archives

From "Martin Sebor (JIRA)" <j...@apache.org>
Subject [jira] Issue Comment Edited: (STDCXX-914) sstream ctors inefficient in reentrant modes
Date Sun, 09 Nov 2008 00:10:45 GMT

    [ https://issues.apache.org/jira/browse/STDCXX-914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12646021#action_12646021 ]

sebor edited comment on STDCXX-914 at 11/8/08 4:09 PM:
--------------------------------------------------------------

Here's a superficially tested patch to optimize {{\_\_rw_locale::_C_is_managed()}} and {{\_\_rw_locale::_C_manage()}}
in {{[src/locale_body.cpp|http://svn.eu.apache.org/viewvc/stdcxx/trunk/src/locale_body.cpp?revision=651334&view=markup]}}.
It improves the performance of the test case by about 45% (down from 18.905s to 12.147s on
an Intel Core 2 6600 running at 2.40GHz) by having {{\_\_rw_locale::_C_is_managed()}} avoid
expensive tests for named facets in the "C" locale and by using a more efficient way to detect
the classic locale in {{\_\_rw_locale::_C_manage()}} when invoked from {{locale::~locale()}}.
\\
\\
{noformat}
Index: src/locale_body.cpp
===================================================================
--- src/locale_body.cpp (revision 712407)
+++ src/locale_body.cpp (working copy)
@@ -859,7 +859,22 @@
         return tmp;
     }
 
+    if (plocale && plocale == classic) {
+        // optimize the "destruction" of the classic C locale
+        // the object is never destroyed and its reference count
+        // never drops to 0
+        _RWSTD_ASSERT (__rw_is_C (locname));
+        _RWSTD_ASSERT (__rw_is_C (plocale->_C_name));
 
+        const size_t ref =
+            _RWSTD_ATOMIC_PREDECREMENT (plocale->_C_ref, false);
+
+        _RWSTD_ASSERT (ref + 1U != 0);
+        _RWSTD_UNUSED (ref);
+
+        return 0;
+    }
+
     // re-entrant to protect static local data structures
     // (not the locales themselves)
     _RWSTD_MT_STATIC_GUARD (_RW::__rw_locale);
@@ -1066,6 +1081,15 @@
             return false;
         }
 
+        _RWSTD_ASSERT (0 == _C_usr_facets);
+
+        if (_C_all == _C_std_facet_bits && 0 == _C_byname_facet_bits) {
+            // optimized for the C locale
+            _RWSTD_ASSERT (__rw_is_C (_C_name));
+
+            return true;
+        }
+
         // unless all facets in the same category come either from
         // the C locale or from some named locale the locale object
         // containing the facets is not managed (this test doesn't
{noformat}

With the patch applied, the top 12 list looks like so:
\\
\\
{noformat}
Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls  ms/call  ms/call  name    
 16.70      0.97     0.97 50000000     0.00     0.00  __rw::__rw_locale::_C_manage(__rw::__rw_locale*, char const*)
 12.57      1.70     0.73 10000000     0.00     0.00  std::basic_istream<char, std::char_traits<char> >& std::operator>><char, std::char_traits<char>, std::allocator<char> >(std::basic_istream<char, std::char_traits<char> >&, std::basic_string<char, std::char_traits<char>, std::allocator<char> >&)
  8.43      2.19     0.49 10000000     0.00     0.00  std::num_put<char, std::ostreambuf_iterator<char, std::char_traits<char> > >::_C_put(std::ostreambuf_iterator<char, std::char_traits<char> >, std::ios_base&, char, int, void const*) const
  7.06      2.60     0.41 10000001     0.00     0.00  std::string::operator=(std::string const&)
  6.45      2.98     0.38 10000000     0.00     0.00  std::string lex_cast<std::string, long>(long const&)
  5.34      3.29     0.31 10000000     0.00     0.00  __rw::__rw_dtoa(char*, unsigned long, unsigned int)
  4.65      3.56     0.27                             main
  4.30      3.81     0.25 10000000     0.00     0.00  std::basic_ostream<char, std::char_traits<char> >& __rw::__rw_insert<char, std::char_traits<char>, long>(std::basic_ostream<char, std::char_traits<char> >&, long)
  3.27      4.00     0.19 10000000     0.00     0.00  std::locale::locale(std::locale const&)
  3.01      4.17     0.18 10000000     0.00     0.00  std::basic_stringbuf<char, std::char_traits<char>, std::allocator<char> >::str(char const*, unsigned long)
  2.75      4.33     0.16 30000000     0.00     0.00  __rw::__rw_locale::_C_is_managed(int) const
  2.75      4.49     0.16 30000000     0.00     0.00  std::locale::~locale()
{noformat}
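
For readers without the stdcxx sources at hand, the idea behind the {{_C_manage()}} change can be sketched roughly as follows. This is a minimal illustration using C++11 atomics, not the library's code: {{Body}}, {{classic_body}}, {{registry_lock}}, {{acquire()}} and {{release()}} are hypothetical names standing in for the locale body, the classic locale object, and the static guard in the patch.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <mutex>

// Hypothetical stand-in for a reference-counted locale body.
struct Body {
    std::atomic<std::size_t> ref{1};
};

// Hypothetical registry state: one immortal "classic" body plus a lock
// that guards the bookkeeping for every other body.
Body classic_body;
std::mutex registry_lock;
std::size_t slow_path_calls = 0;   // counts trips through the locked path

// Adds one reference to a body.
void acquire(Body* body)
{
    body->ref.fetch_add(1, std::memory_order_relaxed);
}

// Drops one reference. Returns true if the caller must destroy the body.
bool release(Body* body)
{
    assert(body->ref.load() > 0);

    if (body == &classic_body) {
        // Fast path: the classic body is never destroyed, so an atomic
        // decrement is enough and the registry lock is never taken.
        body->ref.fetch_sub(1, std::memory_order_acq_rel);
        return false;
    }

    // Slow path: serialize access to the shared registry.
    std::lock_guard<std::mutex> guard(registry_lock);
    ++slow_path_calls;
    return body->ref.fetch_sub(1, std::memory_order_acq_rel) == 1;
}
```

In this sketch, releasing a reference to {{classic_body}} costs one atomic operation and no lock, which is the kind of saving the patch aims for when {{locale::~locale()}} runs on the classic locale.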


      was (Author: sebor):
    Here's a superficially tested patch to optimize {{__rw_locale::_C_is_managed()}} in {{[src/locale_body.cpp|http://svn.eu.apache.org/viewvc/stdcxx/trunk/src/locale_body.cpp?revision=651334&view=markup]}}.
It improves the performance of the test case by about 25% by avoiding expensive tests for
named facets in the "C" locale.
\\
\\
{noformat}
Index: src/locale_body.cpp
===================================================================
--- src/locale_body.cpp (revision 712407)
+++ src/locale_body.cpp (working copy)
@@ -1066,6 +1066,15 @@
             return false;
         }
 
+        _RWSTD_ASSERT (0 == _C_usr_facets);
+
+        if (_C_all == _C_std_facet_bits && 0 == _C_byname_facet_bits) {
+            // optimized for the C locale
+            _RWSTD_ASSERT (__rw_is_C (_C_name));
+
+            return true;
+        }
+
         // unless all facets in the same category come either from
         // the C locale or from some named locale the locale object
         // containing the facets is not managed (this test doesn't
{noformat}

With the patch applied, the top 10 list looks like so:
\\
\\
{noformat}
Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls  ms/call  ms/call  name    
 24.54      1.45     1.45 50000001     0.00     0.00  __rw::__rw_locale::_C_manage(__rw::__rw_locale*, char const*)
  9.48      2.01     0.56 10000000     0.00     0.00  std::basic_istream<char, std::char_traits<char> >& std::operator>><char, std::char_traits<char>, std::allocator<char> >(std::basic_istream<char, std::char_traits<char> >&, std::basic_string<char, std::char_traits<char>, std::allocator<char> >&)
  7.11      2.43     0.42 10000001     0.00     0.00  std::money_put<char, std::ostreambuf_iterator<char, std::char_traits<char> > >::_C_put(std::ostreambuf_iterator<char, std::char_traits<char> >, int, std::ios_base&, char, char const*, unsigned long, int, char const*, unsigned long) const
  6.94      2.84     0.41 40000003     0.00     0.00  std::locale::_C_get_std_facet(__rw::__rw_facet::_C_facet_type, __rw::__rw_facet* (*)(unsigned long, char const*)) const
  6.77      3.24     0.40 10000000     0.00     0.00  std::num_put<char, std::ostreambuf_iterator<char, std::char_traits<char> > >::_C_put(std::ostreambuf_iterator<char, std::char_traits<char> >, std::ios_base&, char, int, void const*) const
  5.58      3.57     0.33 10000001     0.00     0.00  __rw::__rw_itoa(char*, unsigned long long, unsigned int)
  4.91      3.86     0.29 10000000     0.00     0.00  std::basic_ostream<char, std::char_traits<char> >& __rw::__rw_insert<char, std::char_traits<char>, long>(std::basic_ostream<char, std::char_traits<char> >&, long)
  4.57      4.13     0.27                             std::basic_iostream<char, std::char_traits<char> >::~basic_iostream()
  3.13      4.32     0.19 10000000     0.00     0.00  std::string::replace(unsigned long, unsigned long, char const*, unsigned long)
  2.88      4.49     0.17 10000000     0.00     0.00  std::string lex_cast<std::string, long>(long const&)
{noformat}
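
The shape of the {{_C_is_managed()}} shortcut can be illustrated with a small stand-alone sketch. Everything here is an assumption made for illustration: the bit values, the {{LocaleBits}} struct, and in particular the {{is_managed_slow()}} rule are made up and are not the library's actual test. Only the early return mirrors the condition in the patch.

```cpp
#include <cassert>

// Hypothetical bit assignments, one bit per standard facet.
constexpr unsigned facet_ctype    = 1u << 0;
constexpr unsigned facet_numpunct = 1u << 1;
constexpr unsigned facet_num_put  = 1u << 2;
constexpr unsigned all_facets     = facet_ctype | facet_numpunct | facet_num_put;

struct LocaleBits {
    unsigned std_facet_bits;     // facets taken from the C locale
    unsigned byname_facet_bits;  // facets taken from a named locale
};

int slow_checks = 0;   // counts how often the expensive test runs

// Placeholder for the expensive consistency test (made-up rule: the two
// masks must be disjoint and together cover every facet).
bool is_managed_slow(const LocaleBits& loc)
{
    ++slow_checks;
    return (loc.std_facet_bits & loc.byname_facet_bits) == 0
        && (loc.std_facet_bits | loc.byname_facet_bits) == all_facets;
}

bool is_managed(const LocaleBits& loc)
{
    // Fast path: every facet comes from the C locale and none is named,
    // so the answer is known without running the expensive test.
    if (loc.std_facet_bits == all_facets && loc.byname_facet_bits == 0)
        return true;

    return is_managed_slow(loc);
}
```

A locale whose facets all come from the C locale returns from {{is_managed()}} after two integer comparisons, and {{slow_checks}} stays at zero.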

  
> sstream ctors inefficient in reentrant modes
> --------------------------------------------
>
>                 Key: STDCXX-914
>                 URL: https://issues.apache.org/jira/browse/STDCXX-914
>             Project: C++ Standard Library
>          Issue Type: Improvement
>          Components: 27. Input/Output
>    Affects Versions: 4.1.2, 4.1.3, 4.1.4, 4.2.0, 4.2.1
>            Reporter: Martin Sebor
>            Priority: Critical
>             Fix For: 4.2.2
>
>         Attachments: stdcxx-914-gprof-gcc-4.1.2-12D.txt, stdcxx-914-gprof-gcc-4.3.0-12S.txt
>
>   Original Estimate: 12h
>          Time Spent: 2.5h
>  Remaining Estimate: 9.5h
>
> As discussed in this [thread|http://markmail.org/message/hqlsw5dq23gx7d4o], stream ctors
> in thread-safe builds are inefficient due to the initialization of the mutex data member in
> every stream, even in those that never use it. As soon as binary compatibility rules permit
> it we should remove the mutex and/or defer its initialization until it's needed. It might
> be possible to implement the deferred initialization as early as 4.2.2, or maybe 4.3. Complete
> removal will need to wait until 5.0.
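
One possible shape for the deferred initialization mentioned in the description is sketched below. It is only an illustration under stated assumptions: {{LazyLockedStream}}, {{CountedMutex}} and {{mutexes_created}} are hypothetical names, the sketch uses C++11 {{std::atomic}} and {{std::mutex}} rather than the library's own primitives, and it says nothing about how stdcxx would actually implement this within its binary compatibility rules.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>

// Counts how many mutexes were constructed (for illustration only).
std::atomic<int> mutexes_created{0};

struct CountedMutex {
    std::mutex m;
    CountedMutex() { ++mutexes_created; }
};

// Hypothetical stream-like class; not the stdcxx stream classes.
class LazyLockedStream {
public:
    LazyLockedStream() : mutex_(nullptr), value_(0) {}
    ~LazyLockedStream() { delete mutex_.load(); }

    LazyLockedStream(const LazyLockedStream&) = delete;
    LazyLockedStream& operator=(const LazyLockedStream&) = delete;

    // Unsynchronized operation: never touches the mutex.
    void write_unlocked(int v) { value_ = v; }

    // Synchronized operation: creates the mutex on first use.
    void write_locked(int v)
    {
        std::lock_guard<std::mutex> guard(get_mutex().m);
        value_ = v;
    }

    int value() const { return value_; }

private:
    CountedMutex& get_mutex()
    {
        CountedMutex* p = mutex_.load(std::memory_order_acquire);
        if (!p) {
            CountedMutex* fresh = new CountedMutex;
            if (mutex_.compare_exchange_strong(p, fresh,
                                               std::memory_order_acq_rel,
                                               std::memory_order_acquire)) {
                p = fresh;      // this thread installed the mutex
            } else {
                delete fresh;   // another thread won; p now holds its mutex
            }
        }
        assert(p != nullptr);
        return *p;
    }

    std::atomic<CountedMutex*> mutex_;
    int value_;
};
```

A stream that only ever calls {{write_unlocked()}} never constructs a mutex. The trade-off in this sketch is a heap allocation and a compare-and-swap on the first locked call.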

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

