stdcxx-dev mailing list archives

From Martin Sebor <se...@roguewave.com>
Subject Re: Benchmarking stdcxx
Date Sun, 12 Feb 2006 23:20:03 GMT
Andrew Black wrote:
>  Greetings all.
> 
> I thought it might be interesting to do some benchmarking, comparing the 
> performance of stdcxx with other standard libraries.  As there are a 
> number of attributes that can be compared when doing a benchmark, and an 
> even larger number of classes that can be looked at, there is a fair 
> amount of choice in what to measure.  As a starting point, I chose to 
> measure the runtime performance of stringstream objects.

Thanks! These are extremely valuable data. They clearly show that the
insertion and extraction of character arrays to and from stringstreams
is slower in stdcxx than in libstdc++. We need to figure out why,
especially in the most egregious cases.

To help us narrow down the area that we should focus on we should add
a few more test functions. The first one I would add is a new function
exercising just the class ctor:

   static void construct (int N) {
       for (int i = 0; i < N; ++i) {
           std::stringstream sink;
           assert (sink.goodbit == sink.rdstate ());
       }
   }

Assuming the results for just the ctor are comparable (in my
measurements with gcc 4.0.2, stdcxx was actually about 50% faster
than libstdc++ on this test) we can safely eliminate the ctor and
the dtor as the bottlenecks.

Next, I would add and benchmark another function to exercise the
sentry object that gets constructed in every inserter.

   static void ostream_sentry (int N) {
       for (int i = 0; i < N; ++i) {
           std::stringstream sink;
           assert (sink.goodbit == sink.rdstate ());

           const std::ostream::sentry guard (sink);
           assert (guard);   // operator bool is explicit since C++11
       }
   }

If the results of this test are similar as well (in my runs stdcxx
was about 30% faster than libstdc++), we can eliminate the sentry
as the cause of the problem.

As the next step, instead of benchmarking the entire insertion, I
would exercise just the streambuf::sputn() function (which ends up
getting called by our implementation of the inserter).

   static void streambuf_sputn (int N) {
       for (int i = 0; i < N; ++i) {
           std::stringstream sink;
           assert (sink.goodbit == sink.rdstate ());
           const int nput = i % sizeof ldata;
           const int n = sink.rdbuf ()->sputn (ldata, nput);
           assert (n == nput);
       }
   }

In my tests, this function indeed appeared to be the source of the
poor performance (10 times slower than the libstdc++ implementation
of the same).

From examining the code I knew that sputn() (which calls the virtual
function xsputn()) calls the virtual streambuf member function
overflow(). I measured the performance of overflow but it was the
same for both implementations.

   static void streambuf_overflow (int N) {
       struct pubbuf: std::stringbuf {
           using std::stringbuf::overflow;
       };
       for (int i = 0; i < N; ++i) {
           std::stringstream sink;
           assert (sink.goodbit == sink.rdstate ());
           const int n = ((pubbuf*)sink.rdbuf ())->
               overflow ((unsigned char)i);
           assert (n == (unsigned char)i);
       }
   }

So the source of the performance problem seems to be in xsputn() or
in its interaction with overflow(). To narrow it down even more, I
exercised xsputn() with a second argument of 0 (i.e., making it
insert a string of length 0). Again, stdcxx was quite a bit faster
in this case, this time by about 20%. Changing the second argument
to 1 brought the two results closer (libstdc++ was a tad faster but
not significantly so). Interestingly, though, the slowdown in stdcxx
grows with the value of the second argument.
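
A minimal sketch of what such an xsputn() test could look like is
below. It is not the code that was actually measured: the pubbuf
helper and the streambuf_xsputn name are assumptions, a string of
'x' characters stands in for ldata, and the test writes straight to
a stringbuf rather than going through a stringstream.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>

// Hypothetical helper (not from the original benchmark): exposes the
// protected virtual xsputn() so that it can be called directly.
struct pubbuf: std::stringbuf {
    using std::stringbuf::xsputn;
};

// Insert len characters into a fresh stringbuf N times. Returns the
// total number of characters xsputn() reported as written.
static std::size_t streambuf_xsputn (int N, std::size_t len) {
    const std::string data (len, 'x');   // stand-in for ldata
    std::size_t total = 0;
    for (int i = 0; i < N; ++i) {
        pubbuf sink;
        const std::streamsize n =
            sink.xsputn (data.data (), std::streamsize (len));
        assert (n == std::streamsize (len));
        total += std::size_t (n);
    }
    return total;
}
```

Timing streambuf_xsputn (N, len) for len = 0, 1, 16, 256 and so on
should show whether the cost per call grows with the length.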

By stepping through the code I noticed that xsputn() would call
overflow() for every character instead of only when the buffer was
full, as I had expected. It seems that stringbuf::overflow() doesn't
make any put area available. Looking at the function more closely
revealed a bug in the put area pointer manipulation. Quickly fixing
the bug eliminated much of the performance problem. stdcxx is now
much closer to libstdc++ (although on average still 60% slower).
I suspect that the remaining difference is due to the allocation
policy used by stdcxx stringstream (128 characters initial buffer
size with a growth factor of 1.6 or so).
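
The per-character overflow() calls can also be observed without a
debugger. The sketch below is only an illustration of that idea:
countbuf and count_overflows are hypothetical names, and the number
it reports depends entirely on the library being tested.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>

// Hypothetical instrumented buffer: counts calls to overflow() and
// then forwards to the library's own implementation.
struct countbuf: std::stringbuf {
    std::size_t noverflow;
    countbuf (): noverflow (0) { }
protected:
    virtual int_type overflow (int_type c) {
        ++noverflow;
        return std::stringbuf::overflow (c);
    }
};

// Write len characters with a single sputn() call and return the
// number of times overflow() was invoked along the way.
static std::size_t count_overflows (std::size_t len) {
    const std::string data (len, 'x');
    countbuf sink;
    const std::streamsize n =
        sink.sputn (data.data (), std::streamsize (len));
    assert (n == std::streamsize (len));
    assert (sink.str () == data);
    return sink.noverflow;
}
```

A result close to len would point at the behavior described above,
while a much smaller count suggests that overflow() is called only
when the put area is exhausted.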

I created http://issues.apache.org/jira/browse/STDCXX-142 to track
this issue.

I'll have to test my quick fix but assuming it doesn't cause any
regressions I'll commit it on trunk. It would be good if you could
rerun your benchmarks (with the enhancements suggested above) and
post new results when the fix is available.

Btw., it would also be very nice to put together a harness (e.g.,
in the form of a portable shell script) that would run each test
some number of times and produce a table with the results on
output. That way we could easily rerun the whole benchmark and
quickly post new results after each change.
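
As a rough illustration of the harness idea, here is a sketch
written in C++ rather than as a shell script, so that the timing
loop and the tests live in one program. Everything in it apart from
construct() is an assumption (the benchmark struct, best_ms, report,
and the choice to print the best of several runs), and it needs a
C++11 compiler for <chrono>.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <iomanip>
#include <ostream>
#include <sstream>
#include <string>

typedef void test_fn (int);

// Hypothetical table entry: a label and the function to time.
struct benchmark { const char *name; test_fn *fn; };

// The ctor/dtor test from earlier in this message.
static void construct (int N) {
    for (int i = 0; i < N; ++i) {
        std::stringstream sink;
        assert (sink.goodbit == sink.rdstate ());
    }
}

// Run fn (N) the given number of times; return the fastest run in ms.
static double best_ms (test_fn *fn, int N, int repeats) {
    assert (repeats > 0);
    double best = -1.0;
    for (int r = 0; r < repeats; ++r) {
        const auto t0 = std::chrono::steady_clock::now ();
        fn (N);
        const auto t1 = std::chrono::steady_clock::now ();
        const double ms =
            std::chrono::duration<double, std::milli>(t1 - t0).count ();
        if (best < 0.0 || ms < best)
            best = ms;
    }
    return best;
}

// Print one table row per test; returns the number of rows written.
static std::size_t report (std::ostream &out, const benchmark *tests,
                           std::size_t ntests, int N, int repeats) {
    out << std::left << std::setw (24) << "test"
        << std::right << std::setw (12) << "best (ms)" << '\n';
    std::size_t rows = 0;
    for (std::size_t i = 0; i < ntests; ++i, ++rows)
        out << std::left << std::setw (24) << tests [i].name
            << std::right << std::setw (12) << std::fixed
            << std::setprecision (3)
            << best_ms (tests [i].fn, N, repeats) << '\n';
    return rows;
}
```

Calling report (std::cout, tests, ntests, N, repeats) with an array
listing each test function would print the table. Building the same
program against each library and comparing the two tables gives the
side-by-side numbers.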

Again, thanks for doing this, it's very helpful!
Martin
