Craig Chariton wrote:
> I saw something unusual with the Apache library concerning a timing
> comparison against the MSVC 7.1 native Standard library. It may be
> insignificant or something that is already known and remedied, but I
> can't find an email referencing the problem or possible solutions.
[...]
> Here is the main code for the program:
>
>     // The main work is done here, then see what the time is.
>     typedef std::vector<int, std::allocator<int> > Vector;
>     typedef std::ostream_iterator<int, char, std::char_traits<char> > Iter;
>
>     // Initialize a vector using an array of integers.
>     const Vector::value_type a[] = { 1, 2, 3, 4, 5, 6, 7, 8,
>                                      9, 10, 11, 12, 13, 14, 15 };
>     Vector v (a + 0, a + sizeof a / sizeof *a);
>
>     // Create empty vectors to store the results.
>     Vector sums (Vector::size_type (15));
>     Vector prods (Vector::size_type (15));
>
>     // Compute partial sums and partial products.
>     for (i = 0; i < repetitions; i++) {
>         std::partial_sum (v.begin (), v.end (), sums.begin ());
>         std::partial_sum (v.begin (), v.end (), prods.begin (),
>                           std::multiplies<Vector::value_type>());
>     }
>
>
>
> I don't know if I should be concerned with this delay or not. Is this
> delay
> significant enough to investigate?
It depends on what you mean by "concerned" and "significant."
The algorithm isn't used by any other component of the library,
and its performance is unlikely to have a dramatic effect on the
efficiency of the average C++ program.
But if your goal is to understand what's causing the performance
difference, I would suggest starting by exercising no more than
one function at a time (instead of two), increasing the size of
the vectors (by at least two orders of magnitude), and reducing
the number of iterations, to keep the cost of the function calls
(the invocation of the algorithm and of any helper functions it
calls, including the passing of arguments, stack frame setup, and
the effects of inlining) from skewing the timings. That way you
should be able to reduce the amount of code you'll need to look
at later and get a more accurate indication of the difference in
performance.
Once you've determined which of the two algorithms is the slower
one (it might be both) and by how much, you can either continue
to drill down by working with the preprocessed translation unit
(compile the test with -E) and playing with the code, or you can
take a look at the disassembly of each algorithm to see if you
can spot something unusual (function calls that should be inlined
but aren't, unnecessary temporaries being created, etc.).
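For example, a translation unit reduced to just the one algorithm of
interest keeps both the preprocessed output and the assembly listing
small enough to read; the compiler invocations in the comments are
illustrative:

```cpp
// Minimal translation unit around the algorithm under investigation.
// To inspect it, one might use, e.g.:
//   cl /O2 /E test.cpp > test.i     (MSVC: preprocessed translation unit)
//   cl /O2 /FAs /c test.cpp         (MSVC: assembly listing in test.asm)
//   g++ -O2 -S test.cpp             (GCC: assembly listing in test.s)
#include <numeric>
#include <vector>

// Non-inline wrapper so the algorithm's code is easy to locate
// in the generated assembly.
void test_partial_sum (const std::vector<int> &v, std::vector<int> &out)
{
    std::partial_sum (v.begin (), v.end (), out.begin ());
}
```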
Martin
