apr-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Lucian Adrian Grijincu" <lucian.griji...@gmail.com>
Subject Re: [PATCH] vformatter cleanups (related to PR 42250)
Date Fri, 27 Apr 2007 07:04:49 GMT
On 4/27/07, Davi Arnaut <davi@haxent.com.br> wrote:
> On 27/04/2007, at 00:14, Lucian Adrian Grijincu wrote:
>
> > in apr-conv10-faster.patch you added:
> >
> > static const char digits[] = "0123456789";
> > *--p = digits[magnitude % 10];
> >
> > Why is this faster than:
> > *--p = (char) '0' + (magnitude % 10); ?
>
> You have to take into account the entire loop. The fowling:
>
> do {
>         u_widest_int new_magnitude = magnitude / 10;
>         *--p = (char) (magnitude - new_magnitude * 10 + '0');
>         magnitude = new_magnitude;
> } while (magnitude);
>
> against:
>
> do {
>         *--p = digits[magnitude % 10];
> } while ((magnitude /= 10) != 0);
>
> digits is easily cacheable, fewer assignments.
>
I wasn't promoting the first version, I wanted to say the second one
could be made better this way:
 do {
         *--p = (char) '0' + (magnitude % 10);
 } while ((magnitude /= 10) != 0);

I don't quite like u_widest_int from the first version from a
portability standpoint, but I see that it's quite used in
conv_10_quad() and the rest :(
http://svn.apache.org/viewvc/apr/apr/trunk/strings/apr_snprintf.c?view=markup

--
Lucian Adrian Grijincu
> >
> > For your "faster" version, under the hood, the C compiler adds
> > (magnitude % 10) to the address of digits and then copies the contents
> > of the memory location represented by the sum's result into *--p.
> >
> > My version just adds (magnitude % 10) to '0' and stores the result
> > in *--p.
>
> Talk is cheap, let's benchmark! To see the generated assembly:
>
> gcc -O2 -o bench bench.c -g
> objdump -S bench > bench-asm
>
> # Intel(R) Celeron(R) CPU 2.20GHz
> [davi@montefiori ~]$ gcc -o bench bench.c -O2 # uint32_t
> [davi@montefiori ~]$ ./bench $RANDOM
> conv_1
> cycles:          236
> cycles:          236
> cycles:          236
> cycles:          236
> cycles:          236
> cycles:          236
> cycles:          236
> cycles:          236
> conv_2
> cycles:          236
> cycles:          220
> cycles:          224
> cycles:          220
> cycles:          224
> cycles:          224
> cycles:          224
> cycles:          224
> conv_1
> cycles:          236
> cycles:          236
> cycles:          236
> cycles:          236
> cycles:          236
> cycles:          236
> cycles:          236
> cycles:          236
> conv_2
> cycles:          220
> cycles:          224
> cycles:          224
> cycles:          224
> cycles:          224
> cycles:          224
> cycles:          224
> cycles:          224
>
> [davi@montefiori ~]$ gcc -o bench bench.c -O2 # uint64_t
> [davi@montefiori ~]$ ./bench $RANDOM |more
> conv_1
> cycles:          508
> cycles:          532
> cycles:          540
> cycles:          468
> cycles:          468
> cycles:          468
> cycles:          468
> conv_2
> cycles:         1188
> cycles:          824
> cycles:          896
> cycles:          828
> cycles:          824
> cycles:          824
> cycles:          820
> conv_1
> cycles:          524
> cycles:          492
> cycles:          468
> cycles:          504
> cycles:          468
> cycles:          504
> cycles:          468
> conv_2
> cycles:          768
> cycles:          836
> cycles:          836
> cycles:          820
> cycles:          820
> cycles:          820
> cycles:          820
>
>
> > Am I missing something here?
>
> Both code, after compiler optimizations, yield similar results but
> hurts uint64_t (apr_uint64_t) case quite a bit. "Faster" was a
> overstatement, I withdraw apr-conv10-faster.patch.
>
> --
> Davi Arnaut
>
>
>

Mime
View raw message