httpd-dev mailing list archives

From: Dean Gaudet <>
Subject: Re: another naming question
Date: Wed, 24 Dec 1997 10:11:30 GMT
On Tue, 23 Dec 1997, Alexei Kosut wrote:

> On Tue, 23 Dec 1997, Dean Gaudet wrote:
> > No!  not function pointers in structures!  Argh! 
> > 
> > Slow! 
> In what way? Explain to me the difference between receiving the memory
> address of a function from a stored value and then executing it, and
> having that address "written down" somewhere. To me, it sounds like one
> extra check to get the address from the pointer. And I'm pretty sure that
> can't take any appreciable time.

In i386 assembly, here's a direct call versus a call through a function
pointer:

    /* direct call -- the target address is a link-time constant */
    call abcdef

    /* indirect call -- assume r is in %eax and r->func is at offset 28 */
    movl 28(%eax),%edx
    call *%edx

There is a huge world of difference.
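
Roughly what those correspond to in C -- a sketch with made-up names, not
the real request_rec, and the exact code depends on the compiler:

    /* hypothetical sketch of the two call forms */
    struct req {
        int (*func)(struct req *);      /* function pointer stored in the struct */
        /* ... */
    };

    extern int handler(struct req *);   /* ordinary function, known at link time */

    int direct_call(struct req *r)
    {
        return handler(r);              /* compiles to a plain "call" */
    }

    int indirect_call(struct req *r)
    {
        return r->func(r);              /* load r->func from memory, then "call *" */
    }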

In the first case the cpu can prefetch instructions at the call
destination because it knows the address well before the instruction is
executed.  In the second case it doesn't know the destination until the
pointer has been loaded from memory.

It also doesn't require an extra memory read -- and memory reads are a
bottleneck; CPUs are far faster these days than the memory they access.
For example, when a 266MHz Pentium II takes an L1 cache miss it needs
16 cycles to get the data from its L2 cache, and 40+ cycles to get it
from RAM (assuming 10ns SDRAM -- that's top of the line x86 hardware
I'm talking about).  Optimally a P-II can retire three instructions per
cycle; in reality it ends up doing closer to 1.8 on flat 32-bit code.
So it could execute about 70 instructions in the time it takes to read
that one function pointer.

Ok, that oversimplifies things.  It also has to fetch those instructions
from memory, which also takes memory read cycles.  But my point is that
memory reads are not free.  Instruction prefetching is intended to take
advantage of burst read capabilities of hardware and overlap instruction
execution with the retrieval of instructions.

This isn't an Intel-specific thing either.  Every single CPU vendor
is struggling with the fact that RAM has not kept up with CPU speeds.
The person who invents inexpensive RAM that pumps 32 bits per cycle at
500MHz will be rich, but they're still behind the ball.  500MHz is
2H1997 technology -- 1GHz is 1H1999 technology.

> And it can't be *that* slow... As I understand it, that's basically how
> shared libraries are implemented on most systems (or worse, if there are
> string lookups involved) - I know that that's how DLL import libraries
> work, give or take. I mean, if Microsoft does it, it must be right...

Right, shared libraries are slower than static libraries.  Try linking a
libc-intensive program -static vs. not -static and time the two; you'll
see the difference.  Shared libraries are just way more convenient and
people generally accept the expense.

> > Why do you think we need that?
> Sometimes I wonder if anyone actually *read* those page-long emails I
> kept writing last summer about my ideas for the Apache 2.0 API. I spent
> hundreds of words on this topic. Certainly no-one responded. Silence
> equals consent. Especially five months of it...

silence != consent.  I'm not even sure what emails you're referring to.
Besides, this is the Apache group you're referring to.  We never agree
on anything, and we especially never agree on code which results from
an accepted proposal.  It's only when there's actual code in front of us
to vote on that we get anywhere.

> Basically, I don't want the modules to have to use any symbols from the
> core.  This makes modules much much easier to have as shared libraries -
> especially on Windows, where we can get rid of the DLL import libraries,
> and merge ApacheCore.dll back into Apache.exe. It also makes binary
> compatibility of modules between versions of Apache:

You can't have binary compatibility unless we let the API freeze.  We've
never done that in the past; we've always changed things in subtle ways.
I doubt that we'll be happy with a frozen API.

> If the functions are
> linked ordinally or by address, changing/adding/deleting functions will
> change those numbers, and things will be screwed up.

If a module is compiled with:

    struct foo {
        int (*func1)(...);
        int (*func2)(...);
        /* ... */
    };

and later we change to:

    struct foo {
        int (*func2)(...);
        int (*func3)(...);
        /* ... */
    };

then the module is broken.  The ordinals are still there, they're just
hidden behind syntactic sugar.
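
To spell the breakage out (a hypothetical sketch, not the real Apache
structures): a module compiled against the old layout calls through the
second slot expecting func2; after the change that same offset holds
func3, and the precompiled module silently calls the wrong function.

    /* what the module was compiled against */
    struct foo_old {
        int (*func1)(void);     /* slot 0 */
        int (*func2)(void);     /* slot 1 -- the module calls through here */
    };

    /* the layout after the change */
    struct foo_new {
        int (*func2)(void);     /* slot 0 */
        int (*func3)(void);     /* slot 1 -- the old module now lands here */
    };

    /* The module's object code still loads the pointer at the old offset
     * and calls it.  The offset *is* the ordinal, and it's baked into the
     * compiled module. */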

I haven't even mentioned yet the expense of having 20 function pointers
in struct foo {} *for every instance of struct foo* -- talk about blowing
your RAM caches.  To deal with that you start behaving like C++ ... you
end up with:

    struct foo_funcs {
        int (*func1)(...);
        int (*func2)(...);
        /* ... */
    };

    struct foo {
        struct foo_funcs *funcs;
        /* ... */
    };

Which is exactly how C++ implements virtual functions behind the scenes.
The code for this is even worse.  It looks like:

    /* load r->funcs */
    movl (%eax),%edx
    /* load r->funcs->func2 */
    movl 4(%edx),%edx
    /* call it */
    call *%edx

Two loads and a non-prefetchable control transfer!  Yuck!  At least the
second load is highly likely to be cached because it's shared by a lot
of structures.
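
In C that double hop is just this (again a sketch with hypothetical
names):

    struct foo;
    struct foo_funcs { int (*func2)(struct foo *); /* ... */ };
    struct foo { struct foo_funcs *funcs; /* ... */ };

    int call_func2(struct foo *r)
    {
        return r->funcs->func2(r);      /* load funcs, load func2, call *%edx */
    }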

> It also lets C++ programmers do neat tricks with syntax, if they write
> modules in C++.

Huh?  So we're making the C ugly and slow just so that C++ programmers
can have "pretty" syntax?

> I'm obviously not the only one who feels this way: A whole lot of APIs
> I've seen lying around work in this manner: ISAPI and JNI are two I've
> used recently.

Yes, it's a very academic approach.  I applaud academia.  But I don't
agree with everything it has to say.  practice != theory.

Anyhow, the big thing I remember from the discussions earlier this summer
was that we'd remove all the function pointers from the module structure,
leaving only an init() function.  The init() function would have to call
register_phase(PHASE_NAME, phase_handler) for each of the phases the
module wanted to handle.  This has explicit ordinals; structures with
function pointers have implicit ordinals.  You can't lose the ordinals
unless we start using strings for all the phase names or something (and
if we did that we might as well just implement in Java -- we'd be about
as slow).
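
Something like this, sketched with stand-in types and made-up phase
names -- none of it is a settled API:

    struct request;                     /* stand-in for request_rec */
    enum phase { PHASE_URI_TRANSLATE, PHASE_HANDLER, PHASE_LOG };

    extern void register_phase(enum phase, int (*handler)(struct request *));

    static int my_handler(struct request *r)
    {
        /* handle the request ... */
        return 0;
    }

    /* the module structure keeps only init(); registration is explicit */
    void my_module_init(void)
    {
        register_phase(PHASE_HANDLER, my_handler);
    }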

I'm willing to take indirect control transfer in certain places in
the API.  But I really don't understand the need to replace rputs()
with r->rputs().  That is where I'm not following you.
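
i.e. the difference between these two calls, sketched with a hypothetical
send_greeting() wrapper -- rputs() is the existing call, r->rputs() is the
proposal:

    #include "httpd.h"
    #include "http_protocol.h"

    int send_greeting(request_rec *r)
    {
        rputs("hello, world", r);       /* today: a direct call into the core */
        /* r->rputs("hello, world", r);    proposed: indirect through r */
        return OK;
    }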

