incubator-lucy-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Marvin Humphrey <>
Subject Towards a stable C API... via indirect dispatch
Date Sun, 28 Oct 2007 15:22:14 GMT

Boilerplater is currently implemented in KS using the design that  
Dave and I hashed out here.  The post below explores a potential mod  
to that design.

Marvin Humphrey
Rectangular Research

Begin forwarded message:

From: Marvin Humphrey <>
Date: October 27, 2007 7:43:35 PM PDT
To: KinoSearch discussion forum <>
Subject: [KinoSearch] Towards a stable C API... via indirect dispatch
Reply-To: KinoSearch discussion forum <>


In order to present a useful public C API for KS, we need to make  
method calls available -- not just functions.  But in KS, inheritance  
is implemented using vtables -- structs with function pointer members  
-- and once those vtables are part of the public API, you can't  
change the vtable struct layout without wrecking binary  
compatibility.  Here is an excellent explanation of the problem:


Freezing the vtables would severely cramp our ability to develop KS.   
However, if we are unable to guarantee binary compatibility, outside  
developers will be severely limited in their ability to extend KS  
from C, so I've been looking for a way around this problem for a  
while... Happily, I think I've found one:

   "Supporting Binary Compatibility with Static Compilation"[1]
   Dachuan Yu, Zhong Shao, and Valery Trifonov

Right now, KS virtual method invocations look something like this:


Here's the actual pound-define for KinoSearch::Index::Term's destroy 
() method, which overrides a method inherited from  

   #define Kino_Term_Destroy(self) \

self->_ is the vtable; "destroy" is a member.

Under the indirect dispatch system, the vtable becomes an array of  
function pointers rather than a struct with function pointer members,  
and method invocation changes to something like this:


Here's how an actual pound-define might look:

   #define Kino_Term_Destroy(self) \

What this allows us to do is define the vtable layout and the offsets  
dynamically during a bootstrap operation.  The payoff is that a  
method macro so defined retains binary compatibility even as the  
composition of the vtable changes with subsequent releases.

Stated another way: if we make the layout of the current vtables part  
of the public API, externally compiled code will assume that a method  
like "destroy" is located at a fixed location in the vtable -- and if  
the layout of the vtable changes, the externally compiled code will  
jump into the wrong method. (BAD!)  However, if we make that offset a  
variable and set it at runtime, the externally compiled code will  
always find the correct method to jump into.

Therefore, someone could write another XS library extending KS, and  
upgrading KS itself wouldn't cause breakage.

There's a cost in CPU cycles for this flexibility: one extra array  
look-up operation.  However, GCJ uses this design, and the  
performance penalty is apparently only around 2% on average:


That might seem mild, but it actually makes sense to me, at least.    
On a modern, pipelining processor chip, that extra op just isn't a  
big deal.  When I changed InStream and OutStream into "final"  
classes, so that heavily used methods like OutStream_Write_VInt()  
resolved directly to function addresses and no longer needed to be  
resolved via vtable double dereference, the benchmark barely budged:

   <> (Link to

A note about type safety:

The array of function pointers will have to be implemented as an  
array of void*, since we won't know which functions go where in the  
vtable until runtime.  This would seem to be a drawback, since in  
theory we lose a certain amount of compile-time checking.  However,  
we aren't really losing much, if anything.   The current system  
doesn't perform real type checking; the first argument is always cast  
(in this example, to kino_Obj*):

   #define Kino_Term_Destroy(self) \

However, at present, there *will* be a compile time error if the  
vtable doesn't contain a method with the appropriate name:

    /* compile-time error */
    kino_Obj_destroy_t destroy_meth = self->_->destro;

We will continue to enjoy a similar level of safety because the name  
of the offset variable will have to be resolved by the dynamic  
loader.  Say we remove the Kino_Term_Destroy method... then this code  
will crash at run-time, because the kino_Term_destroy_OFFSET symbol  
cannot be resolved:

    destroy_meth = self->_[kino_Term_destroy_OFFSET];

Of course a run-time crash would be bad -- but that just means that  
we can't redact public methods -- which we wouldn't be doing anyway.

Marvin Humphrey
Rectangular Research

[1] The technique I've out differs slightly from what's described in  
the paper.  For us the offsets can be stored in individual variables,  
but Yu et al put them in an "otable" array which is initialized by  
the Java class loader.

KinoSearch mailing list

View raw message