incubator-lucy-dev mailing list archives

From Marvin Humphrey
Subject Re: [Lucy] Passing strings from Host to Lucy
Date Tue, 01 Sep 2009 21:59:25 GMT
On Tue, Sep 01, 2009 at 02:29:19PM -0500, Peter Karman wrote:
> Marvin Humphrey wrote on 09/01/2009 10:58 AM:
> >The one thing that bothers me about this scheme is that it forces us to
> >publish the struct definition of ZombieCharBuf.  Ordinarily, class struct
> >definitions will be opaque -- but the compiler needs to know at least
> >sizeof(ZombieCharBuf) in order to allocate the proper amount of stack
> >memory.
> What alternatives are there to publishing the struct? 

Well, there's maybe a hackish alternative: publish a fake struct definition
for a string argument type that lies about its contents.  Something like this:
    typedef struct lucy_StringArg {
        void* junk[SIZEOF_CHARBUF_PLUS_EXTRA / sizeof(void*)];
    } lucy_StringArg;

If we then implement lucy_StringArg_make_str as an opaque function, it can put
whatever we feel like into that chunk of memory.  Then the only constraint
going forwards is that we can't exceed the size we committed to.
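
A minimal sketch of how the fake-struct idea might look in practice.  The
size budget of 64 bytes, the private StringArgImpl layout, the signature of
lucy_StringArg_make_str, and the lucy_StringArg_get_size accessor are all
assumptions made for illustration; none of them come from actual Lucy code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical size budget; the real value is not given in this thread. */
#define SIZEOF_CHARBUF_PLUS_EXTRA 64

/* Published definition: callers learn the size, but not the contents. */
typedef struct lucy_StringArg {
    void* junk[SIZEOF_CHARBUF_PLUS_EXTRA / sizeof(void*)];
} lucy_StringArg;

/* Private layout, visible only to the implementation (hypothetical). */
typedef struct StringArgImpl {
    const char *ptr;
    size_t      size;
} StringArgImpl;

/* Compile-time check: the private layout must fit in the published size.
 * A negative array size makes the build fail if it does not. */
typedef char lucy_StringArg_fits[
    sizeof(StringArgImpl) <= sizeof(lucy_StringArg) ? 1 : -1];

/* Opaque constructor: fills caller-provided memory with the private
 * layout.  memcpy sidesteps aliasing problems between the two types. */
void
lucy_StringArg_make_str(lucy_StringArg *self, const char *ptr, size_t size) {
    StringArgImpl impl;
    assert(self != NULL);
    impl.ptr  = ptr;
    impl.size = size;
    memcpy(self, &impl, sizeof(impl));
}

/* Hypothetical accessor, so callers never need to look inside the struct. */
size_t
lucy_StringArg_get_size(const lucy_StringArg *self) {
    StringArgImpl impl;
    assert(self != NULL);
    memcpy(&impl, self, sizeof(impl));
    return impl.size;
}
```

A caller would declare "lucy_StringArg arg;" on the stack and pass &arg to
lucy_StringArg_make_str, so no heap allocation is required and the private
layout can change freely as long as it stays within the published size.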

But thinking about it further, I think we can publish the ZombieCharBuf struct
definition internally for the binding code, while avoiding exposure via public
header files.

> Isn't that pretty typical, to make some public .h file available with those
> kind of defs in it? 

Sometimes struct definitions get published, sometimes structs are opaque.
Both options are common.

Keeping structs opaque encourages loose coupling between modules, which yields
better encapsulation and more maintainable code.  The way Boilerplater is set
up now, the default is to have opaque structs -- including internally.

I think it's important to eat our own dog food.  If we set ourselves the
constraint that stuff needs to be fast without requiring direct struct member
access, we'll end up with public APIs that are fast without requiring direct
struct member access.
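
For illustration, here is a sketch of the opaque-struct pattern in plain C.
The demo_CharBuf name and all four functions are hypothetical and are not
Boilerplater output or Lucy API; the point is only that code outside the
implementation sees an incomplete type and has to go through functions.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* --- What a public header would expose: an incomplete type only. --- */
typedef struct demo_CharBuf demo_CharBuf;

demo_CharBuf *demo_CharBuf_new(const char *utf8, size_t size);
size_t        demo_CharBuf_get_size(const demo_CharBuf *self);
const char   *demo_CharBuf_get_ptr(const demo_CharBuf *self);
void          demo_CharBuf_destroy(demo_CharBuf *self);

/* --- What stays in the implementation file: the actual layout. --- */
struct demo_CharBuf {
    char   *ptr;
    size_t  size;
};

/* Copy `size` bytes of `utf8` into a new object; returns NULL on failure. */
demo_CharBuf*
demo_CharBuf_new(const char *utf8, size_t size) {
    demo_CharBuf *self = malloc(sizeof(demo_CharBuf));
    if (self == NULL) { return NULL; }
    self->ptr = malloc(size + 1);
    if (self->ptr == NULL) {
        free(self);
        return NULL;
    }
    memcpy(self->ptr, utf8, size);
    self->ptr[size] = '\0';
    self->size = size;
    return self;
}

size_t
demo_CharBuf_get_size(const demo_CharBuf *self) {
    assert(self != NULL);
    return self->size;
}

const char*
demo_CharBuf_get_ptr(const demo_CharBuf *self) {
    assert(self != NULL);
    return self->ptr;
}

void
demo_CharBuf_destroy(demo_CharBuf *self) {
    if (self == NULL) { return; }
    free(self->ptr);
    free(self);
}
```

With only the header portion visible, sizeof(demo_CharBuf) and self->size
fail to compile in client code, which is the trade-off under discussion:
encapsulation is enforced, but stack allocation by the caller is ruled out.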

> Or are you worried that if the struct changes internally, that existing code
> might have abused it and will croak or worse on Lucy upgrade? 

Yes.  I'm worried that unless we consciously avoid direct struct member access
ourselves, we'll create designs that all but force people who want to write
fast extensions to do the same.  Then, when we change the struct definition
and a Lucy core upgrade "breaks" somebody's app, they'll blame us.  We'll
respond, "but you relied on a non-public API", and they'll come back with "but
that's the only way to write something that performs well!"

It won't be possible -- or desirable -- to have all struct definitions be
private, but deny-by-default is a good starting point.

> In that latter case, I think following the Perl course of "what 
> is documented is what is guaranteed" seems fair. 

Yes, I agree.  That should be our policy.

> Something like:
>  ZombieCharBuf *zcb = malloc(sizeof(ZombieCharBuf));
> is about all any host language can expect to work.

It's not so much the host language bindings that I'm concerned about.  I think
that even if they start elsewhere, in time, canonical bindings for Lucy will
end up in the Apache repository, where we'll be able to compensate for our own
stupidity. :)  

I'm more thinking ahead to potential consumers of the C API -- like SWISH.  I
want SWISH to be able to use Lucy effectively without needing to know struct

> On the issue of names, why not LucyCharBuf?

In the first rev, the class hierarchy will look like this.

    class name                  full struct name
    Lucy::Obj                   lucy_Obj
    Lucy::Obj::CharBuf          lucy_CharBuf
    Lucy::Obj::ViewCharBuf      lucy_ViewCharBuf
    Lucy::Obj::ZombieCharBuf    lucy_ZombieCharBuf

However, that doesn't take into account how to handle different Unicode

We have a single-inheritance model design problem.  The first-rev class
hierarchy above is organized around read-only-ness.  Alternatively, we might
want to organize it around encoding, so that CharBuf is an abstract base
class, with CharBuf8, CharBuf16, and optionally CharBuf32 as subclasses; under
that scheme, read-only-ness would be tracked via flags.
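
A rough sketch of the encoding-based alternative, assuming single
inheritance is modeled by embedding the base struct as the first member.
Every name here (demo_CharBuf, demo_CharBuf8, demo_CharBuf16,
DEMO_CB_READ_ONLY, and the functions) is hypothetical, and the structs are
shown in full only to keep the example short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag: read-only-ness is a bit, not a subclass. */
#define DEMO_CB_READ_ONLY 0x1u

/* Abstract base: state shared by every encoding. */
typedef struct demo_CharBuf {
    size_t   code_unit_width;  /* bytes per code unit: 1, 2 or 4 */
    size_t   length;           /* number of code units */
    unsigned flags;
} demo_CharBuf;

/* Subclasses embed the base first, so a pointer to the subclass can be
 * treated as a pointer to the base. */
typedef struct demo_CharBuf8 {
    demo_CharBuf  base;
    uint8_t      *units;
} demo_CharBuf8;

typedef struct demo_CharBuf16 {
    demo_CharBuf  base;
    uint16_t     *units;
} demo_CharBuf16;

/* Base-class queries work for any encoding. */
int
demo_CharBuf_is_read_only(const demo_CharBuf *self) {
    assert(self != NULL);
    return (self->flags & DEMO_CB_READ_ONLY) != 0;
}

size_t
demo_CharBuf_byte_size(const demo_CharBuf *self) {
    assert(self != NULL);
    return self->length * self->code_unit_width;
}

/* Mutators check the flag at run time.  Returns 1 on success, 0 if the
 * buffer is read-only or `tick` is out of range. */
int
demo_CharBuf8_set_unit(demo_CharBuf8 *self, size_t tick, uint8_t unit) {
    assert(self != NULL);
    if (demo_CharBuf_is_read_only(&self->base)) { return 0; }
    if (tick >= self->base.length)              { return 0; }
    self->units[tick] = unit;
    return 1;
}
```

The cost of this arrangement is that a write to a read-only buffer is
caught at run time by the flag check rather than by the type of the object.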

I'd hoped to solve that problem before submitting CharBuf to Lucy; that's not
going to happen, but I did manage to banish all direct struct member access
for CharBuf virtually everywhere in the KinoSearch hierarchy.  It's doable --
without compromising performance.

Marvin Humphrey
