incubator-lucy-dev mailing list archives

From Marvin Humphrey <>
Subject Re: [lucy-dev] Ruby allocation/initialization
Date Sat, 07 Jan 2012 20:47:33 GMT
On Thu, Jan 05, 2012 at 11:33:12AM -0800, Logan Bell wrote:
> The question of the allocation function and the need to create an empty
> object has had me digging a bit more in the Pickaxe book. According to it,
> "if the object you’re implementing doesn’t use any data other than
> Ruby instance variables, then you don’t need to write an allocation
> function—Ruby’s default allocator will work just fine." If I understand
> that correctly, since our object (Clownfish::CFC::Hierarchy) does need data
> beyond instance variables, we need to allocate the space up front in the
> allocator function. The book goes on to outline reasons why this is
> necessary (marshaling, as you pointed out, being one of them):
> "One of the reasons for this multistep object creation protocol is that it
> lets the interpreter handle situations where objects have to be created by
> “back-door means.” One example is when objects are being deserialized from
> their marshaled form. Here, the interpreter needs to create an empty object
> (by calling the allocator), but it cannot call the initializer (because it
> has no knowledge of the parameters to use). Another common situation is
> when objects are duplicated or cloned."
> It might be worth diving into the Ruby source code to see for sure,
> but I can see value in having constructors that accept no arguments.

Clownfish actually provides a direct analogue to Ruby's Class#allocate:

    /** Create an empty object of the type defined by the VTable: allocate,
     * assign its vtable and give it an initial refcount of 1.  The caller is
     * responsible for initialization.
     */
    Obj*
    Make_Obj(VTable *self);

For an example of how VTable_Make_Obj() is used during deserialization, here's
Freezer_thaw() from core/Lucy/Util/Freezer.c:

    Obj*
    Freezer_thaw(InStream *instream) {
        CharBuf *class_name
            = CB_Deserialize((CharBuf*)VTable_Make_Obj(CHARBUF), instream);
        VTable *vtable = VTable_singleton(class_name, NULL);
        Obj *blank = VTable_Make_Obj(vtable);
        return Obj_Deserialize(blank, instream);
    }

Freezer_thaw() obtains the class name, uses it to look up the right VTable
singleton, then invokes VTable_Make_Obj() to create the blank object.  The
newborn blank object doesn't start off with much, but at least it has a VTable
-- so we can invoke the Deserialize() object method on it and flesh it out.
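To make the two-stage idea concrete outside of Clownfish, here is a minimal plain-C sketch under stated assumptions: the toy `VTable`, `make_obj()`, `vtable_lookup()` and `thaw()` are hypothetical stand-ins for VTable, VTable_Make_Obj(), VTable_singleton() and Freezer_thaw(), and a whitespace-separated string replaces the InStream. It shows that the thawing code needs only a class name and a blank object, never the constructor's parameters.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Toy object model (hypothetical, loosely modeled on Clownfish): every
 * object begins with a pointer to its VTable and a refcount. */
typedef struct VTable VTable;

typedef struct Obj {
    const VTable *vtable;
    int refcount;
} Obj;

struct VTable {
    const char *name;   /* class name used for lookup */
    size_t obj_size;    /* bytes needed for one instance */
    /* Fleshes out a blank object from serialized data; NULL on failure. */
    Obj *(*deserialize)(Obj *self, const char *payload);
};

/* Stage one, the analogue of VTable_Make_Obj(): allocate zeroed memory,
 * assign the vtable and set the refcount to 1.  No initialization. */
Obj *make_obj(const VTable *vtable) {
    assert(vtable != NULL);
    Obj *self = calloc(1, vtable->obj_size);
    if (self != NULL) {
        self->vtable = vtable;
        self->refcount = 1;
    }
    return self;
}

/* A sample class with two fields. */
typedef struct Point {
    Obj base;
    int x;
    int y;
} Point;

static Obj *point_deserialize(Obj *self, const char *payload) {
    Point *p = (Point *)self;
    if (sscanf(payload, "%d %d", &p->x, &p->y) != 2) {
        free(self);
        return NULL;
    }
    return self;
}

static const VTable POINT = { "Point", sizeof(Point), point_deserialize };

/* Name -> VTable registry, a stand-in for VTable_singleton(). */
static const VTable *const REGISTRY[] = { &POINT };

const VTable *vtable_lookup(const char *class_name) {
    for (size_t i = 0; i < sizeof REGISTRY / sizeof REGISTRY[0]; i++) {
        if (strcmp(REGISTRY[i]->name, class_name) == 0) {
            return REGISTRY[i];
        }
    }
    return NULL;
}

/* The analogue of Freezer_thaw(): look up the class, create a blank
 * object, then let the object's own vtable finish the job.  No
 * constructor arguments are ever needed. */
Obj *thaw(const char *class_name, const char *payload) {
    const VTable *vtable = vtable_lookup(class_name);
    if (vtable == NULL) {
        return NULL;
    }
    Obj *blank = make_obj(vtable);
    if (blank == NULL) {
        return NULL;
    }
    return blank->vtable->deserialize(blank, payload);
}
```

The second stage is dispatched through the blank object's own vtable, which is the same reason Obj_Deserialize() can run on the result of VTable_Make_Obj().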

We also use VTable_Make_Obj() for every Lucy object that we create from
Perl-space.  Our Foo_new() C functions have a limitation: they do not take a
class name as an argument, so they cannot support dynamic subclassing.  For
instance, here is Normalizer_new():

    Normalizer*
    Normalizer_new(const CharBuf *form, bool_t case_fold, bool_t strip_accents) {
        Normalizer *self = (Normalizer*)VTable_Make_Obj(NORMALIZER);
        return Normalizer_init(self, form, case_fold, strip_accents);
    }

Because the VTable is hard-coded to NORMALIZER, objects created via
Normalizer_new() will *always* have a class of "Lucy::Analysis::Normalizer".
But what if you create a Perl subclass of Lucy::Analysis::Normalizer called
MyNormalizer?

    package MyNormalizer;
    use base qw( Lucy::Analysis::Normalizer );

    my $normalizer = MyNormalizer->new;

Here's how Normalizer_new() would need to change in order to support such
subclassing:

    Normalizer*
    Normalizer_new(CharBuf *class_name, const CharBuf *form,
                   bool_t case_fold, bool_t strip_accents) {
        VTable *vtable = VTable_singleton(class_name, NULL);
        Normalizer *self = (Normalizer*)VTable_Make_Obj(vtable);
        return Normalizer_init(self, form, case_fold, strip_accents);
    }
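A rough plain-C model of the difference between the two Normalizer_new() variants, assuming a toy class registry. `vtable_singleton()` here is hypothetical and, unlike Clownfish's, takes the parent VTable explicitly instead of discovering it from the host language; `normalizer_new_as()` is likewise an invented name. Objects from `normalizer_new()` always report the base class, while `normalizer_new_as("MyNormalizer", ...)` yields an object whose VTable belongs to the subclass.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

typedef struct VTable {
    char name[64];
    const struct VTable *parent;
    size_t obj_size;
} VTable;

typedef struct Obj {
    const VTable *vtable;
    int refcount;
} Obj;

typedef struct Normalizer {
    Obj base;
    char form[8];
    bool case_fold;
    bool strip_accents;
} Normalizer;

/* Fixed-size class registry (hypothetical); slot 0 holds the base class. */
#define MAX_CLASSES 16
static VTable registry[MAX_CLASSES] = {
    { "Lucy::Analysis::Normalizer", NULL, sizeof(Normalizer) },
};
static size_t num_classes = 1;
static VTable *const NORMALIZER = &registry[0];

/* Return the VTable registered under `name`.  If none exists and `parent`
 * is non-NULL, register a subclass that inherits the parent's layout. */
VTable *vtable_singleton(const char *name, const VTable *parent) {
    for (size_t i = 0; i < num_classes; i++) {
        if (strcmp(registry[i].name, name) == 0) {
            return &registry[i];
        }
    }
    if (parent == NULL || num_classes == MAX_CLASSES
        || strlen(name) >= sizeof registry[0].name) {
        return NULL;
    }
    VTable *vtable = &registry[num_classes++];
    strcpy(vtable->name, name);
    vtable->parent = parent;
    vtable->obj_size = parent->obj_size;
    return vtable;
}

Obj *make_obj(const VTable *vtable) {
    assert(vtable != NULL);
    Obj *self = calloc(1, vtable->obj_size);
    if (self != NULL) {
        self->vtable = vtable;
        self->refcount = 1;
    }
    return self;
}

/* Initialization is kept separate from allocation so both constructors
 * can share it. */
Normalizer *normalizer_init(Normalizer *self, const char *form,
                            bool case_fold, bool strip_accents) {
    if (self == NULL) {
        return NULL;
    }
    if (form != NULL) {
        strncpy(self->form, form, sizeof self->form - 1);
    }
    self->case_fold = case_fold;
    self->strip_accents = strip_accents;
    return self;
}

/* Hard-coded VTable: every object is a plain Normalizer. */
Normalizer *normalizer_new(const char *form, bool case_fold, bool strip_accents) {
    Normalizer *self = (Normalizer *)make_obj(NORMALIZER);
    return normalizer_init(self, form, case_fold, strip_accents);
}

/* Class-name-aware variant: the caller decides which VTable to use. */
Normalizer *normalizer_new_as(const char *class_name, const char *form,
                              bool case_fold, bool strip_accents) {
    const VTable *vtable = vtable_singleton(class_name, NORMALIZER);
    if (vtable == NULL) {
        return NULL;
    }
    Normalizer *self = (Normalizer *)make_obj(vtable);
    return normalizer_init(self, form, case_fold, strip_accents);
}
```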

The actual code which *does* support subclassing for Normalizer is spread
across three functions, two of which I've included below my sig for reference:

  * XSBind_new_blank_obj() from perl/xs/XSBind.c, which wraps VTable_Make_Obj().
  * XS_Lucy_Analysis_Normalizer_new() from Lucy.xs, which is auto-generated.
  * Normalizer_init(), from core/Lucy/Analysis/Normalizer.c.

In order to support dynamic subclassing in the Ruby bindings for Lucy, we will
need to provide similar functionality.
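The allocation half of that machinery might look roughly like this in a binding-neutral form. It is a sketch, not Lucy's code: `Invocant` is a hypothetical stand-in for the Perl SV that XSBind_new_blank_obj() inspects, which can hold either an existing object or a class name, and `find_vtable()` replaces VTable_singleton(). A Ruby binding would likely need an equivalent step, since Ruby hands the (possibly subclassed) class to the allocator function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

typedef struct VTable {
    const char *name;
    size_t obj_size;
} VTable;

typedef struct Obj {
    const VTable *vtable;
    int refcount;
} Obj;

static const VTable NORMALIZER = { "Lucy::Analysis::Normalizer", sizeof(Obj) };
static const VTable MY_NORMALIZER = { "MyNormalizer", sizeof(Obj) };
static const VTable *const CLASSES[] = { &NORMALIZER, &MY_NORMALIZER };

/* Hypothetical stand-in for a Perl argument that may be either an object
 * or a class name, as in Foo->new(...) versus $foo->new(...). */
typedef struct Invocant {
    const Obj *obj;          /* non-NULL when called on an instance */
    const char *class_name;  /* consulted only when obj is NULL */
} Invocant;

static const VTable *find_vtable(const char *class_name) {
    for (size_t i = 0; i < sizeof CLASSES / sizeof CLASSES[0]; i++) {
        if (strcmp(CLASSES[i]->name, class_name) == 0) {
            return CLASSES[i];
        }
    }
    return NULL;
}

/* Mirrors the shape of XSBind_new_blank_obj(): pick a VTable from the
 * invocant, then allocate a blank object of the right size. */
Obj *new_blank_obj(Invocant invocant) {
    assert(invocant.obj != NULL || invocant.class_name != NULL);
    const VTable *vtable = invocant.obj != NULL
        ? invocant.obj->vtable               /* reuse the object's class */
        : find_vtable(invocant.class_name);  /* look the class up by name */
    if (vtable == NULL) {
        return NULL;
    }
    Obj *self = calloc(1, vtable->obj_size);
    if (self != NULL) {
        self->vtable = vtable;
        self->refcount = 1;
    }
    return self;
}
```

The initializer (Normalizer_init() in Lucy's case) then runs on whatever blank object comes back, so it never has to know which subclass it is initializing.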

However, I question whether we need to provide that kind of functionality for
Clownfish::CFC, which is itself written using a much cruder object model:

  * No support for subclassing.
  * No support for serialization.
  * No support for Ruby's #clone or #dup methods.

I don't yet understand why Ruby *needs* an allocator function if we aren't
going to use those bells and whistles.  How many C libraries out there provide
two-stage constructors?  It doesn't make sense that Ruby would impose such an
esoteric requirement, limiting the kinds of C libraries you could write Ruby
bindings for.
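Ruby's multistep protocol, as described in the Pickaxe passage quoted above, can be modeled in a few lines of plain C. In this hypothetical sketch a class carries an `allocate` hook and an `initialize` hook: `class_new()` runs both, as Class#new does, while `back_door_copy()` stands in for dup or Marshal.load and runs only the allocator. The `initialize_calls` counter exists purely to make that difference observable.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct Class Class;

typedef struct Instance {
    const Class *klass;
    char *source;
    char *dest;
} Instance;

/* Hypothetical model of a Ruby class's two creation hooks. */
struct Class {
    Instance *(*allocate)(const Class *klass);
    void (*initialize)(Instance *self, const char *source, const char *dest);
};

static int initialize_calls = 0;  /* lets us observe which path ran */

static char *copy_str(const char *s) {
    char *copy = malloc(strlen(s) + 1);
    if (copy != NULL) {
        strcpy(copy, s);
    }
    return copy;
}

static Instance *hierarchy_allocate(const Class *klass) {
    Instance *self = calloc(1, sizeof *self);  /* empty but valid */
    if (self != NULL) {
        self->klass = klass;
    }
    return self;
}

static void hierarchy_initialize(Instance *self, const char *source,
                                 const char *dest) {
    initialize_calls++;
    self->source = copy_str(source);
    self->dest = copy_str(dest);
}

static const Class HIERARCHY = { hierarchy_allocate, hierarchy_initialize };

/* Like Class#new: allocate, then initialize with the caller's arguments. */
Instance *class_new(const Class *klass, const char *source, const char *dest) {
    assert(klass != NULL);
    Instance *self = klass->allocate(klass);
    if (self != NULL) {
        klass->initialize(self, source, dest);
    }
    return self;
}

/* Like the "back door" paths (dup, Marshal.load): allocate, then restore
 * state directly.  initialize is never called because its arguments are
 * unknown.  (Ruby's dup actually calls initialize_copy; this sketch just
 * copies the fields.) */
Instance *back_door_copy(const Instance *orig) {
    Instance *copy = orig->klass->allocate(orig->klass);
    if (copy != NULL) {
        copy->source = orig->source ? copy_str(orig->source) : NULL;
        copy->dest = orig->dest ? copy_str(orig->dest) : NULL;
    }
    return copy;
}

void instance_free(Instance *self) {
    if (self != NULL) {
        free(self->source);
        free(self->dest);
        free(self);
    }
}
```

If a class never uses the back-door paths, the `allocate` hook is only ever reached through `class_new()`, which is the situation the CFC bindings are in.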

Something like this ought to work:

    #include "ruby.h"
    #include "CFCHierarchy.h"

    static VALUE mClownfish;
    static VALUE mCFC;
    static VALUE cHierarchy;

    // Clownfish::CFC::Hierarchy#build, implemented elsewhere.
    static VALUE S_CFCHierarchy_build(VALUE self_rb);

    // Clownfish::CFC::Hierarchy.new
    static VALUE
    S_CFCHierarchy_new(VALUE klass, VALUE source_rb, VALUE dest_rb) {
        // StringValueCStr() guarantees a NUL-terminated C string.
        const char *source = StringValueCStr(source_rb);
        const char *dest   = StringValueCStr(dest_rb);
        CFCHierarchy *self = CFCHierarchy_new(source, dest);
        return Data_Wrap_Struct(klass, NULL, NULL, self);
    }

    // Bootstrap Clownfish::CFC::Hierarchy.
    static void
    S_Init_CFCHierarchy(void) {
        cHierarchy  = rb_define_class_under(mCFC, "Hierarchy", rb_cObject);
        rb_define_method(cHierarchy, "build", S_CFCHierarchy_build, 0);
        rb_define_singleton_method(cHierarchy, "new", S_CFCHierarchy_new, 2);
    }

    // Bootstrap Clownfish::CFC and all of its components.
    void
    Init_CFC(void) {
        mClownfish = rb_define_module("Clownfish");
        mCFC       = rb_define_module_under(mClownfish, "CFC");
        S_Init_CFCHierarchy();
    }

I don't know whether that's an idiomatic approach for writing a Ruby extension,
but if it works, it spares us from having to add a bunch of CFCFoo_allocate()
functions and from having to provide two-stage constructors for every
Clownfish::CFC component.
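One caveat with overriding only `new`: the class still inherits Object's default allocator, so `Hierarchy.allocate` (and the dup, clone and Marshal paths that go through the allocator) can produce an object with no C struct behind it. The plain-C model below, with all names hypothetical, shows the guard each method would then need; in an actual extension, Ruby's C API offers `rb_undef_alloc_func()` to remove the allocator from a class outright. The sketch also gives the wrapper a free function, since the `Data_Wrap_Struct(klass, NULL, NULL, self)` call above passes NULL for the free hook and so would never release the CFCHierarchy.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical C struct created by a one-step constructor. */
typedef struct Hierarchy {
    char source[64];
    char dest[64];
} Hierarchy;

Hierarchy *hierarchy_new(const char *source, const char *dest) {
    assert(source != NULL && dest != NULL);
    Hierarchy *self = calloc(1, sizeof *self);
    if (self != NULL) {
        strncpy(self->source, source, sizeof self->source - 1);
        strncpy(self->dest, dest, sizeof self->dest - 1);
    }
    return self;
}

/* Host-side handle, analogous to what Data_Wrap_Struct() returns. */
typedef struct Wrapper {
    void *data;               /* wrapped struct; NULL if never wrapped */
    void (*free_fn)(void *);  /* releases data when the handle dies */
} Wrapper;

/* The overridden "new": build the struct in one step, then wrap it. */
Wrapper *wrap_new_hierarchy(const char *source, const char *dest) {
    Wrapper *w = calloc(1, sizeof *w);
    if (w != NULL) {
        w->data = hierarchy_new(source, dest);
        w->free_fn = free;
    }
    return w;
}

/* What a generic allocator produces if the host creates the object by
 * some other route: an empty handle. */
Wrapper *wrap_allocate(void) {
    return calloc(1, sizeof(Wrapper));
}

/* Every method fetches its struct through this guard instead of
 * dereferencing blindly; NULL means "not a real Hierarchy". */
Hierarchy *unwrap_hierarchy(const Wrapper *w) {
    return w != NULL ? (Hierarchy *)w->data : NULL;
}

void wrap_destroy(Wrapper *w) {
    if (w != NULL && w->data != NULL && w->free_fn != NULL) {
        w->free_fn(w->data);
    }
    free(w);
}
```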

In any case, exploring this topic for the CFC bindings helps us to understand
the issues we will confront when auto-generating Ruby wrapper code via the
as-yet-to-be-written Clownfish::CFC::Binding::Ruby.  :)

Marvin Humphrey

cfish_Obj*
XSBind_new_blank_obj(SV *either_sv) {
    cfish_VTable *vtable;

    // Get a VTable.
    if (sv_isobject(either_sv)
        && sv_derived_from(either_sv, "Lucy::Object::Obj")
       ) {
        // Use the supplied object's VTable.
        IV iv_ptr = SvIV(SvRV(either_sv));
        cfish_Obj *self = INT2PTR(cfish_Obj*, iv_ptr);
        vtable = self->vtable;
    }
    else {
        // Use the supplied class name string to find a VTable.
        STRLEN len;
        char *ptr = SvPVutf8(either_sv, len);
        cfish_ZombieCharBuf *klass = CFISH_ZCB_WRAP_STR(ptr, len);
        vtable = cfish_VTable_singleton((cfish_CharBuf*)klass, NULL);
    }

    // Use the VTable to allocate a new blank object of the right size.
    return Cfish_VTable_Make_Obj(vtable);
}
XS(XS_Lucy_Analysis_Normalizer_new) {
    dXSARGS;
    if (items < 1) {
        CFISH_THROW(CFISH_ERR, "Usage: %s(class_name, ...)", GvNAME(CvGV(cv)));
    }
    SP -= items;

    const lucy_CharBuf* normalization_form = NULL;
    chy_bool_t case_fold = true;
    chy_bool_t strip_accents = false;
    chy_bool_t args_ok = XSBind_allot_params(
        &(ST(0)), 1, items, "Lucy::Analysis::Normalizer::new_PARAMS",
        // The last ALLOT_OBJ() argument is not shown.
        ALLOT_OBJ(&normalization_form, "normalization_form", 18, false, LUCY_CHARBUF, ...),
        ALLOT_BOOL(&case_fold, "case_fold", 9, false),
        ALLOT_BOOL(&strip_accents, "strip_accents", 13, false),
        NULL);
    if (!args_ok) {
        // Error handling for invalid arguments not shown.
    }
    lucy_Normalizer* self = (lucy_Normalizer*)XSBind_new_blank_obj(ST(0));

    lucy_Normalizer* retval = lucy_Normalizer_init(self, normalization_form, case_fold, strip_accents);
    if (retval) {
        ST(0) = (SV*)Cfish_Obj_To_Host((cfish_Obj*)retval);
    }
    else {
        ST(0) = newSV(0);
    }
    XSRETURN(1);
}
