incubator-lucy-dev mailing list archives

From Marvin Humphrey
Subject Creation and Destruction
Date Fri, 13 Oct 2006 04:46:50 GMT

I've been working on the problem of generalizing native object
wrapping, in the hope that we can apply the same set of C functions/
macros regardless of what language we're targeting -- but define them
differently for each binding.

Perl manages memory using reference counting.  When a Perl SV*  
(that's scalar value, Perl's basic data type) has its reference count  
drop to 0 (most commonly because a lexical variable goes out of  
scope) it gets reclaimed.  If that scalar references any other Perl  
data structures, their reference counts get decremented at that point.

    {
        my $foo = "foo yoo"; # $foo has a refcount of 1
        {
            my $foo_ref = \$foo; # now $foo has a refcount of 2
        } # $foo_ref's count drops to 0 and it gets reclaimed
        # now $foo has a refcount of 1 again
    } # $foo's refcount drops to 0 and it gets reclaimed.

If the scalar is an object, the object's DESTROY method is called.

     {
         my $foo = Foo->new;
     } # $foo->DESTROY gets called.

Usually, Perl just cleans up its own data structures and there's no  
need to write a specific DESTROY method.  However, it doesn't know  
how to deal with things like foreign C structs wrapped in Perl  
objects, so DESTROY is where custom cleanup stuff goes.
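To make the bookkeeping concrete, here is a minimal sketch in plain C of the behavior described above: a value is reclaimed when its count reaches 0, its custom cleanup hook (the analogue of DESTROY) runs first, and anything it references gets decremented in turn. Everything here (Value, value_new, value_dec, count_destroy, destroy_calls) is a hypothetical name invented for illustration; this is a model of the idea, not how Perl implements it internally.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a refcounted host value (loosely, a Perl SV). */
typedef struct Value Value;
struct Value {
    int    refcount;               /* number of live references to this value */
    Value *referent;               /* value this one refers to, or NULL */
    void (*destroy)(Value *self);  /* optional custom cleanup, like DESTROY */
};

/* Demo-only instrumentation: counts how often the cleanup hook has run. */
int destroy_calls = 0;

void count_destroy(Value *self) {
    (void)self;
    destroy_calls++;
}

/* Create a value with a refcount of 1.  If it refers to another value,
 * that value's refcount goes up by one. */
Value *value_new(Value *referent, void (*destroy)(Value *)) {
    Value *v = malloc(sizeof *v);
    assert(v != NULL);
    v->refcount = 1;
    v->referent = referent;
    v->destroy  = destroy;
    if (referent != NULL) {
        referent->refcount++;
    }
    return v;
}

/* Drop one reference.  At zero: run the cleanup hook, release the
 * reference held on the referent (if any), then reclaim the memory. */
void value_dec(Value *v) {
    assert(v->refcount > 0);
    if (--v->refcount > 0) {
        return;
    }
    if (v->destroy != NULL) {
        v->destroy(v);
    }
    if (v->referent != NULL) {
        value_dec(v->referent);
    }
    free(v);
}
```

Mirroring the Perl snippet: creating foo and then a foo_ref that points at it leaves foo with a count of 2; calling value_dec on foo_ref brings foo back to 1, and a final value_dec on foo runs the hook and frees it.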

One problem with objects that have to be accessible from both Perl  
and C is that if a C struct has to be wrapped in a Perl object so  
that it can travel through Perl space, it's hard to stop it from being
destroyed when it leaves Perl space.  Say that we have a TokenBatch  
struct, wrapped in a Perl object, which holds a bunch of little Token  
structs.  Say we want to do something like this...

      while ( $token_batch->next ) {
          my $token = $token_batch->get_token;
      }

We'll have to wrap each Token in a Perl object as it leaves the  
TokenBatch.  Then, when $token goes out of scope, $token->DESTROY  
will get called.  We don't really want that to happen, though -- the  
TokenBatch isn't done with the Token struct yet.

The solution for most cases is to assign every struct a Perl object  
at creation time if the struct will have to pass through Perl space  
at some point, and to keep track of that native object by assigning  
it to a struct member.

     self->ref = lucy_Obj_create_ref(self, class_name);
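A rough sketch of why this helps, written in plain C with a toy wrapper instead of real Perl objects. The struct creates its wrapper once, at construction, and keeps it in a member; handing the struct to the host only bumps the wrapper's count, so the host dropping its copy no longer destroys a struct the C side is still using. Wrapper, wrapper_new, wrapper_inc, wrapper_dec, token_new, token_to_host and tokens_destroyed are all made-up names for this illustration, and the Token layout is assumed.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy stand-in for the host-language object that wraps a C struct. */
typedef struct Wrapper {
    int    refcount;
    void  *ptr;                    /* the wrapped C struct */
    void (*destroy)(void *ptr);    /* runs when refcount reaches 0 */
} Wrapper;

typedef struct Token {
    Wrapper    *ref;               /* wrapper assigned at creation time */
    const char *text;
} Token;

/* Demo-only instrumentation: how many Tokens have been destroyed. */
int tokens_destroyed = 0;

Wrapper *wrapper_new(void *ptr, void (*destroy)(void *)) {
    Wrapper *w = malloc(sizeof *w);
    assert(w != NULL);
    w->refcount = 1;               /* this first reference belongs to the C side */
    w->ptr      = ptr;
    w->destroy  = destroy;
    return w;
}

void wrapper_inc(Wrapper *w) {
    w->refcount++;
}

void wrapper_dec(Wrapper *w) {
    assert(w->refcount > 0);
    if (--w->refcount == 0) {
        w->destroy(w->ptr);        /* the DESTROY analogue */
        free(w);
    }
}

void token_destroy(void *ptr) {
    free(ptr);
    tokens_destroyed++;
}

Token *token_new(const char *text) {
    Token *t = malloc(sizeof *t);
    assert(t != NULL);
    t->text = text;
    t->ref  = wrapper_new(t, token_destroy);
    return t;
}

/* Hand the token to the host language: the host gets its own reference
 * to the same wrapper rather than a fresh, independently owned one. */
Wrapper *token_to_host(Token *t) {
    wrapper_inc(t->ref);
    return t->ref;
}
```

In this model, the host's variable going out of scope corresponds to one wrapper_dec, which takes the count from 2 back to 1. The Token is only destroyed once the C side releases its own reference as well.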

Historically, such functionality has resided in KinoSearch's XS  
wrapper code and all constructors have been called from Perl space.   
But as I've been Ferret-izing KinoSearch, I've been writing more and  
more C code, and it's becoming desirable to have C constructors call  
C constructors.  That means some native objects have to be created  
from C space.

It's possible to quarantine all the actual perlapi C routines in a  
single module.  Let's say it's Lucy/Util/Object.  Object.h might  
declare these functions:

/* Create a native ref with a refcount of 1.
 */
void*
lucy_Obj_create_ref(void *ptr, const char *class);

/* Decrement an opaque reference's native refcount.
 */
void
lucy_Obj_refcount_dec(void *ref);

The Perl implementation in Object.c probably looks like this:

#ifdef LUCY_PERL

void*
lucy_Obj_create_ref(void *ptr, const char *class)
{
     /* I'll explain what this means some other time. */
     SV *obj_sv = newSViv( PTR2IV(ptr) );
     HV *const stash = gv_stashpv(class, true);
     SvRV_set( (SV*)Obj_scratch_ref, obj_sv );
     sv_bless( (SV*)Obj_scratch_ref, stash );
     return (void*)obj_sv;
}

void
lucy_Obj_refcount_dec(void *ref)
{
     SvREFCNT_dec( (SV*)ref );
}

#endif /* LUCY_PERL */
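
For comparison, here is a sketch of what "define them differently for each binding" could look like for a binding with no host language at all, where the opaque ref is just a small heap-allocated record. LUCY_PLAIN_C and PlainRef are invented for this example and are not part of any existing code. The sketch simply stores the class name it is given and does not free the wrapped pointer, since cleanup of the struct itself would belong to a destructor that isn't modeled here.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* LUCY_PLAIN_C is a made-up switch for a hypothetical host-less binding. */
#define LUCY_PLAIN_C

#ifdef LUCY_PLAIN_C

/* What the opaque "native ref" is in this binding. */
typedef struct PlainRef {
    int   refcount;
    void *ptr;          /* the wrapped C struct; not owned by the ref */
    char *class_name;   /* private copy of the class name passed in */
} PlainRef;

/* Create a native ref with a refcount of 1. */
void*
lucy_Obj_create_ref(void *ptr, const char *class)
{
    PlainRef *ref = malloc(sizeof *ref);
    assert(ref != NULL);
    ref->refcount   = 1;
    ref->ptr        = ptr;
    ref->class_name = malloc(strlen(class) + 1);
    assert(ref->class_name != NULL);
    strcpy(ref->class_name, class);
    return ref;
}

/* Decrement an opaque reference's native refcount, releasing the
 * bookkeeping record when the count reaches 0. */
void
lucy_Obj_refcount_dec(void *ref)
{
    PlainRef *plain = ref;
    assert(plain->refcount > 0);
    if (--plain->refcount == 0) {
        free(plain->class_name);
        free(plain);
    }
}

#endif /* LUCY_PLAIN_C */
```

The calling code sees the same two signatures either way, which is the point of quarantining the binding-specific parts in one module. What this does not address is the class name string itself, which still arrives in whatever spelling the caller used.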

That scheme is working fine in KinoSearch, but there's a problem  
which is getting in the way of full generalization.  Perl has to know  
the classname, and it spells it using double colons as separators.

     instream->ref = lucy_Obj_create_ref(instream,  

That string isn't going to be useful in other implementations.  If  
all object creation is handled from native space, no big deal,  
because the calls to lucy_Obj_create_ref all happen from the XS
binding code, not from the C modules.  But once you get C  
constructors calling C constructors, you have problems.

For now in KinoSearch, I'm just pressing ahead and typing in the  
class names as Perl expects them, but we'll have to solve this  
problem for Lucy.

Does Ruby have analogous quirks for object creation and destruction?

How about other languages?

Another problem we have to solve is how to make memory management  
work under multiple systems.  We can't really do mark-and-sweep with  
Perl, because Perl doesn't provide an event we can cue collection off  
of.  We'd have to write our own tracing garbage collector, our own  
malloc() and free(), and manage our own memory pool -- yikes.

So... how can we make this work everywhere?  How would you implement  
lucy_Obj_create_ref and lucy_Obj_refcount_dec?

Marvin Humphrey
Rectangular Research
