incubator-lucy-dev mailing list archives

From "David Balmain" <>
Subject Re: Creation and Destruction
Date Fri, 13 Oct 2006 06:21:39 GMT
On 10/13/06, Marvin Humphrey <> wrote:
> Greets,
> I've been working on the problem of generalizing native object
> wrapping, in the hopes that we can apply the same set of C functions/
> macros regardless of what language we're targeting -- but define them
> differently for each binding.
> Perl manages memory using reference counting.  When a Perl SV*
> (that's scalar value, Perl's basic data type) has its reference count
> drop to 0 (most commonly because a lexical variable goes out of
> scope) it gets reclaimed.  If that scalar references any other Perl
> data structures, their reference counts get decremented at that point.
>     {
>         my $foo = "foo yoo"; # $foo has a refcount of 1
>         {
>              my $foo_ref = \$foo; # now $foo has a refcount of 2
>         } # $foo_ref's count drops to 0 and it gets reclaimed
>         # now $foo has a refcount of 1 again
>     } # $foo's refcount drops to 0 and it gets reclaimed.
> If the scalar is an object, the object's DESTROY method is called.
>      {
>          my $foo = Foo->new;
>      } # $foo->DESTROY gets called.
> Usually, Perl just cleans up its own data structures and there's no
> need to write a specific DESTROY method.  However, it doesn't know
> how to deal with things like foreign C structs wrapped in Perl
> objects, so DESTROY is where custom cleanup stuff goes.
> One problem with objects that have to be accessible from both Perl
> and C is that if a C struct has to be wrapped in a Perl object so
> that it can travel through Perl-space, it's hard to stop it from being
> destroyed when it leaves Perl space.  Say that we have a TokenBatch
> struct, wrapped in a Perl object, which holds a bunch of little Token
> structs.  Say we want to do something like this...
>       while ( $token_batch->next ) {
>           my $token = $token_batch->get_token;
>           transform_token_somehow($token);
>       }
> We'll have to wrap each Token in a Perl object as it leaves the
> TokenBatch.  Then, when $token goes out of scope, $token->DESTROY
> will get called.  We don't really want that to happen, though -- the
> TokenBatch isn't done with the Token struct yet.
> The solution for most cases is to assign every struct a Perl object
> at creation time if the struct will have to pass through Perl space
> at some point, and to keep track of that native object by assigning
> it to a struct member.
>      self->ref = lucy_Obj_create_ref(self, class_name);
> Historically, such functionality has resided in KinoSearch's XS
> wrapper code and all constructors have been called from Perl space.
> But as I've been Ferret-izing KinoSearch, I've been writing more and
> more C code, and it's becoming desirable to have C constructors call
> C constructors.  That means some native objects have to be created
> from C space.
> It's possible to quarantine all the actual perlapi C routines in a
> single module.  Let's say it's Lucy/Util/Object.  Object.h might
> declare these functions:
>   /* Create a native ref with a refcount of 1.
>   */
> void*
> lucy_Obj_create_ref(void *ptr, const char *class);
> /* Decrement an opaque reference's native refcount.
>   */
> void
> lucy_Obj_refcount_dec(void *ref);
> The Perl implementation in Object.c probably looks like this:
> #ifdef LUCY_PERL
> void*
> lucy_Obj_create_ref(void *ptr, const char *class)
> {
>      /* I'll explain what this means some other time. */
>      SV *obj_sv = newSViv( PTR2IV(ptr) );
>      HV *const stash = gv_stashpv(class, true);
>      SvRV_set( (SV*)Obj_scratch_ref, obj_sv );
>      sv_bless( (SV*)Obj_scratch_ref, stash );
>      return (void*)obj_sv;
> }
> void
> lucy_Obj_refcount_dec(void *ref)
> {
>      SvREFCNT_dec( (SV*)ref );
> }
> #endif /* LUCY_PERL */
> That scheme is working fine in KinoSearch, but there's a problem
> which is getting in the way of full generalization.  Perl has to know
> the classname, and it spells it using double colons as separators.
>      instream->ref = lucy_Obj_create_ref(instream,
>          "Lucy::Store::Instream");
> That string isn't going to be useful in other implementations.  If
> all object creation is handled from native space, no big deal,
> because the calls to lucy_Obj_refcount_dec all happen from the XS
> binding code, not from the C modules.  But once you get C
> constructors calling C constructors, you have problems.
> For now in KinoSearch, I'm just pressing ahead and typing in the
> class names as Perl expects them, but we'll have to solve this
> problem for Lucy.
> Does Ruby have analogous quirks for object creation and destruction?

Hi Marvin,

I don't know how much of Ferret's binding code you've looked at but
you may have noticed a lot of my structs have a ref_cnt variable. That
way, every time a struct gets wrapped in a Ruby object, its ref_cnt
is incremented. When the object goes out of scope and is garbage
collected, the ref_cnt is decremented (and the object deleted if
ref_cnt reaches 0). I think as far as supporting multiple languages like we
are attempting to do with Lucy, this is the easiest way to go, since
it should work for all languages.
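
A minimal sketch of that ref_cnt scheme in C. Only the ref_cnt member
comes from the description above; the Token struct, its text field, and
the token_* function names are hypothetical and are not taken from
Ferret or Lucy.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical struct: only the ref_cnt member comes from the description
 * above; the name and the text field are made up for illustration. */
typedef struct Token {
    int   ref_cnt;  /* owners: C-side holders plus language-side wrappers */
    char *text;
} Token;

/* Create a Token owned by the caller (ref_cnt starts at 1). */
Token *token_new(const char *text)
{
    Token *t;
    assert(text != NULL);
    t = malloc(sizeof *t);
    if (t == NULL) return NULL;
    t->text = malloc(strlen(text) + 1);
    if (t->text == NULL) { free(t); return NULL; }
    strcpy(t->text, text);
    t->ref_cnt = 1;
    return t;
}

/* Call whenever the struct gets wrapped in a host-language object. */
Token *token_incref(Token *t)
{
    t->ref_cnt++;
    return t;
}

/* Call from the wrapper's free hook, or when a C-side owner lets go.
 * Returns the remaining count; the struct is freed when it reaches 0. */
int token_decref(Token *t)
{
    int remaining = --t->ref_cnt;
    if (remaining == 0) {
        free(t->text);
        free(t);
    }
    return remaining;
}
```

In the TokenBatch example quoted above, the batch would hold one count
and each wrapper would take its own, so destroying a wrapper only drops
the wrapper's count and the batch's Token survives.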

The bigger problem I had was when Ruby objects are going into C space.
What happens when all references are removed in Ruby and the object
gets deleted while it is still being used in C space? Actually, I
think reference counting languages make this a lot easier to deal
with, although I just discovered recently that I can deregister an object
from garbage collection in Ruby, which is very helpful.

> How about other languages?

The only other language I've actually written an extension for is
Python which is reference counted like Perl. As far as I know, so is
PHP. Lua and Io are mark-and-sweep interpreters like Ruby. I think
that is basically what it is going to come down to; the garbage
collection algorithm used.

> Another problem we have to solve is how to make memory management
> work under multiple systems.  We can't really do mark-and-sweep with
> Perl, because Perl doesn't provide an event we can cue collection off
> of.  We'd have to write our own tracing garbage collector, our own
> malloc() and free(), and manage our own memory pool -- yikes.
> So... how can we make this work everywhere?  How would you implement
> lucy_Obj_create_ref and lucy_Obj_refcount_dec?

If I really had to do it this way, lucy_Obj_create_ref would add the
object to a giant hash-table which keeps track of reference counts and
is visible to Ruby's garbage collector. lucy_Obj_refcount_dec would
decrement the reference count in the hash-table and delete the object
from the hash-table if the ref-count is 0.
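
A rough sketch of such a table in C, assuming a small chained hash keyed
by the object's address. The registry_* names, the bucket count, and the
destroy callback are all hypothetical. The part that makes the table
visible to Ruby's garbage collector is left out here.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define REG_BUCKETS 64  /* arbitrary size chosen for the sketch */

typedef struct RegEntry {
    void            *obj;      /* address of the tracked object */
    int              ref_cnt;  /* outstanding references to it */
    struct RegEntry *next;     /* next entry in the same bucket */
} RegEntry;

static RegEntry *registry[REG_BUCKETS];

static size_t reg_slot(const void *obj)
{
    return (size_t)(((uintptr_t)obj >> 4) % REG_BUCKETS);
}

/* Add obj with a count of 1, or bump its count if already present.
 * Returns the new count, or -1 if allocation fails. */
int registry_incref(void *obj)
{
    size_t slot = reg_slot(obj);
    RegEntry *e;
    assert(obj != NULL);
    for (e = registry[slot]; e != NULL; e = e->next) {
        if (e->obj == obj) return ++e->ref_cnt;
    }
    e = malloc(sizeof *e);
    if (e == NULL) return -1;
    e->obj = obj;
    e->ref_cnt = 1;
    e->next = registry[slot];
    registry[slot] = e;
    return 1;
}

/* Current count for obj, or 0 if it is not in the table. */
int registry_count(const void *obj)
{
    const RegEntry *e;
    for (e = registry[reg_slot(obj)]; e != NULL; e = e->next) {
        if (e->obj == obj) return e->ref_cnt;
    }
    return 0;
}

/* Drop one reference. At 0 the entry is removed and destroy(obj) is
 * called if destroy is non-NULL. Returns the remaining count, or -1
 * if obj was never registered. */
int registry_decref(void *obj, void (*destroy)(void *))
{
    RegEntry **link;
    for (link = &registry[reg_slot(obj)]; *link != NULL; link = &(*link)->next) {
        RegEntry *e = *link;
        if (e->obj != obj) continue;
        if (--e->ref_cnt > 0) return e->ref_cnt;
        *link = e->next;  /* unlink the entry before freeing it */
        free(e);
        if (destroy != NULL) destroy(obj);
        return 0;
    }
    return -1;
}
```

Under this sketch lucy_Obj_create_ref would call something like
registry_incref and lucy_Obj_refcount_dec something like
registry_decref. Keeping the counts outside the structs means the
structs themselves need no ref_cnt member, at the cost of a lookup on
every increment and decrement.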


PS: you may have read on the Java mailing list that I'm working on
building an object database with Lucene-like search capabilities. This
is what people really need in the Ruby community since most people are
just using Ferret to add full-text search to their database. Everyone
seems to be struggling with trying to keep their indexes in sync with
the database, not to mention the performance implications of having a
separate index and data-store. This also makes it much easier to have
multiple processes adding data to the index. And creating bindings
will be greatly simplified because the bindings simply need to know
about an SQL-like query language and how to turn the result sets into
the correct objects for the language in question.

Obviously there are downsides to this solution, the biggest being
extensibility in the native language, which is why I am still going to
go ahead with Lucy development.
