incubator-lucy-dev mailing list archives

From Marvin Humphrey <>
Subject Re: [KinoSearch] inside-out objects
Date Wed, 21 Nov 2007 01:48:04 GMT

On Nov 20, 2007, at 12:18 PM, Peter Karman wrote:

> Are you finding it makes it easier to do things with XS, C and the
> reference counting?

KS objects belonging to any class other than the new, temporary class
KinoSearch::Util::Nat maintain their own refcount, separate from
Perl's.  When a Perl object wrapping a KS object has its SvREFCNT fall
to 0, the DESTROY method which gets called is
KinoSearch::Util::Obj::DESTROY, which simply decrements the KS
object's internal refcount rather than invoking Kino_Obj_Destroy(obj).

       kino_Obj *self;
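To make the two-refcount arrangement concrete, here is a minimal C sketch. Every name in it (`KSObj`, `Wrapper`, `ks_obj_dec()`, `wrapper_dec()`) is hypothetical rather than KinoSearch's actual API, and the `Wrapper` struct merely stands in for a Perl object and its SvREFCNT. The point it illustrates: releasing the last Perl reference only decrements the KS refcount, and the object is freed only when that internal count reaches zero.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical names throughout; this is not the KinoSearch API. */
typedef struct KSObj {
    int refcount;               /* KS-internal refcount, separate from Perl's */
} KSObj;

/* Stand-in for a Perl wrapper object; sv_refcnt models SvREFCNT. */
typedef struct Wrapper {
    int sv_refcnt;
    KSObj *obj;
} Wrapper;

static int ks_destroy_count = 0;   /* how many KS objects were really freed */

static KSObj *ks_obj_new(void) {
    KSObj *obj = malloc(sizeof *obj);
    if (obj) obj->refcount = 1;
    return obj;
}

static KSObj *ks_obj_inc(KSObj *obj) {
    obj->refcount++;
    return obj;
}

/* Only when the KS refcount reaches zero is the object destroyed. */
static void ks_obj_dec(KSObj *obj) {
    assert(obj->refcount > 0);
    if (--obj->refcount == 0) {
        ks_destroy_count++;
        free(obj);
    }
}

/* Wrapping takes a new KS reference on behalf of the Perl object. */
static Wrapper *wrap(KSObj *obj) {
    Wrapper *w = malloc(sizeof *w);
    if (!w) return NULL;
    w->sv_refcnt = 1;
    w->obj = ks_obj_inc(obj);
    return w;
}

/* Analogue of the Perl-level DESTROY: when SvREFCNT falls to 0, free the
 * wrapper and merely decrement the KS refcount instead of destroying. */
static void wrapper_dec(Wrapper *w) {
    if (--w->sv_refcnt == 0) {
        ks_obj_dec(w->obj);
        free(w);
    }
}
```

Creating a wrapper raises the KS refcount to 2; destroying the wrapper drops it back to 1 without freeing anything.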

We have to do things that way because there are many KS objects which  
Perl doesn't know about.  For instance, when TopDocCollector's C  
constructor TDColl_new() is invoked, it creates its own HitQueue  
object without telling Perl anything about it.  However, should we  
need to deal with that HitQueue from Perl-space, we have to wrap it  
in a Perl object.  That's what happens here:

   {
       my $hit_queue = $collector->get_hit_queue;
   } # $hit_queue goes out of scope, DESTROY called

Currently, when that $hit_queue goes out of scope, the Perl wrapper  
object gets destroyed.  However, the interior KS HitQueue object must  
not be destroyed, because $collector still needs it.
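The HitQueue situation can be modeled the same way. In this hypothetical C sketch, `collector_new()` plays the role of TDColl_new() and builds a queue that Perl never sees, while `collector_get_hit_queue()` plays the role of `$collector->get_hit_queue`, allocating a fresh wrapper on every call. The structs and function names are assumptions for illustration only.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins loosely named after the message; not the real API. */
typedef struct HitQueue { int refcount; } HitQueue;
typedef struct Collector { HitQueue *hit_queue; } Collector;
typedef struct Wrapper { HitQueue *obj; } Wrapper;   /* one Perl object */

static int hitq_frees = 0;

static void hitq_dec(HitQueue *hq) {
    assert(hq->refcount > 0);
    if (--hq->refcount == 0) {
        hitq_frees++;
        free(hq);
    }
}

/* Like TDColl_new(): creates its own HitQueue without telling Perl. */
static Collector *collector_new(void) {
    Collector *c = malloc(sizeof *c);
    HitQueue *hq = malloc(sizeof *hq);
    if (!c || !hq) { free(c); free(hq); return NULL; }
    hq->refcount = 1;             /* reference held by the collector itself */
    c->hit_queue = hq;
    return c;
}

/* Like $collector->get_hit_queue: every call builds a brand-new wrapper. */
static Wrapper *collector_get_hit_queue(Collector *c) {
    Wrapper *w = malloc(sizeof *w);
    if (!w) return NULL;
    c->hit_queue->refcount++;     /* the wrapper holds its own KS reference */
    w->obj = c->hit_queue;
    return w;
}

/* Wrapper going out of scope: free it, decrement (not destroy) the queue. */
static void wrapper_destroy(Wrapper *w) {
    hitq_dec(w->obj);
    free(w);
}

static void collector_destroy(Collector *c) {
    hitq_dec(c->hit_queue);
    free(c);
}
```

Calling `collector_get_hit_queue()` twice yields two distinct wrappers around the same queue, and destroying both leaves the queue alive, still held by the collector.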

As a consequence, a single KS object can reappear wrapped in several
different Perl objects, which is rather strange and is probably a bug
waiting to bite someone.  Here's an example of how things can go
wrong: cycling through multiple Perl wrappers doesn't work well with
the inside-out pattern, because DESTROY gets invoked once per wrapper,
over and over again, necessitating a broken hack like this:

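   sub DESTROY {
      my $self = shift;
      # Only clean up if this wrapper appears to hold the last reference.
      if ($self->refcount < 2) {
         delete $inside_out_var{$$self};
      }
      $self->SUPER::DESTROY;    # decrement the KS object's refcount
   }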

That hack doesn't even work reliably: if the last reference is
released by KS internally, the Perl DESTROY method will never get
called and any inside-out vars will leak.
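Both outcomes of the DESTROY hack can be reproduced in a small C model. The sketch below is hypothetical: a fixed-size table keyed by object address stands in for `%inside_out_var`, `wrapper_destroy()` applies the `refcount < 2` test before releasing its reference, and `ks_dec()` represents a decrement performed inside KS, which never reaches Perl's DESTROY.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical sketch; none of these names come from KinoSearch. */
typedef struct KSObj { int refcount; } KSObj;

/* Inside-out storage: data keyed by the KS object's address, like
 * $inside_out_var{$$self}. A key of 0 marks an empty slot. */
#define TABLE_SIZE 16
static uintptr_t table_keys[TABLE_SIZE];
static int table_vals[TABLE_SIZE];

static int table_find(uintptr_t key) {
    for (int i = 0; i < TABLE_SIZE; i++)
        if (table_keys[i] == key) return i;
    return -1;
}

static void table_set(const KSObj *obj, int val) {
    uintptr_t key = (uintptr_t)obj;
    int i = table_find(key);
    if (i < 0) i = table_find(0);        /* first empty slot */
    assert(i >= 0);
    table_keys[i] = key;
    table_vals[i] = val;
}

static void table_delete(const KSObj *obj) {
    int i = table_find((uintptr_t)obj);
    if (i >= 0) table_keys[i] = 0;
}

static int table_count(void) {
    int n = 0;
    for (int i = 0; i < TABLE_SIZE; i++) n += table_keys[i] != 0;
    return n;
}

static KSObj *ks_new(int refcount) {
    KSObj *obj = malloc(sizeof *obj);
    if (obj) obj->refcount = refcount;
    return obj;
}

/* KS-internal decrement: frees the object but never runs Perl's DESTROY. */
static void ks_dec(KSObj *obj) {
    if (--obj->refcount == 0) free(obj);
}

/* The hack: a wrapper's DESTROY clears the inside-out entry only when it
 * seems to hold the last reference, then releases its KS reference. */
static void wrapper_destroy(KSObj *obj) {
    if (obj->refcount < 2) table_delete(obj);
    ks_dec(obj);
}
```

When the wrapper holds the only reference, the entry is deleted. When another KS object still holds a reference, the wrapper leaves the entry in place, and the later internal decrement frees the object while its entry stays behind.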

The solution is to cache a Perl object within a KS object, so that  
effectively Perl *does* know about it.  That's the difference between  
Nat and Obj.  Under Nat, the refcounting is handled via the cached  
Perl object.  There are no longer two refcounts.
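Here is a rough C sketch of the Nat approach, assuming a simplified `Host` struct in place of a real Perl SV; all names are invented for illustration. The KS object caches exactly one host object, wrapping always hands back that same host, and the only refcount is the host's.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical sketch of the Nat idea: the KS object caches one host
 * (Perl) object, and all refcounting goes through that host. */
typedef struct Host Host;
typedef struct NatObj { Host *host; } NatObj;
struct Host {                /* stands in for a Perl SV */
    int refcnt;              /* the one and only refcount */
    NatObj *obj;
};

static int nat_frees = 0;

static NatObj *nat_new(void) {
    NatObj *obj = malloc(sizeof *obj);
    Host *host = malloc(sizeof *host);
    if (!obj || !host) { free(obj); free(host); return NULL; }
    host->refcnt = 1;
    host->obj = obj;
    obj->host = host;        /* cached, so Perl effectively knows about obj */
    return obj;
}

/* Wrapping for Perl hands back the same cached host every time. */
static Host *nat_to_host(NatObj *obj) {
    obj->host->refcnt++;
    return obj->host;
}

static void nat_inc(NatObj *obj) { obj->host->refcnt++; }

/* When the single count hits zero, host and KS object go away together,
 * so any per-object cleanup runs exactly once. */
static void nat_dec(NatObj *obj) {
    Host *host = obj->host;
    assert(host->refcnt > 0);
    if (--host->refcnt == 0) {
        nat_frees++;
        free(host);
        free(obj);
    }
}
```

Because every wrapping returns the same host, per-object cleanup such as deleting inside-out data can happen once, when that single count reaches zero.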

One drawback of this design, though, is that Perl objects are  
heavyweight.  That's ok for big stuff like a PostingList, but it's  
not-so-great for small stuff like a ByteBuf, a Token, or a TermInfo.   
If we were to put a Perl object into every last one of those, I'd be  
concerned both about memory usage and performance.

My current plan is to override the refcounting infrastructure for
small classes by basing them on a "FastObj" class which will use
an integer refcount, as Obj does now.  The scheme is more complicated
to implement than I'd like, and it will have the
one-KS-object-many-Perl-objects problem for anything that subclasses
FastObj.  But it will work in the near term, and maybe it won't be so bad.
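One way to let small classes opt out of the cached Perl object is per-class dispatch of the refcounting operations. The C sketch below assumes a hypothetical vtable with `inc`/`dec` slots and a `cache_host` flag: FastObj-style classes (here a stand-in for Token) keep a plain integer refcount, while Nat-style classes (a stand-in for PostingList) route through a cached host object. It illustrates the idea only and is not the planned implementation.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical sketch: refcounting strategy chosen per class via a vtable. */
typedef struct Obj Obj;

typedef struct VTable {
    int cache_host;          /* 1: Nat-style, 0: FastObj-style */
    void (*inc)(Obj *);
    void (*dec)(Obj *);
} VTable;

typedef struct Host { int refcnt; } Host;   /* stands in for a Perl SV */

struct Obj {
    const VTable *vtable;
    int refcount;            /* used only by FastObj-style classes */
    Host *host;              /* used only by Nat-style classes */
};

static int frees = 0;

static void obj_free(Obj *self) {
    free(self->host);        /* free(NULL) is a no-op for FastObj-style objects */
    free(self);
    frees++;
}

/* FastObj-style: a plain integer refcount, no Perl object at all. */
static void fast_inc(Obj *self) { self->refcount++; }
static void fast_dec(Obj *self) {
    assert(self->refcount > 0);
    if (--self->refcount == 0) obj_free(self);
}

/* Nat-style: the only refcount lives in the cached host object. */
static void nat_inc(Obj *self) { self->host->refcnt++; }
static void nat_dec(Obj *self) {
    assert(self->host->refcnt > 0);
    if (--self->host->refcnt == 0) obj_free(self);
}

/* Hypothetical class descriptors for a small and a large class. */
static const VTable TOKEN_VTABLE = { 0, fast_inc, fast_dec };
static const VTable POSTING_LIST_VTABLE = { 1, nat_inc, nat_dec };

static Obj *obj_new(const VTable *vt) {
    Obj *self = calloc(1, sizeof *self);
    if (!self) return NULL;
    self->vtable = vt;
    if (vt->cache_host) {
        self->host = calloc(1, sizeof *self->host);
        if (!self->host) { free(self); return NULL; }
        self->host->refcnt = 1;
    } else {
        self->refcount = 1;
    }
    return self;
}

/* Callers never need to know which strategy a class uses. */
static void obj_inc(Obj *self) { self->vtable->inc(self); }
static void obj_dec(Obj *self) { self->vtable->dec(self); }
```

Callers only ever use `obj_inc()` and `obj_dec()`. The trade-off described above still applies: a FastObj-style object has no cached host, so each trip into Perl would need a fresh wrapper.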

Marvin Humphrey
Rectangular Research
