lucy-dev mailing list archives

Subject Re: [lucy-dev] Non-deterministic destruction in Perl 5.15
Date Fri, 11 Nov 2011 22:51:26 GMT

On Nov 8, 2011, at 5:32 PM, Marvin Humphrey wrote:

> Greets,
> In Perl 5.15 (current "blead" Perl -- the developer release), Lucy fails most
> of its tests because of an exception thrown during global destruction:
>   (in cleanup) Insane attempt to destroy VTable for class 'Lucy::Object::Obj'
>   lucy_VTable_destroy at /home/sts/cpansmoke/perl-5.15.2/cpan/build/Lucy-0.2.2-o_YHcb/core/Lucy/Object/VTable.c line 44
>   at t/018-host.t line 0
> That's a tripwire that I set because VTable's destructor should *never* be
> invoked.  We leak VTables on purpose.  
> What has changed in Perl 5.15 is that destructors are now called during global
> destruction; previously, Perl freed all SVs during global destruction but did
> not call DESTROY on objects.

Perl previously did call DESTROY on objects during global destruction, and the order was non-deterministic,
but a few objects would escape the purge, in particular:

• blessed array elements (bless \$_[0])
• blessed closure variables (bless \my $x; sub foo { $x ... })
• any other unreferenced SVs (not referenced by RVs or GVs)

The VTables belong to the third category.

> This change to Perl is going to require a corresponding change
> to Lucy's Perl bindings.  Consider the following code:
>   my %hash = (
>       searcher => Lucy::Search::IndexSearcher->new(index => $path),
>   );
>   $hash{circular_reference} = \%hash;
> Because of the circular reference, that Perl hash, the Searcher it refers to,
> and crucially, the Searcher's inner PolyReader will not be deallocated until
> global destruction.  During global destruction, though, refcounting goes out
> the window and destruction order is effectively random.

How has Lucy worked before, seeing that the order was already non-deterministic? Do its destructors
simply depend on the presence of the VTable?

> What we would ordinarily want to see is destruction moving from the outermost
> object to the innermost:
>   Perl hash
>   IndexSearcher
>   PolyReader
>   SegReaders
>   DataReaders
>   InStreams
>   FileHandles
>   ...
> This is important because when we get to the IndexSearcher's destructor, its
> subcomponents still need to be valid:
>   void
>   IxSearcher_destroy(IndexSearcher *self) {
>       DECREF(self->reader);

This seems to answer my question in the negative.

From reading this code superficially, it looks as though the Searcher object has an internal
(non-Perl) reference count on the reader. The Perl object will also have a reference count
on the reader. That should prevent the reader from being destroyed before the searcher is.
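To make that reasoning concrete, here is a minimal sketch in plain C. It assumes the Perl-side reference and the searcher's internal reference are counted separately, which is only my reading of the code; the struct and function names are hypothetical stand-ins, not Lucy's real types or macros.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins; not Lucy's real structs or macros. */
typedef struct ToyReader {
    int  refcount;
    int *freed_flag;              /* set to 1 when the reader is really freed */
} ToyReader;

typedef struct ToySearcher {
    ToyReader *reader;            /* the searcher owns one reference */
} ToySearcher;

static ToyReader *toy_reader_new(int *freed_flag) {
    ToyReader *r = malloc(sizeof *r);
    assert(r != NULL);
    r->refcount   = 1;            /* reference held by the creator */
    r->freed_flag = freed_flag;
    return r;
}

static void toy_reader_decref(ToyReader *r) {
    if (--r->refcount == 0) {     /* last reference gone: really free it */
        *r->freed_flag = 1;
        free(r);
    }
}

static ToySearcher *toy_searcher_new(ToyReader *r) {
    ToySearcher *s = malloc(sizeof *s);
    assert(s != NULL);
    r->refcount++;                /* the searcher takes its own reference */
    s->reader = r;
    return s;
}

static void toy_searcher_destroy(ToySearcher *s) {
    toy_reader_decref(s->reader); /* mirrors DECREF(self->reader) */
    free(s);
}

/* Returns 0 if the reader survived until the searcher was destroyed. */
int refcount_demo(void) {
    int reader_freed = 0;
    ToyReader   *r = toy_reader_new(&reader_freed);  /* "Perl-side" reference */
    ToySearcher *s = toy_searcher_new(r);            /* internal reference    */

    toy_reader_decref(r);          /* the Perl-side reference goes away first */
    if (reader_freed) return 1;    /* would leave s->reader dangling          */

    toy_searcher_destroy(s);       /* the searcher drops the last reference   */
    return reader_freed ? 0 : 2;
}
```

Calling refcount_demo() from a main() exercises the "reader released first" order. If Lucy's objects share a single count with their Perl wrappers instead, this sketch does not apply.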

>       // ...
>   }
> If self->reader has already been freed when this destructor gets called,
> that's bad news -- we're going to be invoking DECREF on freed memory.  
> As far as I can tell, the only solution is to disconnect our DESTROY methods
> when Perl enters global destruction and leak everything.  Here's sample XS
> code to get the point across:
>   void
>   DESTROY(self)
>       lucy_IndexSearcher *self;
>       if (PL_phase != PERL_PHASE_DESTRUCT) {
>           lucy_IxSearcher_destroy(self);
>       }
> Of course, this defeats the purpose of the change that was made in Perl 5.15.
> The rationale for the new behavior is to support situations where for example,
> you could guarantee that when a Perl interpreter in an embedded system shuts
> down, *everything* gets reclaimed.  But I believe that architecture is only
> feasible when you control all memory allocation (as when the OS closes a
> process) and thus Perl's new global destruction model is flawed as it cannot
> encompass external resources.

Perl’s global destruction has always necessarily been flawed. It cannot but be non-deterministic,
due to the way circular references work. There is simply no way to know which thing is the
‘outer’ object and which is the ‘inner’, as they are all just linked to one another.

I can’t say I fully understand why destroying the Perl-level Reader before the Searcher
would be a problem. But you do seem to be implying that VTables need to be present for anything
to work. If that is the case, then Lucy was already relying on an implementation detail, so
why not continue to?

Let’s look at the relevant code from the perl source:

> void
> Perl_sv_clean_objs(pTHX)
> {
>    dVAR;
>    GV *olddef, *olderr;
>    PL_in_clean_objs = TRUE;

This line goes through all scalars that are references to objects and calls undef() on them:

>    visit(do_clean_objs, SVf_ROK, SVf_ROK);

The next two function calls eliminate all blessed GV slots. I think the GV slots are nulled
and the SVs in them have their reference count lowered, but I haven’t actually read the

>    /* Some barnacles may yet remain, clinging to typeglobs.
>     * Run the non-IO destructors first: they may want to output
>     * error messages, close files etc */
>    visit(do_clean_named_objs, SVt_PVGV|SVpgv_GP, SVTYPEMASK|SVp_POK|SVpgv_GP);
>    visit(do_clean_named_io_objs, SVt_PVGV|SVpgv_GP, SVTYPEMASK|SVp_POK|SVpgv_GP);

This is the bit added in 5.15. It looks for any objects remaining. Since they may be referenced
by other objects (indirectly, through closures or array elements), whose destructors have
not fired yet, they are not actually freed, but simply cursed; that is, they revert to non-object
status (something you cannot do from Perl or XS, even though the core has the facility to
do it).

>    /* And if there are some very tenacious barnacles clinging to arrays,
>       closures, or what have you.... */
>    visit(do_curse, SVs_OBJECT, SVs_OBJECT);

>    olddef = PL_defoutgv;
>    PL_defoutgv = NULL; /* disable skip of PL_defoutgv */
>    if (olddef && isGV_with_GP(olddef))
> 	do_clean_named_io_objs(aTHX_ MUTABLE_SV(olddef));
>    olderr = PL_stderrgv;
>    PL_stderrgv = NULL; /* disable skip of PL_stderrgv */
>    if (olderr && isGV_with_GP(olderr))
> 	do_clean_named_io_objs(aTHX_ MUTABLE_SV(olderr));
>    SvREFCNT_dec(olddef);
>    PL_in_clean_objs = FALSE;
> }
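The flag-and-mask filtering that each visit() call performs, and the effect of the curse pass, can be modelled with a toy arena. This is only a sketch under my own assumptions: the ToySV struct, the TOY_* flags and toy_visit() are made up for illustration and are not perl's real data structures, and I am assuming that cursing runs the destructor before stripping object status.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_ROK    0x1u   /* hypothetical flag: slot holds a reference */
#define TOY_OBJECT 0x2u   /* hypothetical flag: slot is blessed        */

typedef struct ToySV {
    unsigned flags;
    int      refcnt;         /* 0 means the slot is already free */
    int      destroy_calls;  /* how often the destructor ran     */
} ToySV;

typedef void (*ToyVisitFn)(ToySV *sv);

/* Call fn on every live slot whose masked flags equal `flags`. */
static int toy_visit(ToySV *arena, size_t n, ToyVisitFn fn,
                     unsigned flags, unsigned mask) {
    int visited = 0;
    for (size_t i = 0; i < n; i++) {
        ToySV *sv = &arena[i];
        if (sv->refcnt > 0 && (sv->flags & mask) == flags) {
            fn(sv);
            visited++;
        }
    }
    return visited;
}

/* Toy "curse": run the destructor, drop object status, keep the memory. */
static void toy_curse(ToySV *sv) {
    sv->destroy_calls++;
    sv->flags &= ~TOY_OBJECT;
}

/* Returns 0 if the blessed slot was cursed exactly once and not freed. */
int curse_demo(void) {
    ToySV arena[3] = {
        { TOY_OBJECT, 1, 0 },   /* blessed, but no RV or GV points at it */
        { 0,          1, 0 },   /* plain scalar                          */
        { TOY_OBJECT, 0, 0 },   /* already-freed slot, must be skipped   */
    };
    int first  = toy_visit(arena, 3, toy_curse, TOY_OBJECT, TOY_OBJECT);
    int second = toy_visit(arena, 3, toy_curse, TOY_OBJECT, TOY_OBJECT);

    if (first != 1 || second != 0)      return 1;
    if (arena[0].destroy_calls != 1)    return 2;
    if (arena[0].refcnt != 1)           return 3;  /* still allocated   */
    if (arena[0].flags & TOY_OBJECT)    return 4;  /* no longer blessed */
    return 0;
}
```

The first slot stands in for something like a VTable: nothing with SVf_ROK set points at it, so only the final pass reaches it.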

So based on that it looks as though you simply need to remove the destructor on VTables, since
they will be destroyed last. Or create a destructor that makes sure all other Lucy objects
have been purged.
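The second option could look something like the sketch below: a VTable destructor that refuses to tear anything down while other objects are still alive. Everything here is hypothetical (the live-object counter, the function names, the return convention); it only illustrates the idea and is not how Lucy tracks its objects.

```c
#include <assert.h>

/* Hypothetical global count of live non-VTable objects. */
static int toy_live_objects = 0;

static void toy_obj_created(void)   { toy_live_objects++; }
static void toy_obj_destroyed(void) { toy_live_objects--; }

/* Returns 1 if the VTable may be torn down, 0 if it must be kept (leaked)
 * because some object might still need it. */
static int toy_vtable_try_destroy(void) {
    return toy_live_objects == 0;
}

/* Returns 0 if the guard refuses early destruction and allows late destruction. */
int vtable_guard_demo(void) {
    toy_obj_created();                       /* e.g. a searcher        */
    toy_obj_created();                       /* e.g. its reader        */

    if (toy_vtable_try_destroy()) return 1;  /* too early: must refuse */

    toy_obj_destroyed();
    toy_obj_destroyed();

    return toy_vtable_try_destroy() ? 0 : 2; /* now it is safe         */
}
```

Simply removing the destructor is less work; the counter only matters if you want to reclaim the VTables when it is provably safe.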

Now I hope I have you thoroughly confused. :-)
