incubator-lucy-dev mailing list archives

From Marvin Humphrey <mar...@rectangular.com>
Subject [lucy-dev] Per-host abstract elements
Date Tue, 01 Mar 2011 22:08:35 GMT
(moving to lucy-dev...)

On Mon, Feb 28, 2011 at 09:20:46PM -0600, Peter Karman wrote:
> > Interesting approach.  Is there some docs/rationale on which parts and why
> > somewhere?  Sounds worth understanding in more detail.
> 
> Marvin can answer as to whether there are docs on this; my understanding of the
> rationale is that since our goal is idiomatic language implementations on top of
> the underlying C, each host language must do *some* work.

There's this passage from the DevGuide:

  The C core is intentionally left incomplete, however; to be usable, it must
  be bound to a "host" language.  (In this context, even C is considered a
  "host" which must implement the missing pieces and be "bound" to the core.)
  Some of the binding code is autogenerated by Clownfish on a spec customized
  for each language.  Other pieces are hand-coded in either C (using the
  host's C API) or the host language itself.

There's also documentation within the main module page for the Clownfish
compiler at trunk/clownfish/lib/Clownfish.pm.  

Regarding *which* pieces of the core library are to be left unimplemented, a
systematic discussion has never taken place.  Some of the individual modules
have been discussed, e.g. Lucy::Document::Doc, and aspects of the object model
have been discussed on lucy-dev going back to the initial brainstorming Dave
Balmain and I did in 2006 -- but there has been no high-level discussion of
the library as a whole and of which parts should be left abstract.

Historically, the codebase that is now Lucy began as a library that was mostly
Perl with some hand-coded XS (KinoSearch 0.1x).  There has been an ongoing
effort to port that codebase to C; for the core library, that task is mostly
done, while other components have reached various stages of completion:

    Clownfish compiler:  c. 50-60% done (under active development)
    Charmonizer:         done
    Test suite for core: c. 50% done

Because the porting effort is incomplete, though, the trunk/perl/ directory
contains more than it should.  It's not necessary to port everything in there
to create a binding to another language, and its contents should not be taken
as the end product of a coherent design effort.

The abstract chunks within the core library have various rationales.  The most
important is that as a community we care a great deal about user-friendly API
design: we started with a nice Perl API and we have been unwilling to
sacrifice its most important facets.  However, some abstract chunks remain
unimplemented just because implementing them is hard, impractical, or unwise.

The biggest unimplemented piece is the "fields" member in Lucy::Document::Doc,
which is left to be a native mapping type: Perl hash, Ruby Hash, Python dict,
etc.  The rationales are convenience and, to a lesser extent, minimizing
string copies.  Having "fields" left abstract necessitates custom code in the
following files:

    perl/xs/Lucy/Document/Doc.c
    perl/xs/Lucy/Index/DocReader.c
    perl/xs/Lucy/Index/Inverter.c

(I recall that the idea of using "overload" to get at the doc object's fields
originated with Father Chrysostomos.)

CaseFolder, Tokenizer and StringHelper are left incomplete because we want to
rely on the host language to supply a regex engine and complex Unicode
processing rather than write/bundle the code to do that.

    perl/xs/Lucy/Analysis/CaseFolder.c
    perl/xs/Lucy/Analysis/Tokenizer.c
    perl/xs/Lucy/Util/StringHelper.c

We rely on the host language for exception handling.  This has a big impact on
Lucy::Object::Err, but it also affects some other classes which have to catch
exceptions during normal operation.

    perl/xs/Lucy/Object/Err.c
    perl/xs/Lucy/Index/PolyReader.c
    perl/xs/Lucy/Index/SegReader.c

FSFolder is left incomplete because the "absolutify" function (which
transforms relative paths to absolute paths) hasn't been ported.

    perl/xs/Lucy/Store/FSFolder.c
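
For readers unfamiliar with the term, here is a minimal sketch of what an
"absolutify" helper might look like in C on a POSIX system.  It is not code
from the Lucy tree: the name absolutify_sketch and its signature are
hypothetical, it assumes '/' as the path separator, and it does not collapse
"." or ".." components.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper (not part of Lucy): return a newly allocated absolute
 * version of `path`.  A path that already starts with '/' is copied as-is;
 * anything else is prefixed with the current working directory.  The caller
 * frees the result.  Returns NULL on failure. */
char*
absolutify_sketch(const char *path)
{
    if (path[0] == '/') {
        char *copy = malloc(strlen(path) + 1);
        if (copy != NULL) { strcpy(copy, path); }
        return copy;
    }

    char cwd[4096];
    if (getcwd(cwd, sizeof(cwd)) == NULL) { return NULL; }

    /* cwd + '/' + path + terminating NUL */
    size_t len = strlen(cwd) + 1 + strlen(path) + 1;
    char *abs_path = malloc(len);
    if (abs_path != NULL) { snprintf(abs_path, len, "%s/%s", cwd, path); }
    return abs_path;
}
```

A port for the core would also need a Windows code path, which this sketch
leaves out entirely.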
    
Lucy::Util::Json is left incomplete because we haven't yet replaced our usage
of the CPAN module JSON::XS with either a bundled C library or a hand-rolled
parser based on the Lemon parser generator.

    perl/xs/Lucy/Util/Json.c

Lucy::Object::Obj caches a host object and, in Perl at least, uses it for
reference counting.  It's not clear exactly what we'll do in a
garbage-collected language like Ruby, but the design was discussed at
<http://markmail.org/message/jkst23okksyynzss>.

    perl/xs/Lucy/Object/Obj.c

Lucy::Object::VTable contains code which walks the host language's OO
hierarchy and discovers when the user has supplied a method which should
override a core method.  When a dynamic VTable is created for a user-defined
subclass, a callback is automatically installed which invokes the overriding
subroutine.

    perl/xs/Lucy/Object/VTable.c
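
The override mechanism described above can be sketched in plain C.  This is
an illustration only, not the VTable code: every name here (SketchVTable,
host_find, callback_length, and so on) is hypothetical, the "host" is faked
with a static table, and the vtable has a single method slot.

```c
#include <stddef.h>
#include <string.h>

typedef struct SketchObj SketchObj;
typedef int (*length_t)(SketchObj *self);

/* One-slot vtable standing in for a real class's method table. */
typedef struct SketchVTable {
    const char *class_name;
    length_t    length;
} SketchVTable;

struct SketchObj {
    SketchVTable *vtable;
};

/* Pretend host: which user classes define which methods, and what the
 * host-language method would return.  Purely an assumption of the sketch. */
typedef struct HostMethod {
    const char *class_name;
    const char *method_name;
    int         result;
} HostMethod;

static const HostMethod host_methods[] = {
    { "MyDoc", "length", 42 },
};

/* Ask the pretend host whether `class_name` defines `method_name`. */
const HostMethod*
host_find(const char *class_name, const char *method_name)
{
    size_t count = sizeof(host_methods) / sizeof(host_methods[0]);
    for (size_t i = 0; i < count; i++) {
        if (strcmp(host_methods[i].class_name, class_name) == 0
            && strcmp(host_methods[i].method_name, method_name) == 0) {
            return &host_methods[i];
        }
    }
    return NULL;
}

/* Core implementation of the method. */
int
core_length(SketchObj *self)
{
    (void)self;
    return 0;
}

/* Callback installed in place of core_length when the host overrides it;
 * it forwards the call to the (pretend) host method. */
int
callback_length(SketchObj *self)
{
    const HostMethod *m = host_find(self->vtable->class_name, "length");
    return m != NULL ? m->result : core_length(self);
}

/* Build a vtable for a user-defined subclass, swapping in the callback
 * only if the host class supplies its own "length". */
SketchVTable
make_subclass_vtable(const char *class_name)
{
    SketchVTable vt = { class_name, core_length };
    if (host_find(class_name, "length") != NULL) {
        vt.length = callback_length;
    }
    return vt;
}
```

Core code that invokes obj->vtable->length(obj) then reaches the user's
method without knowing whether the slot holds the core function or a
callback.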

Lucy::Object::Host implements the mechanism by which core code calls back into
the host language.
    
    perl/xs/Lucy/Object/Host.c

Lucy::Object::LockFreeRegistry is an oddball class, used only for one purpose:
thread-safe access to VTable singletons.  There are a few lines of esoteric
code needed in its Perl binding because it must be accessible from multiple
threads.

    perl/xs/Lucy/Object/LockFreeRegistry.c
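
To give a flavor of what "lock-free" registration means, here is a small
sketch using C11 atomics.  It is not the LockFreeRegistry implementation:
the names SketchRegistry, SketchReg_register and SketchReg_fetch are
hypothetical, the data structure is a single linked list rather than
whatever the real class uses, and entries are never removed.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

typedef struct RegEntry {
    const char      *key;
    void            *value;
    struct RegEntry *next;   /* never changes once the entry is published */
} RegEntry;

typedef struct SketchRegistry {
    _Atomic(RegEntry*) head;
} SketchRegistry;

/* Return the value registered under `key`, or NULL if there is none. */
void*
SketchReg_fetch(SketchRegistry *self, const char *key)
{
    for (RegEntry *e = atomic_load(&self->head); e != NULL; e = e->next) {
        if (strcmp(e->key, key) == 0) { return e->value; }
    }
    return NULL;
}

/* Register `value` under `key` unless the key is already taken.  Returns
 * true if this call added the entry, false otherwise.  The key string is
 * borrowed, not copied. */
bool
SketchReg_register(SketchRegistry *self, const char *key, void *value)
{
    RegEntry *entry = malloc(sizeof(RegEntry));
    if (entry == NULL) { return false; }
    entry->key   = key;
    entry->value = value;

    for (;;) {
        RegEntry *old_head = atomic_load(&self->head);

        /* Rescan on every attempt: another thread may have won the race
         * and registered the same key since the last look. */
        for (RegEntry *e = old_head; e != NULL; e = e->next) {
            if (strcmp(e->key, key) == 0) {
                free(entry);
                return false;
            }
        }

        /* Publish the new entry only if the head has not moved. */
        entry->next = old_head;
        if (atomic_compare_exchange_weak(&self->head, &old_head, entry)) {
            return true;
        }
    }
}
```

A registry is initialized with atomic_init(&reg.head, NULL).  Because no
thread ever blocks on a mutex, a losing thread simply retries, and exactly
one registration per key succeeds.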

The Perl module perl/lib/Lucy.pm now houses pure Perl code which was
previously spread across multiple files.  It contains some of the actual
implementation code which the C files call back to.  For instance,
perl/xs/Lucy/Util/Json.c contains glue code which invokes callbacks to Perl
subroutines defined in Lucy.pm:

... interface definition in core/Lucy/Util/Json.cfh...

    /** Encode <code>dump</code> as JSON.
     */
    inert incremented CharBuf* 
    to_json(Obj *dump);

... glue code in perl/xs/Lucy/Util/Json.c...

    CharBuf*
    Json_to_json(Obj *dump)
    {
        return Host_callback_str(JSON, "to_json", 1,
            ARG_OBJ("dump", dump));
    }

... and implementation code in perl/lib/Lucy.pm:

    sub to_json {
        my ( undef, $dump ) = @_;
        return $json_encoder->encode($dump);
    }
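
The same three layers can be mimicked in a self-contained C sketch, which
may help when reading the glue code.  Everything here is hypothetical and
much simpler than Lucy::Object::Host: the "host" is a table of C function
pointers looked up by name, arguments are plain strings, and the names
sketch_host_define, sketch_host_callback_str, sketch_to_json and
fake_host_to_json are invented for the illustration.

```c
#include <stddef.h>
#include <string.h>

/* Signature shared by every pretend host routine in this sketch. */
typedef const char* (*host_sub_t)(const char *arg);

typedef struct HostSub {
    const char *name;
    host_sub_t  sub;
} HostSub;

#define MAX_HOST_SUBS 8
static HostSub host_subs[MAX_HOST_SUBS];
static size_t  num_host_subs = 0;

/* "Host" side: make a routine reachable by name, the way a Perl sub is
 * reachable through the symbol table.  Returns 1 on success, 0 if full. */
int
sketch_host_define(const char *name, host_sub_t sub)
{
    if (num_host_subs >= MAX_HOST_SUBS) { return 0; }
    host_subs[num_host_subs].name = name;
    host_subs[num_host_subs].sub  = sub;
    num_host_subs++;
    return 1;
}

/* Binding side: find the named host routine and invoke it.  Returns NULL
 * if the host has not defined it. */
const char*
sketch_host_callback_str(const char *name, const char *arg)
{
    for (size_t i = 0; i < num_host_subs; i++) {
        if (strcmp(host_subs[i].name, name) == 0) {
            return host_subs[i].sub(arg);
        }
    }
    return NULL;
}

/* Glue: the function core code calls.  It holds no JSON logic of its own
 * and only forwards to the host by name. */
const char*
sketch_to_json(const char *dump)
{
    return sketch_host_callback_str("to_json", dump);
}

/* Pretend host implementation, standing in for the Perl sub. */
const char*
fake_host_to_json(const char *dump)
{
    (void)dump;
    return "{}";
}
```

Once the host has run sketch_host_define("to_json", fake_host_to_json),
core code can call sketch_to_json() without knowing which language supplied
the implementation.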

Lastly, there is code which performs conversions between Lucy data structures
and host data structures and which performs parameter validation and argument
handling.

    perl/xs/XSBind.h
    perl/xs/XSBind.c

At some point, the contents of those XSBind modules will likely move
underneath clownfish/.  Other code, e.g. the Json materials, will simply
vanish as we create pure C implementations in core/.

Marvin Humphrey

