incubator-lucy-dev mailing list archives

From Marvin Humphrey
Subject Passing strings from Host to Lucy
Date Tue, 01 Sep 2009 15:58:11 GMT

Many times in Lucy, we will be passing short, constant strings across the
host-to-C barrier -- file paths, field names, and so on.  However, Lucy won't
know how to deal with the host's string type: Perl scalars, Java Strings, etc.

C's native string type is the NUL-terminated array of char, but for reasons
already laid out in past posts -- security being the most important -- Lucy
needs a Unicode string type that keeps track of its own length and allocation
size and maintains an internally consistent state.  That's CharBuf.  So, we
have to translate from Host string type to CharBuf.

The brute force approach would be to allocate a new CharBuf at the Host/Lucy
boundary every time, copy the content of the Host string type into it, then
DECREF() the CharBuf after the Lucy call returns.  However, this is
inefficient: every call across the boundary pays for an allocation, a copy,
and a free.

Instead, what we can do is create a Lucy data structure using stack memory,
copy the Host string-type's string pointer and length into it, then pass its
address to the relevant Lucy function.

Handling the assignments manually would look something like this:

  i32_t  /* assumed return type of Seg_Field_Num() */
  field_number_from_perl_args(SV *segment_sv, SV *field_sv)
  {
      Segment *segment = (Segment*)XSBind_sv_to_lucy_obj(segment_sv, SEGMENT);
      ZombieCharBuf field;
      field.ref.count = 1;
      field.vtable    = ZOMBIECHARBUF;
      field.cap       = 0;
      field.ptr       = SvPVutf8_nolen(field_sv);  /* borrowed, not copied */
      field.size      = SvCUR(field_sv);  /* read after the UTF-8 upgrade */
      return Seg_Field_Num(segment, (CharBuf*)&field);
  }

In reality, we'll perform those assignments within a function call:

  i32_t  /* assumed return type of Seg_Field_Num() */
  field_number_from_perl_args(SV *segment_sv, SV *field_sv)
  {
      Segment *segment = (Segment*)XSBind_sv_to_lucy_obj(segment_sv, SEGMENT);
      STRLEN len;
      /* One macro call: argument evaluation order is unspecified in C. */
      char *ptr = SvPVutf8(field_sv, len);
      ZombieCharBuf field = ZCB_make_str(ptr, len);
      return Seg_Field_Num(segment, (CharBuf*)&field);
  }

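Using stack memory rather than continuously creating and destroying objects
avoids an allocation and a free on every call, which resolves the speed
problem.  The catch is that the stack object only borrows the Host string's
buffer, so it must not outlive the call it was made for.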
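
Stripped of the Perl and Lucy APIs, the pattern can be sketched in plain C as
follows.  This is only an illustration under stated assumptions: StackStr,
StackStr_make(), field_num() and field_num_from_host() are hypothetical
stand-ins (for ZombieCharBuf, ZCB_make_str(), Seg_Field_Num() and the XS
wrapper respectively), and the two-entry field table is invented for the demo.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for ZombieCharBuf: borrows bytes, owns nothing. */
typedef struct StackStr {
    const char *ptr;   /* borrowed from the host string, not copied */
    size_t      size;  /* length in bytes; no terminator required */
} StackStr;

/* Returned by value, so the struct lives in the caller's stack frame. */
static StackStr
StackStr_make(const char *ptr, size_t size)
{
    StackStr self;
    assert(ptr != NULL || size == 0);
    self.ptr  = ptr;
    self.size = size;
    return self;
}

/* Hypothetical library function standing in for Seg_Field_Num().
 * Returns a 1-based field number, or 0 for an unknown field. */
static int
field_num(const StackStr *field)
{
    static const char *const names[] = { "title", "body" };
    size_t i;
    for (i = 0; i < sizeof(names) / sizeof(names[0]); i++) {
        if (strlen(names[i]) == field->size
            && memcmp(names[i], field->ptr, field->size) == 0) {
            return (int)i + 1;
        }
    }
    return 0;
}

/* Boundary wrapper: no malloc, no free, nothing to DECREF. */
static int
field_num_from_host(const char *host_buf, size_t host_len)
{
    StackStr field = StackStr_make(host_buf, host_len);
    return field_num(&field);
}
```

Because the length travels with the pointer, a call such as
field_num_from_host("bodyXX", 4) looks up "body"; the callee never scans for
a terminator.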

Right now, this data structure bears the whimsical name "ZombieCharBuf", as in
"A CharBuf which cannot be Destroyed."  ZombieCharBufs are either created on
the stack or as compile-time static or global variables; they are never
malloc'd.  Calling Destroy() on them is illegal and triggers an exception,
hence the name.  However, "ZombieCharBuf" is not meant to be a final name;
consider it an intentionally irritating reminder that there are issues with
the CharBuf hierarchy regarding readonly strings and compatibility with
multiple Unicode encodings that remain to be resolved.  :)  
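
To make the "cannot be Destroyed" rule concrete, here is a rough sketch of how
a destructor slot in a vtable might refuse stack-allocated objects.  None of
this is Lucy's actual code: Str, VTable, HeapStr_new(), ZombieStr_make() and
Str_destroy() are hypothetical names, and a -1 return code stands in for the
exception mentioned above.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

typedef struct Str Str;

/* Minimal vtable: only the destructor slot matters here. */
typedef struct VTable {
    const char *name;
    int (*destroy)(Str *self);   /* 0 on success, -1 if refused */
} VTable;

struct Str {
    const VTable *vtable;
    char         *ptr;
    size_t        size;
};

/* Heap strings own their buffer and their struct, so both get freed. */
static int
HeapStr_destroy(Str *self)
{
    free(self->ptr);
    free(self);
    return 0;
}

/* Zombies own neither; refuse rather than free() stack memory. */
static int
ZombieStr_destroy(Str *self)
{
    (void)self;
    return -1;
}

static const VTable HEAPSTR   = { "HeapStr",   HeapStr_destroy };
static const VTable ZOMBIESTR = { "ZombieStr", ZombieStr_destroy };

/* Copies the bytes into freshly allocated memory. */
static Str*
HeapStr_new(const char *ptr, size_t size)
{
    Str *self = (Str*)malloc(sizeof(Str));
    if (self == NULL) { return NULL; }
    self->ptr = (char*)malloc(size + 1);
    if (self->ptr == NULL) { free(self); return NULL; }
    if (size > 0) { memcpy(self->ptr, ptr, size); }
    self->ptr[size] = '\0';
    self->vtable = &HEAPSTR;
    self->size   = size;
    return self;
}

/* Wraps an existing buffer; the struct itself is returned by value. */
static Str
ZombieStr_make(char *ptr, size_t size)
{
    Str self;
    self.vtable = &ZOMBIESTR;
    self.ptr    = ptr;
    self.size   = size;
    return self;
}

/* Callers dispatch through the vtable without knowing which kind they hold. */
static int
Str_destroy(Str *self)
{
    return self->vtable->destroy(self);
}
```

The point of routing the check through the vtable is that code holding a
plain Str* needs no special case: the stack-allocated variant simply rejects
the request.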

The one thing that bothers me about this scheme is that it forces us to publish
the struct definition of ZombieCharBuf.  Ordinarily, class struct definitions
will be opaque -- but the compiler needs to know at least sizeof(ZombieCharBuf)
in order to allocate the proper amount of stack memory.
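
The constraint comes straight from C's rules on incomplete types, and a small
sketch shows it.  OpaqueStr and PublicStr are hypothetical names used only
for this illustration; they are not Lucy types.

```c
#include <stddef.h>

/* Opaque: callers see only the name; the definition would live in a .c file. */
typedef struct OpaqueStr OpaqueStr;

/* Published: the full definition is visible, so sizeof(PublicStr) is known. */
typedef struct PublicStr {
    const char *ptr;
    size_t      size;
} PublicStr;

static size_t
public_on_stack(const char *ptr, size_t size)
{
    PublicStr  on_stack;        /* OK: complete type, compiler knows its size */
    OpaqueStr *handle = NULL;   /* OK: pointers to incomplete types are fine */
    /* OpaqueStr broken; */     /* would not compile: incomplete type */

    on_stack.ptr  = ptr;
    on_stack.size = size;
    (void)handle;
    return on_stack.size;
}
```

An opaque type can only ever be handled through a pointer, which is exactly
why it cannot be placed on the caller's stack.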

Marvin Humphrey
