incubator-lucy-dev mailing list archives

From Michael McCandless
Subject Re: Segment
Date Sun, 29 Mar 2009 13:40:21 GMT
On Sat, Mar 28, 2009 at 10:54 PM, Marvin Humphrey wrote:
> On Wed, Mar 25, 2009 at 07:34:01AM -0400, Michael McCandless wrote:
>> >> What does "incremented" mean?
>> >
>> > It means that the caller has to take responsibility for one refcount.  Usually
>> > you'll see that on constructors and factory methods.
>> >
>> > Having "incremented" as part of the method/function signature makes it easier
>> > to autogenerate binding code that doesn't make refcounting errors and leak
>> > memory.
>> OK got it.  It's like when Python's docs say "returns a reference".
>> It's great to make this a "formal" part of the API.
> I'm pretty sure you grok this already but for clarity's sake: this is
> Boilerplater syntax -- so it's a "formal" part of an *internal* API.

Yeah got it.

> Even though Boilerplater is a very small language, I was deeply reluctant to
> write it.  Naturally I hate all programming languages and I have fantasies of
> replacing C with something "better" :) -- but I recognize the challenges that
> language authors face and have no desire to expose Boilerplater outside of
> Lucy.  It's just a means to an end.

Yes all languages have their problems.  Our species hasn't quite
figured out the best way to program these computers just yet...

> The C API docs -- which I expect we'll autogenerate from the .bp source files
> just as I'm currently generating Perl POD docs from .bp files -- will probably
> be HTML files and will say "returns a new reference" or "returns a borrowed
> reference" just like the Python docs.

Sounds good.

>> Instead of having a bunch of version constants at the top of a class
>> (eg, we'd invoke the "Versions.add(...)"  to create
>> each version.
> Where would we keep track of the registrations?  Will each DataReader subclass
> keep a class Hash variable?
>  static Hash* versions = NULL;
>  static void
>  S_init_versions_hash()
>  {
>      versions = Hash_new(2);
>      Hash_Store_Str(versions, "1", 1, CB_newf("initial format"));
>      Hash_Store_Str(versions, "2", 1, CB_newf("fixed stoopid mistake"));
>  }
>  Hash*
>  LexWriter_versions(LexWriter *self)
>  {
>      UNUSED_VAR(self);
>      if (!versions) { S_init_versions_hash(); }
>      return versions;
>  }
> Actually that'll leak memory without an atexit() or something like that, but
> you get the idea.

What does UNUSED_VAR(self) do?

Yes, I think the registrations'd be stored only in memory, but I
wasn't picturing you'd interact w/ a hash directly; I thought a
Versions class that holds the hash, and you statically instantiate
Versions and call "add" to store your versions.  Then you consult that
instance to get latest() (used when writing), to check a version
number, for transparency when writing comments into the JSON, etc.
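
A rough sketch of the Versions idea described above, assuming a small fixed-size in-memory table rather than a Lucy Hash. Every name here (Versions, Versions_add, Versions_latest, Versions_is_supported, Versions_describe, VERSIONS_MAX) is hypothetical and not part of Lucy or Boilerplater; it only illustrates the shape of the API, not a proposed implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical registry; none of these names come from Lucy itself. */
#define VERSIONS_MAX 16

typedef struct Versions {
    const char *descriptions[VERSIONS_MAX]; /* descriptions[i] is format i + 1 */
    int         count;                      /* number of registered formats   */
} Versions;

/* Register the next format and return its number (1, 2, 3, ...).
 * The registry hands out the number, so a caller cannot skip or reuse one. */
int
Versions_add(Versions *self, const char *description)
{
    assert(self->count < VERSIONS_MAX);
    self->descriptions[self->count] = description;
    return ++self->count;
}

/* The format to use when writing; 0 if nothing has been registered. */
int
Versions_latest(const Versions *self)
{
    return self->count;
}

/* Nonzero if `version` names a registered format. */
int
Versions_is_supported(const Versions *self, int version)
{
    return version >= 1 && version <= self->count;
}

/* Description for a format number (e.g. for a JSON comment),
 * or NULL if the number is unknown. */
const char*
Versions_describe(const Versions *self, int version)
{
    return Versions_is_supported(self, version)
           ? self->descriptions[version - 1]
           : NULL;
}
```

A writer would call Versions_add() once per format at startup, write Versions_latest() into the metadata, and a reader would call Versions_is_supported() on whatever number it finds. Because the table stores borrowed string literals, there is nothing to free at exit.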

>> Introspection/transparency is the primary reason I can think of --
>> it's the same motivation that led you to JSON over private binary.
>> Ie, it'd be great to see a string description of what "format: '2'"
>> means; eg if each int has a known corresponding description, you could
>> add a comment on that line of the JSON.
>> And, in the source code, we of course assign symbolic names to these
>> constants anyway.
>> Also, having an explicit method call to "add" a new version avoids
>> silly risks that when adding a new version someone messes up adding
>> one to the int :) Or, messes up keeping track of the latest format
>> (the format that's written).  It may help with the back compat unit
>> tests, too, ensuring that each supported version is tested.
>> I guess it's a matter of where do you draw the line b/w browseability
>> of your JSON metadata vs "you must pull in an external tool to get
>> more details".
> OK, I'm cool with this so long as we can come up with a sensible API.

Yeah I haven't fleshed out a full API just yet...

> There are no performance implications or significant shared-object-bloat issues.
>> You are needing to bring online a scary amount of basic
>> infrastructure (GC, exception handling, object vtables, etc.) just to
>> get the ball rolling.
> True to an extent, but there's a huge payoff: the actual search code -- where
> the rubber hits the road -- is only marginally harder to follow than Java.

I agree.  This is simply the ante for the game you want to play, here.

