incubator-lucy-dev mailing list archives

From Marvin Humphrey <>
Subject Re: Segment
Date Tue, 24 Mar 2009 23:33:28 GMT
On Tue, Mar 24, 2009 at 03:54:32PM -0400, Michael McCandless wrote:
> Marvin Humphrey <> wrote:

> >    public incremented Hash*
> >    Metadata(DataWriter *self);

> What does "incremented" mean?

It means that the caller has to take responsibility for one refcount.  Usually
you'll see that on constructors and factory methods.
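
To make the ownership rule concrete, here is a minimal sketch in C. The names
(Obj, Obj_new, Obj_inc_ref, Obj_dec_ref, Holder) are hypothetical stand-ins
invented for illustration, not the actual Lucy or KinoSearch API. A
constructor hands the caller one refcount, while a plain accessor returns a
borrowed pointer.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a refcounted object. */
typedef struct Obj {
    unsigned refcount;
} Obj;

/* "incremented": the returned object carries one refcount that now
 * belongs to the caller, who must eventually release it. */
static Obj*
Obj_new(void) {
    Obj *self = (Obj*)malloc(sizeof(Obj));
    assert(self != NULL);
    self->refcount = 1;
    return self;
}

/* Take an additional reference. */
static Obj*
Obj_inc_ref(Obj *self) {
    self->refcount++;
    return self;
}

/* Give up one reference and free the object on the last one.
 * Returns the number of references remaining. */
static unsigned
Obj_dec_ref(Obj *self) {
    unsigned remaining = --self->refcount;
    if (remaining == 0) { free(self); }
    return remaining;
}

/* Hypothetical container with an accessor that is NOT "incremented":
 * the caller gets a borrowed pointer and owns nothing new. */
typedef struct Holder {
    Obj *member;
} Holder;

static Obj*
Holder_get_member(Holder *self) {
    return self->member;
}
```

A binding generator that sees "incremented" knows the host wrapper must
release one reference when it is done; without the keyword it must not.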

Having "incremented" as part of the method/function signature makes it easier
to autogenerate binding code that doesn't make refcounting errors and leak
memory.

> Looks good, though, I might add a way for a given module to register
> the versions it reads & writes (presumably it only writes the most
> recent one); then min/max can be derived based on what was registered.

I thought about something like that.  It's more awkward, though, and I'm not
sure how much it buys us.  I think the common case would be to drop support
for versions below a certain minimum and to support anything later.  In the
event that your DataReader really does support a discontiguous set of
versions, you can just do extra error checking yourself.
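
As a rough sketch of the two cases, assuming hypothetical helper names
(format_at_least, format_supported) that are not part of any existing API:
the common case is a single comparison against a minimum, and a reader with
a discontiguous set of versions layers its own check on top.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Common case: anything at or above the minimum is readable. */
static bool
format_at_least(int32_t format, int32_t min) {
    return format >= min;
}

/* Uncommon case: the reader also rejects specific versions above the
 * minimum. The list of rejected versions is supplied by the reader. */
static bool
format_supported(int32_t format, int32_t min,
                 const int32_t *unsupported, size_t num_unsupported) {
    assert(unsupported != NULL || num_unsupported == 0);
    if (!format_at_least(format, min)) { return false; }
    for (size_t i = 0; i < num_unsupported; i++) {
        if (unsupported[i] == format) { return false; }
    }
    return true;
}
```

The extra check stays in the one reader that needs it, so the base class only
has to know about a minimum.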

Even though DataReader is an advanced class, we should still value simplicity
and try to make it as easy to use as possible.

>  This can be useful for introspection too, so instead of just seeing
> "format 2" something could decode that to the string describing what
> format 2 was (eg "added omitTermFreqAndPositions capability").

So, the advantage would be that we could throw more meaningful error messages?  

The thing is, I'm not sure how useful it is to tell the user what kind of
change occurred at "format 2".  How would that help them to recover? 

There's also Luke-style index browsing.  But there's only so much screen
space, and I can't see how that info has utility compared to other things that
Luke can show you.

It seems to me that that kind of thing belongs in the plugin class
documentation.  Am I missing another important runtime application?

> > It might make sense to throw specific exception classes in Lucy.  I haven't
> > worked something like that out in KS for three reasons.  First, it's hard to
> > catch exceptions from C without leaking memory.  Second, Perl's try-catch
> > mechanism isn't very elegant.  Third, faking up a try-catch-finally interface
> > in C that would be abstract enough to handle all potential host
> > exception-handling mechanisms is, uh, challenging.

> This sounds very difficult!

We can throw exceptions that belong to meaningful classes without too much
difficulty.  We just can't set up try-catch-finally.

But that's not a big deal.  We can just set most things up to check return
values, and throw fatal errors when necessary.

> > However, we could create full-fledged exception objects for Lucy, so that THROW
> > calls might look something like this:
> >
> >    THROW(Err_data_component_version, /* <--- An integer error id */
> >        "Format version '%i32' is less than the minimum "
> >        "supported version '%i32' for %o", format, min,
> >        DataReader_Get_Class_Name(self));
> >
> > The exception objects generated by THROW calls do not have to subclass
> > Lucy::Obj, because we will always be returning control to the host.  So, they
> > could be, for example, plain old Java Exception subclasses.

> What would THROW try to do, and, how?

The Lucy core code would format an error message and choose an error number
from a list of Lucy error codes.  A stack trace would be great, too, though
that's hard to do portably.

Then it would call a method which would have to be implemented per-Host.
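
On the C side, that split might be sketched roughly as follows. Everything
here apart from the name lucy_Err_data_component_version is an assumption
made up for illustration: the enum layout, the hook type, core_throw, and
the toy "host" that only records what it receives. The sketch also uses
plain printf-style patterns rather than Lucy's own format specifiers.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical list of error codes; only the first name appears in
 * this thread. */
typedef enum {
    lucy_Err_data_component_version = 1,
    lucy_Err_unspecified
} lucy_ErrCode;

/* Assumed signature for the per-host hook.  A real host would raise
 * its native exception here and never return to the core. */
typedef void (*host_throw_t)(lucy_ErrCode code, const char *message);

static host_throw_t host_throw = NULL;

static void
set_host_throw(host_throw_t hook) {
    host_throw = hook;
}

/* Core side: format the message, then hand control to the host. */
static void
core_throw(lucy_ErrCode code, const char *pattern, ...) {
    char message[256];
    va_list args;
    assert(host_throw != NULL);
    va_start(args, pattern);
    vsnprintf(message, sizeof(message), pattern, args);
    va_end(args);
    host_throw(code, message);
}

/* Toy host that just records the code and message it was given. */
static lucy_ErrCode last_code;
static char         last_message[256];

static void
recording_host_throw(lucy_ErrCode code, const char *message) {
    last_code = code;
    strncpy(last_message, message, sizeof(last_message) - 1);
    last_message[sizeof(last_message) - 1] = '\0';
}
```

The core never needs to know what an exception looks like in the host; it
only passes along a number and a string.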

For Java, the implementation might contain something like this:

  if (errorNumber == lucy_Err_data_component_version) {
    throw new DataComponentVersionException(message);
  }
  else if (...) {

I should also mention that THROW would be a macro, as implied by the all-caps.
It would call the function lucy_Err_throw_at, automatically inserting line and
function name information when possible:

    void
    lucy_Err_throw_at(const char *file, int line, const char *func,
                      const char *pattern, ...);

    #define LUCY_THROW(...) \
        lucy_Err_throw_at(__FILE__, __LINE__, LUCY_ERR_FUNC_MACRO, \
                          __VA_ARGS__)
Some compilers don't support variadic macros, though (cough cough MSVC cough),
so for those we have to omit the context data and define THROW as a variadic
function instead:

    void
    LUCY_THROW(const char *pattern, ...);
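
A self-contained sketch of that fallback, using made-up TOY_ names so as not
to suggest this is the real Lucy code: when variadic macros are available,
the macro form captures file, line, and function; otherwise a variadic
function with the same call syntax drops the context.  For the sake of a
runnable example, the toy version formats into a buffer instead of throwing.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Toy stand-in for the error sink: holds the last formatted message. */
static char err_buf[512];

/* Shared worker.  Context is optional: file may be NULL. */
static void
toy_vthrow(const char *file, int line, const char *func,
           const char *pattern, va_list args) {
    size_t used;
    assert(pattern != NULL);
    err_buf[0] = '\0';
    if (file != NULL) {
        snprintf(err_buf, sizeof(err_buf), "%s:%d (%s): ", file, line, func);
    }
    used = strlen(err_buf);
    vsnprintf(err_buf + used, sizeof(err_buf) - used, pattern, args);
}

#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L
  /* Variadic macros available: capture the call site automatically. */
  static void
  toy_throw_at(const char *file, int line, const char *func,
               const char *pattern, ...) {
      va_list args;
      va_start(args, pattern);
      toy_vthrow(file, line, func, pattern, args);
      va_end(args);
  }
  #define TOY_THROW(...) \
      toy_throw_at(__FILE__, __LINE__, __func__, __VA_ARGS__)
#else
  /* No variadic macros: same call syntax, but no context data. */
  static void
  TOY_THROW(const char *pattern, ...) {
      va_list args;
      va_start(args, pattern);
      toy_vthrow(NULL, 0, NULL, pattern, args);
      va_end(args);
  }
#endif
```

Either way, calling code writes TOY_THROW("bad format %d", 7) and does not
care which variant it got.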

How about "Lucy::Util::Err" for the exception handling code?  I've been trying
to avoid things like "String", "Array", "Exception" and such so that we don't
conflict with core host symbols -- hence the funny names like "CharBuf" and

Marvin Humphrey
