incubator-lucy-dev mailing list archives

From Marvin Humphrey <>
Subject Re: Segment
Date Tue, 24 Mar 2009 16:52:37 GMT
On Tue, Mar 24, 2009 at 08:11:09AM -0400, Michael McCandless wrote:

> Shouldn't segmeta itself have a format too?

Yes -- it's in there, just under the "segmeta" key rather than at the root:

      "segmeta" : {
         "doc_count" : "11054",
         "field_names" : [ ... ],
         "format" : "1",    <--------------------
         "name" : "seg_3"
      }

> Are you going to provide utility APIs that components can use to deal
> with the format number?  

A good plan.  DataWriter already has two relevant methods.

    /** Create a Hash of arbitrary metadata to be serialized and stored
     * by the Segment.  The default implementation supplies a Hash with
     * a single key-value pair for "format".
     */
    public incremented Hash*
    Metadata(DataWriter *self);

    /** Every writer must specify a file format revision number, which should
     * increment each time the format changes. Responsibility for revision
     * checking is left to the companion DataReader.
     */
    public abstract i32_t
    Format(DataWriter *self);

> eg so a component can register the N formats it's able to deal with,
> so a consistent error is thrown if a format is too old or too new,
> etc.

Haven't got standardized methods to perform format checking in DataReader yet. 
How do these look?

    /** Throw an error unless the supplied format version is at least
     * <code>min</code> and no more than <code>max</code>.
     * @param format Format version.
     * @param min Minimum supported format version, which must be at least 1.
     * @param max Maximum supported format version, which must be at least 1.
     * @return the version.
     */
    public i32_t
    Validate_Format(DataReader *self, i32_t format, i32_t min, i32_t max);

    /** Attempt to extract a "format" value from the supplied metadata Hash.
     * If the extraction is a success, calls Validate_Format().
     * @return either the return value of Validate_Format() or 0 (an invalid
     * format value).
     */
    i32_t
    Check_Format(DataReader *self, Hash *metadata = NULL,
                 i32_t min, i32_t max);

Note that Validate_Format() is public, but that Check_Format(), which would be
used by core components, is not.

Implementation code (unverified): 

    i32_t
    DataReader_validate_format(DataReader *self, i32_t format,
                               i32_t min, i32_t max)
    {
        if (format < min) {
            THROW("Format version '%i32' is less than the minimum "
                "supported version '%i32' for %o", format, min, self);
        }
        else if (format > max) {
            THROW("Format version '%i32' is greater than the maximum "
                "supported version '%i32' for %o", format, max, self);
        }
        return format;
    }

    i32_t
    DataReader_check_format(DataReader *self, Hash *metadata,
                            i32_t min, i32_t max)
    {
        i32_t version = 0;
        if (metadata) {
            Obj *format = Hash_Fetch_Str(metadata, "format", 6);
            if (format) {
                /* Found a "format" entry: range-check it. */
                version = DataReader_Validate_Format(self,
                    (i32_t)Obj_To_I64(format), min, max);
            }
        }
        return version;
    }

It might make sense to throw specific exception classes in Lucy.  I haven't
worked something like that out in KS for three reasons.  First, it's hard to
catch exceptions from C without leaking memory.  Second, Perl's try-catch
mechanism isn't very elegant.  Third, faking up a try-catch-finally interface
in C that would be abstract enough to handle all potential host
exception-handling mechanisms is, uh, challenging.

The only caught exceptions in the KS core happen in IndexReader's open()
command, due to the lockless opening code and for reasons you are no doubt
familiar with. ;)  All other errors are fatal.

However, we could create full-fledged exception objects for Lucy, so that THROW
calls might look something like this:

    THROW(Err_data_component_version, /* <--- An integer error id */
        "Format version '%i32' is less than the minimum "
        "supported version '%i32' for %o", format, min, self);

The exception objects generated by THROW calls do not have to subclass
Lucy::Obj, because we will always be returning control to the host.  So, they
could be, for example, plain old Java Exception subclasses.

Marvin Humphrey
