lucy-dev mailing list archives

From Marvin Humphrey
Subject [lucy-dev] FieldType: no default properties
Date Sat, 18 Sep 2010 18:52:53 GMT

Right now, KinoSearch's FieldType subclasses have certain properties enabled
by default.  

    FullTextType: indexed, stored
    StringType:   indexed, stored
    BlobType:     stored

Having those defaults makes the most common use cases for building a Schema
slightly less verbose.  For instance, in the following example, the two marked
lines are not needed:

    my $schema = KinoSearch::Plan::Schema->new;
    my $analyzer = KinoSearch::Analysis::PolyAnalyzer->new(language => 'en');
    my $type = KinoSearch::Plan::FullTextType->new(
        indexed       => 1,           # <--------------- not needed
        stored        => 1,           # <--------------- not needed
        highlightable => 1,
        analyzer      => $analyzer,
    );
    $schema->spec_field(name => 'title',   type => $type);
    $schema->spec_field(name => 'content', type => $type);

However, I have come to believe that the advantages of succinctness do not
outweigh the disadvantages of inconsistency, and that it would be better to
have all properties default to false.

If all properties default to false, then it becomes easier to understand at a
glance how a FieldType is configured, both when looking at code and when
examining the schema_NNN.json file.  You don't need to take into account what
the FieldType's class is, nor inspect carefully for missing keys.

Furthermore, by having all properties default to false, we can implement them
as bit-flags and have the C constructors for FieldType subclasses take a
"flags" integer which defaults to 0.

    Analyzer *analyzer = (Analyzer*)Tokenizer_new(NULL);
    uint32_t  flags    = (FType_INDEXED | FType_STORED | FType_HIGHLIGHTABLE);
    TextType *type     = TextType_new(analyzer, flags);
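
To make the bit-flag idea concrete, here is a minimal, self-contained sketch.
The flag names are the ones used above, but the bit values are assumptions,
and SketchType, SketchType_new and SketchType_has are hypothetical stand-ins
rather than Lucy's actual API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit values: the flag names appear in this message, the values
 * do not. */
#define FType_INDEXED       ((uint32_t)1 << 0)
#define FType_STORED        ((uint32_t)1 << 1)
#define FType_HIGHLIGHTABLE ((uint32_t)1 << 2)
#define FType_ALL_KNOWN \
    (FType_INDEXED | FType_STORED | FType_HIGHLIGHTABLE)

/* Hypothetical stand-in for a FieldType subclass; not Lucy's struct. */
typedef struct {
    uint32_t flags;
} SketchType;

/* Every property is off unless its bit is set, so passing 0 yields a
 * type with nothing enabled. */
static SketchType
SketchType_new(uint32_t flags) {
    /* Reject bits this sketch does not know about. */
    assert((flags & ~FType_ALL_KNOWN) == 0);
    SketchType self = { flags };
    return self;
}

/* True only if every bit in `wanted` is set on the type. */
static bool
SketchType_has(const SketchType *self, uint32_t wanted) {
    return (self->flags & wanted) == wanted;
}
```

With this shape, SketchType_new(0) has no properties enabled, while
SketchType_new(FType_INDEXED | FType_STORED) is indexed and stored but not
highlightable, and that is visible from the call site alone.
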

If we change the defaults in Lucy, it will mean a back-compat break with
KinoSearch.  However, we can minimize the disruption by consolidating
FullTextType and StringType into a single, new TextType class.  Then, when
KinoSearch schema.json files are read and fieldtypes are detected which are
labeled "fulltext" or "string" instead of the new "text", we can just add the
flags and invoke TextType's constructor.
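
The translation step might look roughly like the sketch below.  It assumes
the reader has already extracted the type label from the schema file as a C
string; legacy_text_flags is a hypothetical helper, and the bit values are
assumptions.  The flags added for "fulltext" and "string" are the current
KinoSearch defaults listed at the top of this message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Assumed bit values; only the flag names come from this message. */
#define FType_INDEXED ((uint32_t)1 << 0)
#define FType_STORED  ((uint32_t)1 << 1)

/* Hypothetical helper.  Given a field type label from a schema file,
 * store the flags that must be added to reproduce the old defaults and
 * return true.  Return false for labels this sketch does not handle. */
static bool
legacy_text_flags(const char *label, uint32_t *flags_out) {
    assert(label != NULL && flags_out != NULL);
    if (strcmp(label, "fulltext") == 0 || strcmp(label, "string") == 0) {
        /* Old FullTextType and StringType: indexed and stored. */
        *flags_out = FType_INDEXED | FType_STORED;
        return true;
    }
    if (strcmp(label, "text") == 0) {
        /* New TextType: nothing is implied, flags must be explicit. */
        *flags_out = 0;
        return true;
    }
    return false;
}
```

The caller would OR the returned value into whatever flags the file spells
out explicitly before invoking TextType's constructor.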

Since numeric types are not public yet in KS, that leaves only BlobType, which
is rarely used.  My thinking is that it probably makes sense to just break
back compat for BlobType.

Marvin Humphrey
