lucy-dev mailing list archives

From Simon Willnauer
Subject Re: [lucy-dev] FieldType: no default properties
Date Tue, 21 Sep 2010 17:17:20 GMT
On Sat, Sep 18, 2010 at 8:52 PM, Marvin Humphrey wrote:
> Greets,
> Right now, KinoSearch's FieldType subclasses have certain properties enabled
> by default.
>    FullTextType: indexed, stored
>    StringType:   indexed, stored
>    BlobType:     stored
> Having those defaults made the most common use cases for building a Schema
> slightly less verbose.  For instance, in the following example, a couple lines
> are not needed:
>    my $schema = KinoSearch::Plan::Schema->new;
>    my $analyzer = KinoSearch::Analysis::PolyAnalyzer->new(language => 'en');
>    my $type = KinoSearch::Plan::FullTextType->new(
>        indexed       => 1,           # <--------------- not needed
>        stored        => 1,           # <--------------- not needed
>        highlightable => 1,
>        analyzer      => $analyzer,
>    );
>    $schema->spec_field(name => 'title',   type => $type);
>    $schema->spec_field(name => 'content', type => $type);
> However, I have come to believe that the advantages of succinctness do not
> outweigh the disadvantages of inconsistency, and that it would be better to
> have all properties default to false.
huge +1 - consistency is crucial IMO
> If all properties default to false, then it becomes easier to understand at a
> glance how a FieldType is configured, both when looking at code and when
> examining the schema_NNN.json file.  You don't need to take into account what
> the FieldType's class is, nor inspect carefully for missing keys.
> Furthermore, by having all properties default to false, we can implement them
> as bit-flags and have the C constructors for FieldType subclasses take a
> "flags" integer which defaults to 0.
I don't know if that is really a good use case for flag integers,
though. For something as high-level as FieldType, I would guess there
is more than just boolean flags: maybe not now, but in the future. I
would remind you to distinguish between the internal representation
and the interface. I don't mind having an efficient, compact
representation, but for the interface that already seems too
specialized.  I have a whole bunch of ideas for FieldType, since I am
working on something similar in Lucene land, and I am happy to share
them. I still need to think about how far they apply to Lucy.
>    Analyzer *analyzer = (Analyzer*)Tokenizer_new(NULL);
>    uint32_t  flags    = (FType_INDEXED | FType_STORED | FType_HIGHLIGHTABLE);
>    TextType *type     = TextType_new(analyzer, flags);
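
To make the flags proposal concrete, here is a minimal sketch of how such bit-flags could be declared and tested. The flag names come from the snippet above, but the bit positions, the FType_ALL mask, and the SketchType struct with its two functions are assumptions for illustration, not the actual Lucy API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit assignments; the thread names the flags but not their values. */
#define FType_INDEXED        ((uint32_t)1 << 0)
#define FType_STORED         ((uint32_t)1 << 1)
#define FType_HIGHLIGHTABLE  ((uint32_t)1 << 2)
#define FType_ALL            (FType_INDEXED | FType_STORED | FType_HIGHLIGHTABLE)

/* Stand-in for a FieldType subclass; not the real Lucy struct. */
typedef struct {
    uint32_t flags;
} SketchType;

/* Build a type from a flags integer. Passing 0 leaves every property off. */
static SketchType
SketchType_new(uint32_t flags) {
    assert((flags & ~FType_ALL) == 0);  /* reject bits this sketch does not define */
    SketchType self = { flags };
    return self;
}

/* True only if every bit in `flag` is set on the type. */
static bool
SketchType_has(const SketchType *self, uint32_t flag) {
    return (self->flags & flag) == flag;
}
```

With this layout, SketchType_new(0) yields a type with no properties enabled, which is the "all properties default to false" behavior discussed above.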
> If we change the defaults in Lucy, it will mean a back-compat break with
> KinoSearch.  However, we can minimize the disruption by consolidating
> FullTextType and StringType into a single, new TextType class.  Then, when
> KinoSearch schema.json files are read and fieldtypes are detected which are
> labeled "fulltext" or "string" instead of the new "text", we can just add the
> flags and invoke TextType's constructor.
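
The migration step described above could look roughly like the following sketch. It assumes the old defaults listed at the top of the message (indexed and stored for both FullTextType and StringType); the function name legacy_default_flags and the bit values are hypothetical, and the sketch ignores analyzers and any properties a legacy schema entry sets explicitly.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical bit assignments, for illustration only. */
#define FType_INDEXED ((uint32_t)1 << 0)
#define FType_STORED  ((uint32_t)1 << 1)

/* Return the flags that reproduce the old KinoSearch defaults for a
 * legacy type label. Any other label, including the new "text", gets 0,
 * so nothing is enabled implicitly. */
static uint32_t
legacy_default_flags(const char *type_label) {
    assert(type_label != NULL);
    if (strcmp(type_label, "fulltext") == 0
        || strcmp(type_label, "string") == 0) {
        return FType_INDEXED | FType_STORED;
    }
    return 0;
}
```

A schema reader could OR the returned value into whatever flags it derives from the entry's explicit keys before invoking the TextType constructor.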

While I see your point, I think we should not try to maintain backward
compatibility with KinoSearch. I had the impression that this is a
fresh start; please correct me if I am wrong. If we do maintain BW
compat (what a pain, man!), then +1.
> Since numeric types are not public yet in KS, that leaves only BlobType, which
> is rarely used.  My thinking is that it probably makes sense to just break
> back compat for BlobType.

Are we already far enough along to talk about something like FieldType?

> Marvin Humphrey
