From: Marvin Humphrey
To: lucy-dev@incubator.apache.org
Date: Sat, 18 Sep 2010 11:52:53 -0700
Subject: [lucy-dev] FieldType: no default properties

Greets,

Right now, KinoSearch's FieldType subclasses have certain properties enabled by default:

    FullTextType: indexed, stored
    StringType:   indexed, stored
    BlobType:     stored

Having those defaults made the most common use cases for building a Schema slightly less verbose.
For instance, in the following example, a couple of lines are not needed:

    my $schema   = KinoSearch::Plan::Schema->new;
    my $analyzer = KinoSearch::Analysis::PolyAnalyzer->new( language => 'en' );
    my $type     = KinoSearch::Plan::FullTextType->new(
        indexed       => 1,    # <--------------- not needed
        stored        => 1,    # <--------------- not needed
        highlightable => 1,
        analyzer      => $analyzer,
    );
    $schema->spec_field( name => 'title',   type => $type );
    $schema->spec_field( name => 'content', type => $type );

However, I have come to believe that the advantages of succinctness do not outweigh the disadvantages of inconsistency, and that it would be better to have all properties default to false.

If all properties default to false, then it becomes easier to understand at a glance how a FieldType is configured, both when looking at code and when examining the schema_NNN.json file. You don't need to take into account what the FieldType's class is, nor inspect carefully for missing keys.

Furthermore, by having all properties default to false, we can implement them as bit-flags and have the C constructors for FieldType subclasses take a "flags" integer, where 0 (the default) means every property is off:

    Analyzer *analyzer = (Analyzer*)Tokenizer_new(NULL);
    uint32_t  flags    = FType_INDEXED | FType_STORED | FType_HIGHLIGHTABLE;
    TextType *type     = TextType_new(analyzer, flags);

If we change the defaults in Lucy, it will mean a back-compat break with KinoSearch. However, we can minimize the disruption by consolidating FullTextType and StringType into a single new TextType class. Then, when a KinoSearch schema.json file is read and we detect field types labeled "fulltext" or "string" instead of the new "text", we can just add the flags for their old defaults (indexed and stored) and invoke TextType's constructor.

Since numeric types are not public yet in KS, that leaves only BlobType, which is rarely used. My thinking is that it probably makes sense to just break back compat for BlobType.

Marvin Humphrey