From: Simon Willnauer
Reply-To: simon.willnauer@gmail.com
To: lucy-dev@incubator.apache.org
Date: Tue, 21 Sep 2010 19:17:20 +0200
Subject: Re: [lucy-dev] FieldType: no default properties
In-Reply-To: <20100918185253.GA10999@rectangular.com>
References: <20100918185253.GA10999@rectangular.com>

On Sat, Sep 18, 2010 at 8:52 PM, Marvin Humphrey wrote:
> Greets,
>
> Right now, KinoSearch's FieldType subclasses have certain properties enabled
> by default.
>
>    FullTextType: indexed, stored
>    StringType:   indexed, stored
>    BlobType:     stored
>
> Having those defaults made the most common use cases for building a Schema
> slightly less verbose.
> For instance, in the following example, a couple lines are not needed:
>
>    my $schema = KinoSearch::Plan::Schema->new;
>    my $analyzer = KinoSearch::Analysis::PolyAnalyzer->new(language => 'en');
>    my $type = KinoSearch::Plan::FullTextType->new(
>        indexed       => 1,           # <--------------- not needed
>        stored        => 1,           # <--------------- not needed
>        highlightable => 1,
>        analyzer      => $analyzer,
>    );
>    $schema->spec_field(name => 'title',   type => $type);
>    $schema->spec_field(name => 'content', type => $type);
>
> However, I have come to believe that the advantages of succinctness do not
> outweigh the disadvantages of inconsistency, and that it would be better to
> have all properties default to false.

huge +1 - consistency is crucial IMO

> If all properties default to false, then it becomes easier to understand at a
> glance how a FieldType is configured, both when looking at code and when
> examining the schema_NNN.json file.  You don't need to take into account what
> the FieldType's class is, nor inspect carefully for missing keys.
>
> Furthermore, by having all properties default to false, we can implement them
> as bit-flags and have the C constructors for FieldType subclasses take a
> "flags" integer which defaults to 0.

I don't know if that is a really good use case for flag integers, though.
For something as high-level as FieldType I would guess there will be more
than just boolean flags, maybe not now but in the future. I would remind you
to distinguish between the internal representation and the interface.
I don't mind having an efficient, compact representation, but for the
interface that seems too specialized already. I have a whole bunch of ideas
for FieldType, since I am working on something similar in Lucene land, and I
am happy to share them. I still need to think about how far they apply to
Lucy.

>    Analyzer *analyzer = (Analyzer*)Tokenizer_new(NULL);
>    uint32_t  flags    = (FType_INDEXED | FType_STORED | FType_HIGHLIGHTABLE);
>    TextType *type     = TextType_new(analyzer, flags);
>
> If we change the defaults in Lucy, it will mean a back-compat break with
> KinoSearch.  However, we can minimize the disruption by consolidating
> FullTextType and StringType into a single, new TextType class.  Then, when
> KinoSearch schema.json files are read and fieldtypes are detected which are
> labeled "fulltext" or "string" instead of the new "text", we can just add the
> flags and invoke TextType's constructor.

While I see your point, I think we should not try to maintain backward
compatibility with KinoSearch. I had the impression that this is a fresh
start; please correct me if I am wrong. If we do maintain BW compat (what a
pain, man!), then +1.

> Since numeric types are not public yet in KS, that leaves only BlobType, which
> is rarely used.  My thinking is that it probably makes sense to just break
> back compat for BlobType.

+1

Are we already far enough along to talk about something like FieldType?

simon

> Marvin Humphrey
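
The bit-flag proposal and the legacy-label handling quoted above could be sketched in C roughly as follows. This is only an illustration under stated assumptions: the FType_* names come from Marvin's snippet, but their bit values and the three ftype_* helpers are hypothetical and not part of any Lucy or KinoSearch API. The sketch also ignores the case where an old schema explicitly turned a default property off.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Flag names follow the quoted snippet; the bit values are assumed. */
#define FType_INDEXED       UINT32_C(0x1)
#define FType_STORED        UINT32_C(0x2)
#define FType_HIGHLIGHTABLE UINT32_C(0x4)

/* Hypothetical helper: true if every bit in `mask` is set in `flags`. */
static bool ftype_has(uint32_t flags, uint32_t mask) {
    return (flags & mask) == mask;
}

/* Hypothetical helper: flags that a type label implied under the old
 * KinoSearch defaults. The new "text" label (and anything unknown)
 * implies nothing, so all properties default to false. */
static uint32_t ftype_legacy_defaults(const char *label) {
    assert(label != NULL);
    if (strcmp(label, "fulltext") == 0 || strcmp(label, "string") == 0) {
        return FType_INDEXED | FType_STORED;
    }
    return 0;
}

/* Hypothetical helper: combine explicitly requested flags with whatever
 * the legacy label implied, yielding the value a TextType-style
 * constructor would receive. */
static uint32_t ftype_effective_flags(const char *label,
                                      uint32_t explicit_flags) {
    return explicit_flags | ftype_legacy_defaults(label);
}
```

With these assumptions, ftype_effective_flags("fulltext", FType_HIGHLIGHTABLE) sets all three bits, while the label "text" contributes nothing beyond what the caller passes. A single flags word only covers boolean properties, which is the limitation raised in the reply above.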