Date: Sat, 13 Mar 2010 05:40:36 -0500
Subject: Re: Baby steps towards making Lucene's scoring more flexible...
From: Michael McCandless
To: java-dev@lucene.apache.org

On Thu, Mar 11, 2010 at 12:35 PM, Marvin Humphrey wrote:
> On Mon, Mar 08, 2010 at 02:10:35PM -0500, Michael McCandless wrote:
>
>> We ask it to give us a Codec.
>
> There's a conflict between the segment-wide role of the "Codec" class and its
> role as specifier for posting format.
>
> In some sense, you could argue that the "codec" reads/writes the entire index
> segment -- which includes not only postings files, but also stored fields,
> term vectors, etc. However, the compression algorithms after which these
> codecs are named have nothing to do with those other files. PFORCodec isn't
> relevant to stored fields.
>
> I'd argue for limiting the role of "Codec" to encoding and decoding posting
> files.

Yeah, perhaps we should rename Codec -> PostingsCodec, and over time add
separate interfaces for the other components of a segment (e.g.
StoredFieldsCodec).

> As far as modularizing other aspects of index reading and writing, I don't
> think a simple factory is the way to go. I favor using a composite design
> pattern for SegWriter and SegReader (rather than subclassing), and an
> initialization phase controlled by an Architecture object.
>
> It was Earwin Burrfoot who persuaded me of the merits of a user-defined
> initialization phase over a user-defined factory method:
> .

How would this work specifically for postings reading & writing?
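Marvin's composite proposal can be illustrated with a minimal Java sketch. Every type in it (SegComponent, PostingsCodec, StoredFieldsCodec, Architecture, SegReader, SimpleComponent) is hypothetical and is not Lucene's actual API. The Architecture object runs a user-defined initialization phase that adds a postings component and a stored-fields component to a composite SegReader, and every component is opened before the reader is handed back.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch only: none of these types are Lucene's actual API. */
public class CompositeSegReaderSketch {

  /** One per-segment part (postings, stored fields, ...) that must be opened before use. */
  interface SegComponent {
    String name();
    void open(String segmentName); // a real component would read its own files here
    boolean isOpen();
  }

  /** Hypothetical postings-only role: it knows nothing about stored fields. */
  interface PostingsCodec {
    SegComponent newPostingsReader();
  }

  /** Hypothetical stored-fields role, kept separate from postings. */
  interface StoredFieldsCodec {
    SegComponent newStoredFieldsReader();
  }

  /** Trivial component that only records which segment it was opened for. */
  static final class SimpleComponent implements SegComponent {
    private final String name;
    private String segment; // null until open() is called

    SimpleComponent(String name) { this.name = name; }

    public String name() { return name; }
    public void open(String segmentName) { this.segment = segmentName; }
    public boolean isOpen() { return segment != null; }
  }

  /** User-controlled initialization phase: adds components instead of overriding a factory. */
  static final class Architecture {
    private final PostingsCodec postings;
    private final StoredFieldsCodec storedFields;

    Architecture(PostingsCodec postings, StoredFieldsCodec storedFields) {
      this.postings = postings;
      this.storedFields = storedFields;
    }

    void initSegReader(SegReader reader) {
      reader.register(postings.newPostingsReader());
      reader.register(storedFields.newStoredFieldsReader());
    }
  }

  /** Composite reader: holds components rather than being subclassed per feature. */
  static final class SegReader {
    private final Map<String, SegComponent> components = new LinkedHashMap<>();

    void register(SegComponent component) { components.put(component.name(), component); }

    SegComponent get(String name) { return components.get(name); }

    static SegReader open(String segmentName, Architecture arch) {
      SegReader reader = new SegReader();
      arch.initSegReader(reader);            // user-defined init phase
      for (SegComponent c : reader.components.values()) {
        c.open(segmentName);                 // fully open every component...
      }
      return reader;                         // ...before returning control
    }
  }

  public static void main(String[] args) {
    Architecture arch = new Architecture(
        () -> new SimpleComponent("postings"),      // stand-in for e.g. a PFOR-style postings codec
        () -> new SimpleComponent("storedFields"));
    SegReader reader = SegReader.open("_0", arch);
    System.out.println(reader.get("postings").isOpen());     // true
    System.out.println(reader.get("storedFields").isOpen()); // true
  }
}
```

In this sketch a user customizes reading by choosing what the Architecture registers, not by subclassing SegReader, and the postings and stored-fields roles live in separate interfaces, in line with the Codec -> PostingsCodec split.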
When a segment is opened (e.g. via IndexReader.open/reopen or
IndexWriter.getReader), we need to fully initialize all components before
returning control.

>> So far my fav is still CodecProvider ;)
>
> It seems that the primary reason this object is needed is that IndexReader
> needs to be able to find the right decoder when it encounters an unfamiliar
> codec name. Since the core doesn't know about user-created codecs, it's
> necessary for the user to register the name => codec pairing in advance so
> that core can find it.
>
> If that's this object's main role, I'd suggest "CodecRegistry".

Well, it also provides a writer for newly created segments...

>> Naming is the hardest part!!
>
> For me, the hardest parts of API design are...
>
> A) Designing public abstract classes / interfaces.
> B) Compensating for the curse of knowledge.

Yes, both of these are hard.

Mike
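The two roles debated for CodecProvider / CodecRegistry can be shown in a minimal Java sketch. The class names (CodecRegistry, SimpleCodec) and the codec names "Standard" and "PFOR" are hypothetical, and this is not Lucene's actual API. Users register name => codec pairings in advance so that a reader can resolve the codec name recorded in an existing segment. The same object also supplies the codec used to write newly created segments.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch only: not Lucene's actual CodecProvider API. */
public class CodecRegistrySketch {

  /** Stand-in for a postings codec; only its name matters for lookup. */
  interface PostingsCodec {
    String name();
  }

  static final class SimpleCodec implements PostingsCodec {
    private final String name;
    SimpleCodec(String name) { this.name = name; }
    public String name() { return name; }
  }

  /** Maps codec names to codecs (read side) and picks the codec for new segments (write side). */
  static final class CodecRegistry {
    private final Map<String, PostingsCodec> byName = new HashMap<>();
    private String writeCodecName;

    CodecRegistry(PostingsCodec defaultWriteCodec) {
      register(defaultWriteCodec);
      writeCodecName = defaultWriteCodec.name();
    }

    /** Users register their own codecs in advance, since the core cannot know about them. */
    void register(PostingsCodec codec) {
      byName.put(codec.name(), codec);
    }

    /** Read side: resolve the codec name recorded in an existing segment. */
    PostingsCodec lookup(String name) {
      PostingsCodec codec = byName.get(name);
      if (codec == null) {
        throw new IllegalArgumentException("codec \"" + name + "\" is not registered");
      }
      return codec;
    }

    /** Write side: choose the codec for newly created segments (it must already be registered). */
    void setWriteCodec(String name) {
      writeCodecName = lookup(name).name();
    }

    PostingsCodec writeCodec() {
      return lookup(writeCodecName);
    }
  }

  public static void main(String[] args) {
    CodecRegistry registry = new CodecRegistry(new SimpleCodec("Standard"));
    registry.register(new SimpleCodec("PFOR")); // a user-created codec
    registry.setWriteCodec("PFOR");
    System.out.println(registry.lookup("Standard").name()); // Standard
    System.out.println(registry.writeCodec().name());       // PFOR
  }
}
```

Because both the read-side lookup and the write-side choice live in this one object, a pure "registry" name describes only half of what it does.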