lucene-dev mailing list archives

From Michael McCandless <luc...@mikemccandless.com>
Subject Re: Baby steps towards making Lucene's scoring more flexible...
Date Sat, 13 Mar 2010 10:40:36 GMT
On Thu, Mar 11, 2010 at 12:35 PM, Marvin Humphrey
<marvin@rectangular.com> wrote:
> On Mon, Mar 08, 2010 at 02:10:35PM -0500, Michael McCandless wrote:
>
>> We ask it to give us a Codec.
>
> There's a conflict between the segment-wide role of the "Codec" class and its
> role as specifier for posting format.
>
> In some sense, you could argue that the "codec" reads/writes the entire index
> segment -- which includes not only postings files, but also stored fields,
> term vectors, etc.  However, the compression algorithms after which these
> codecs are named have nothing to do with those other files.  PFORCodec isn't
> relevant to stored fields.
>
> I'd argue for limiting the role of "Codec" to encoding and decoding posting
> files.

Yeah, perhaps we should rename Codec -> PostingsCodec.  And over time
add separate interfaces for the other components of a segment (e.g.
StoredFieldsCodec).

> As far as modularizing other aspects of index reading and writing, I don't
> think a simple factory is the way to go.  I favor using a composite design
> pattern for SegWriter and SegReader (rather than subclassing), and an
> initialization phase controlled by an Architecture object.
>
> It was Earwin Burrfoot who persuaded me of the merits of a user-defined
> initialization phase over a user-defined factory method:
> <http://markmail.org/message/ukhcvp2ydfxpcg7q>.

How would this work specifically for postings reading & writing?

When a segment is opened (e.g. via IndexReader.open/reopen or
IndexWriter.getReader), we need to fully init all components before
returning control.

>> So far my fav is still CodecProvider ;)
>
> It seems that the primary reason this object is needed is that IndexReader
> needs to be able to find the right decoder when it encounters an unfamiliar
> codec name.  Since the core doesn't know about user-created codecs, it's
> necessary for the user to register the name => codec pairing in advance so
> that core can find it.
>
> If that's this object's main role, I'd suggest "CodecRegistry".

Well, it also provides a writer for newly created segments...
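
To make the two roles discussed above concrete (resolving a codec name
found in an existing segment, and supplying the codec used to write new
segments), here is a minimal sketch.  Everything in it is an assumption
for illustration: the PostingsCodec interface, the register/lookup/
codecForNewSegment methods, and the "Standard" and "PFOR" names are
hypothetical and are not Lucene's actual API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch only; none of these types are Lucene's real API.
public class CodecRegistrySketch {

    /** Stand-in for a postings codec, identified by the name recorded in a segment. */
    interface PostingsCodec {
        String name();
    }

    /** Resolves codec names on the read side and picks a codec for new segments. */
    static class CodecProvider {
        private final Map<String, PostingsCodec> byName = new HashMap<>();
        private final PostingsCodec defaultWriteCodec;

        CodecProvider(PostingsCodec defaultWriteCodec) {
            this.defaultWriteCodec = defaultWriteCodec;
            register(defaultWriteCodec);
        }

        /** Registry role: the user registers the name => codec pairing in advance. */
        void register(PostingsCodec codec) {
            byName.put(codec.name(), codec);
        }

        /** Read side: find the codec for a name found in an existing segment. */
        PostingsCodec lookup(String name) {
            PostingsCodec codec = byName.get(name);
            if (codec == null) {
                throw new IllegalArgumentException("unknown codec: " + name);
            }
            return codec;
        }

        /** Write side: the codec to use when creating a new segment. */
        PostingsCodec codecForNewSegment() {
            return defaultWriteCodec;
        }
    }

    /** Helper that builds a codec carrying only a name (illustrative). */
    static PostingsCodec named(String name) {
        return () -> name;
    }

    public static void main(String[] args) {
        CodecProvider provider = new CodecProvider(named("Standard"));
        provider.register(named("PFOR"));
        System.out.println(provider.lookup("PFOR").name());
        System.out.println(provider.codecForNewSegment().name());
    }
}
```

In this sketch the write-side method is what distinguishes a "provider"
from a pure name => codec registry.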

>> Naming is the hardest part!!
>
> For me, the hardest parts of API design are...
>
>  A) Designing public abstract classes / interfaces.
>  B) Compensating for the curse of knowledge.

Yes, both of these are hard.

Mike


