lucene-dev mailing list archives

From Marvin Humphrey
Subject Re: Baby steps towards making Lucene's scoring more flexible...
Date Thu, 11 Mar 2010 17:35:22 GMT
On Mon, Mar 08, 2010 at 02:10:35PM -0500, Michael McCandless wrote:

> We ask it to give us a Codec.

There's a conflict between the segment-wide role of the "Codec" class and its
role as specifier for posting format.

In some sense, you could argue that the "codec" reads/writes the entire index
segment -- which includes not only postings files, but also stored fields,
term vectors, etc.  However, the compression algorithms after which these
codecs are named have nothing to do with those other files.  PFORCodec isn't
relevant to stored fields.

I'd argue for limiting the role of "Codec" to encoding and decoding posting
files.

As far as modularizing other aspects of index reading and writing, I don't
think a simple factory is the way to go.  I favor using a composite design
pattern for SegWriter and SegReader (rather than subclassing), and an
initialization phase controlled by an Architecture object.  
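
To make the shape of that idea concrete, here is a minimal Java sketch.
Everything in it is hypothetical: DataWriter, CountingWriter, the register()
method, and initSegWriter() are names invented for illustration, not
existing Lucene API.  It only shows a SegWriter that is composed of
child writers, with an Architecture object deciding which children get
registered during initialization.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical component that writes one part of a segment
// (postings, stored fields, term vectors, ...).
interface DataWriter {
    String label();
    void addDoc(String doc);
    int docCount();
}

// Toy component that only counts the documents it is handed.
final class CountingWriter implements DataWriter {
    private final String label;
    private int docs = 0;

    CountingWriter(String label) { this.label = label; }

    public String label() { return label; }
    public void addDoc(String doc) { docs++; }
    public int docCount() { return docs; }
}

// Composite: SegWriter is not subclassed; it forwards each document
// to whatever children were registered with it.
final class SegWriter {
    private final List<DataWriter> children = new ArrayList<>();

    void register(DataWriter child) { children.add(child); }

    void addDoc(String doc) {
        for (DataWriter child : children) {
            child.addDoc(doc);
        }
    }

    List<DataWriter> children() { return children; }
}

// Initialization phase: the Architecture decides which components
// a freshly created SegWriter is made of.
class Architecture {
    void initSegWriter(SegWriter writer) {
        writer.register(new CountingWriter("postings"));
        writer.register(new CountingWriter("stored fields"));
    }
}
```

Under these assumptions, a user who wants an extra per-segment component
would subclass Architecture and register one more DataWriter in
initSegWriter(), leaving SegWriter itself untouched.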

It was Earwin Burrfoot who persuaded me of the merits of a user-defined
initialization phase over a user-defined factory method.

> So far my fav is still CodecProvider ;)

It seems that the primary reason this object is needed is that IndexReader
needs to be able to find the right decoder when it encounters an unfamiliar
codec name.  Since the core doesn't know about user-created codecs, the user
has to register the name => codec pairing in advance so that the core can
find it.

If that's this object's main role, I'd suggest "CodecRegistry".
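
As a minimal sketch of what I mean by a registry, in Java: the Codec
interface, the register() and lookup() methods, and the error handling
below are all assumptions made for illustration, not Lucene's actual
classes.  The point is only the name => codec mapping that the user fills
in before any reader is opened.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for a posting-format codec.
interface Codec {
    String name();
}

// Hypothetical registry mapping the codec name recorded in a segment
// to the codec that can decode it.
final class CodecRegistry {
    private final Map<String, Codec> codecs = new ConcurrentHashMap<>();

    // Users register their codecs in advance.
    void register(Codec codec) {
        Codec previous = codecs.putIfAbsent(codec.name(), codec);
        if (previous != null && previous != codec) {
            throw new IllegalArgumentException(
                "codec name already registered: " + codec.name());
        }
    }

    // Called by the reader when it meets a codec name in a segment.
    Codec lookup(String name) {
        Codec codec = codecs.get(name);
        if (codec == null) {
            throw new IllegalArgumentException("unknown codec: " + name);
        }
        return codec;
    }
}
```

Failing loudly on an unknown name is a design choice in this sketch; a
reader that silently fell back to some default codec would misread the
segment.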

> Naming is the hardest part!!

For me, the hardest parts of API design are...

  A) Designing public abstract classes / interfaces.
  B) Compensating for the curse of knowledge.

Marvin Humphrey
