lucene-dev mailing list archives

From "Michael McCandless (JIRA)" <>
Subject [jira] [Commented] (LUCENE-2308) Separately specify a field's type
Date Wed, 31 Aug 2011 12:59:10 GMT


Michael McCandless commented on LUCENE-2308:

bq. Change FieldType to an interface inside index.* and use it for the source of properties
about an IndexableField. 

+1, I think we should have an oal.index.FieldType interface, that
exposes (get-only) methods.  Ie, we'd just move the getters out of
IndexableField into this new FT interface (likewise for

This interface should be marked as experimental, ie, we are free to
change it.
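
For illustration, a get-only type interface along these lines might look like the sketch below. All names here (FieldTypeView, IndexableFieldSketch, SimpleFieldType, SimpleField) are hypothetical and chosen for this example; they are not the actual Lucene classes, and the set of properties is just a small sample of the ones discussed in this issue.

```java
// Hypothetical sketch: these names are illustrative, not Lucene's actual API.

/** Get-only view of a field's type; would be marked experimental. */
interface FieldTypeView {
  boolean indexed();
  boolean stored();
  boolean tokenized();
  boolean omitNorms();
}

/** The field keeps its name and value and points at a (possibly shared) type. */
interface IndexableFieldSketch {
  String name();
  String stringValue();
  FieldTypeView fieldType();
}

/** Trivial implementation, used only to exercise the interfaces. */
final class SimpleFieldType implements FieldTypeView {
  private final boolean indexed;
  private final boolean stored;
  private final boolean tokenized;
  private final boolean omitNorms;

  SimpleFieldType(boolean indexed, boolean stored, boolean tokenized, boolean omitNorms) {
    this.indexed = indexed;
    this.stored = stored;
    this.tokenized = tokenized;
    this.omitNorms = omitNorms;
  }

  public boolean indexed() { return indexed; }
  public boolean stored() { return stored; }
  public boolean tokenized() { return tokenized; }
  public boolean omitNorms() { return omitNorms; }
}

final class SimpleField implements IndexableFieldSketch {
  private final String name;
  private final String value;
  private final FieldTypeView type;

  SimpleField(String name, String value, FieldTypeView type) {
    this.name = name;
    this.value = value;
    this.type = type;
  }

  public String name() { return name; }
  public String stringValue() { return value; }
  public FieldTypeView fieldType() { return type; }
}

final class ReadOnlyTypeDemo {
  public static void main(String[] args) {
    // One type instance, shared by two fields.
    FieldTypeView type = new SimpleFieldType(true, false, true, false);
    IndexableFieldSketch title = new SimpleField("title", "some title", type);
    IndexableFieldSketch body = new SimpleField("body", "some text", type);
    System.out.println(title.name() + " indexed=" + title.fieldType().indexed());
    System.out.println("shared type: " + (title.fieldType() == body.fieldType()));
  }
}
```

The indexing side would then only ever see the getters, which is what leaves room to change the interface later.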

bq. Add a builder for FieldType to document.* which will create FieldType instances.

I don't think we should use a builder API here; I think either
big-ctor-takes-all-settings and so all fields are final, or what we
have today (.freeze()) is better.

There are two things I don't like about the builder pattern: setter
chaining and the object overhead of hard immutability.

On setter chaining:

  * It's two ways to do the same thing (chaining or not); generally an
    API (and a PL) should offer one (obvious) way to do things.
    Suddenly we'll see tutorials and articles etc. online, some with
    chaining, some without, and some mixed.

  * Code is less readable w/ chaining: it makes it easy to sneak in
    multiple statements per line, embed them into other statements,
    etc., vs unchained where you always have one statement per line.

  * I don't like .indexed() as a name; I prefer .setIndexed() so it's
    clear you're setting something about the object.

  * It encourages inefficient code, because it's easy to inline new
    X().this().that() when in fact the app really should create &
    reuse FieldType up front.  This is trappy -- the app doesn't
    realize it's creating N+1 objects.
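
As a rough illustration of the last point, the sketch below counts how many type objects get created when a chained type is inlined per document versus created once and reused. ChainedType and its instance counter are hypothetical, made up for this example; the chained setters here simply return this.

```java
// Hypothetical sketch: ChainedType is not a Lucene class.
final class ChainedType {
  // Counts constructor calls; present only to make the cost visible.
  static int instancesCreated = 0;

  private boolean indexed;
  private boolean stored;

  ChainedType() { instancesCreated++; }

  // Chained setters: each returns this so calls can be strung together.
  ChainedType indexed(boolean v) { indexed = v; return this; }
  ChainedType stored(boolean v) { stored = v; return this; }

  boolean isIndexed() { return indexed; }
  boolean isStored() { return stored; }
}

final class ChainingDemo {
  /** Inlines a fresh type for every document; returns how many were created. */
  static int inlinePerDoc(int numDocs) {
    int before = ChainedType.instancesCreated;
    for (int i = 0; i < numDocs; i++) {
      use(new ChainedType().indexed(true).stored(true));
    }
    return ChainedType.instancesCreated - before;
  }

  /** Creates the type once up front and reuses it; returns how many were created. */
  static int reuseUpFront(int numDocs) {
    int before = ChainedType.instancesCreated;
    ChainedType type = new ChainedType().indexed(true).stored(true);
    for (int i = 0; i < numDocs; i++) {
      use(type);
    }
    return ChainedType.instancesCreated - before;
  }

  // Stand-in for building a field with the given type.
  private static void use(ChainedType type) {
    if (!type.isIndexed() || !type.isStored()) {
      throw new IllegalStateException("unexpected type settings");
    }
  }

  public static void main(String[] args) {
    System.out.println("inlined per doc: " + inlinePerDoc(1000));
    System.out.println("reused up front: " + reuseUpFront(1000));
  }
}
```

Both loops read almost the same at the call site, which is why the extra allocations are easy to miss.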

I also don't like the hard immutability (every field is final so every
setter returns a new object) since this will mean the typical use is
creating tons of objects per field per doc.  Yes we can have a mutable
builder with a .build() in the end but that's making the API even more

In contrast, the "soft" immutability we have now (freeze) is very
effective, and creates no additional objects: it will prevent you from
altering a FT instance once any Field uses it.  Really the
immutability is a minor detail of the implementation here; we only
need it to prevent this trap.
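
A minimal sketch of the freeze idea, assuming a field freezes its type when it is constructed. FreezableFieldType and SketchField are hypothetical names for this example, and the choice of IllegalStateException is likewise an assumption, not a statement about what the patch does.

```java
// Hypothetical sketch of "soft" immutability; not the actual Lucene code.
final class FreezableFieldType {
  private boolean indexed;
  private boolean stored;
  private boolean frozen;

  void setIndexed(boolean v) {
    checkIfFrozen();
    indexed = v;
  }

  void setStored(boolean v) {
    checkIfFrozen();
    stored = v;
  }

  boolean indexed() { return indexed; }
  boolean stored() { return stored; }

  /** After this call, every setter throws. No new object is created. */
  void freeze() { frozen = true; }

  private void checkIfFrozen() {
    if (frozen) {
      throw new IllegalStateException("this FieldType is already frozen and cannot be changed");
    }
  }
}

final class SketchField {
  final String name;
  final String value;
  final FreezableFieldType type;

  SketchField(String name, String value, FreezableFieldType type) {
    // Assumption for this sketch: the first field to use a type freezes it.
    type.freeze();
    this.name = name;
    this.value = value;
    this.type = type;
  }
}

final class FreezeDemo {
  public static void main(String[] args) {
    FreezableFieldType type = new FreezableFieldType();
    type.setIndexed(true);   // fine: not yet used by any field
    type.setStored(true);

    SketchField title = new SketchField("title", "some title", type);
    SketchField body = new SketchField("body", "some text", type);  // reuse is fine

    try {
      type.setStored(false); // rejected: a field already uses this type
    } catch (IllegalStateException e) {
      System.out.println("rejected: " + e.getMessage());
    }
    System.out.println(title.name + " and " + body.name + " share one type: "
        + (title.type == body.type));
  }
}
```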

Generally we should try to keep Lucene's core APIs as
plain/simple/straightforward as possible.  Someone can always later
layer on a builder API on top of the simpler setter+freeze or
all-properties-to-ctor API, but not vice versa (efficiently, anyway).
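
To make the layering argument concrete, here is a sketch of a builder sitting on top of an all-properties-to-ctor type. CtorFieldType and FieldTypeBuilder are hypothetical names for this example; the point is only that the builder needs nothing from the core class beyond its constructor.

```java
// Hypothetical sketch: neither class is part of Lucene.

/** Core API: one constructor takes every setting, and all fields are final. */
final class CtorFieldType {
  private final boolean indexed;
  private final boolean stored;
  private final boolean tokenized;

  CtorFieldType(boolean indexed, boolean stored, boolean tokenized) {
    this.indexed = indexed;
    this.stored = stored;
    this.tokenized = tokenized;
  }

  boolean indexed() { return indexed; }
  boolean stored() { return stored; }
  boolean tokenized() { return tokenized; }
}

/** Optional layer an application could add on its own, outside the core. */
final class FieldTypeBuilder {
  private boolean indexed;
  private boolean stored;
  private boolean tokenized;

  FieldTypeBuilder setIndexed(boolean v) { indexed = v; return this; }
  FieldTypeBuilder setStored(boolean v) { stored = v; return this; }
  FieldTypeBuilder setTokenized(boolean v) { tokenized = v; return this; }

  /** Delegates to the plain constructor; the core class knows nothing about builders. */
  CtorFieldType build() {
    return new CtorFieldType(indexed, stored, tokenized);
  }
}

final class BuilderLayerDemo {
  public static void main(String[] args) {
    // Plain core API.
    CtorFieldType direct = new CtorFieldType(true, false, true);

    // Same settings through the optional builder layer.
    CtorFieldType built = new FieldTypeBuilder()
        .setIndexed(true)
        .setTokenized(true)
        .build();

    System.out.println("same settings: "
        + (direct.indexed() == built.indexed()
           && direct.stored() == built.stored()
           && direct.tokenized() == built.tokenized()));
  }
}
```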

> Separately specify a field's type
> ---------------------------------
>                 Key: LUCENE-2308
>                 URL:
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: core/index
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>              Labels: gsoc2011, lucene-gsoc-11, mentor
>             Fix For: 4.0
>         Attachments: LUCENE-2308-10.patch, LUCENE-2308-11.patch, LUCENE-2308-12.patch,
> LUCENE-2308-13.patch, LUCENE-2308-14.patch, LUCENE-2308-15.patch, LUCENE-2308-16.patch, LUCENE-2308-17.patch,
> LUCENE-2308-18.patch, LUCENE-2308-19.patch, LUCENE-2308-2.patch, LUCENE-2308-20.patch, LUCENE-2308-21.patch,
> LUCENE-2308-3.patch, LUCENE-2308-4.patch, LUCENE-2308-5.patch, LUCENE-2308-6.patch, LUCENE-2308-7.patch,
> LUCENE-2308-8.patch, LUCENE-2308-9.patch, LUCENE-2308-branch.patch, LUCENE-2308-final.patch,
> LUCENE-2308-ltc.patch, LUCENE-2308-merge-1.patch, LUCENE-2308-merge-2.patch, LUCENE-2308-merge-3.patch,
> LUCENE-2308.branchdiffs, LUCENE-2308.branchdiffs.moved, LUCENE-2308.patch, LUCENE-2308.patch,
> LUCENE-2308.patch, LUCENE-2308.patch, LUCENE-2308.patch
> This came up from discussions on IRC.  I'm summarizing here...
> Today when you make a Field to add to a document you can set things
> like indexed or not, stored or not, analyzed or not, details like omitTfAP,
> omitNorms, index term vectors (separately controlling
> offsets/positions), etc.
> I think we should factor these out into a new class (FieldType?).
> Then you could re-use this FieldType instance across multiple fields.
> The Field instance would still hold the actual value.
> We could then do per-field analyzers by adding a setAnalyzer on the
> FieldType, instead of the separate PerFieldAnalyzerWrapper (likewise
> for per-field codecs (with flex), where we now have
> PerFieldCodecWrapper).
> This would NOT be a schema!  It's just refactoring what we already
> specify today.  EG it's not serialized into the index.
> This has been discussed before, and I know Michael Busch opened a more
> ambitious (I think?) issue.  I think this is a good first baby step.  We could
> consider a hierarchy of FieldType (NumericFieldType, etc.) but maybe hold
> off on that for starters...
