lucene-dev mailing list archives

From "Michael McCandless (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENE-2308) Separately specify a field's type
Date Wed, 31 Aug 2011 12:59:10 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13094499#comment-13094499 ]

Michael McCandless commented on LUCENE-2308:
--------------------------------------------

bq. Change FieldType to an interface inside index.* and use it for the source of properties
about an IndexableField. 

+1, I think we should have an oal.index.FieldType interface, that
exposes (get-only) methods.  Ie, we'd just move the getters out of
IndexableField into this new FT interface (likewise for
StorableField).

This interface should be marked as experimental, ie, we are free to
change it.
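
A minimal sketch of what such a get-only interface could look like. The
type names (ReadOnlyFieldType, SketchIndexableField) and the exact set of
getters are assumptions chosen for illustration, based on the properties
listed in the issue description; this is not the actual Lucene source.

```java
// Hypothetical read-only view of a field's type. Only getters are exposed,
// so code in index.* can read the settings but never change them.
interface ReadOnlyFieldType {
  boolean indexed();
  boolean stored();
  boolean tokenized();
  boolean omitNorms();
}

// Hypothetical field abstraction: instead of carrying the getters itself,
// the field hands out its type and keeps only the name and the value.
interface SketchIndexableField {
  String name();
  String stringValue();
  ReadOnlyFieldType fieldType();
}
```

With this shape, the indexer would ask field.fieldType().indexed() rather
than field.indexed(), and the same type object could back many fields.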

bq. Add a builder for FieldType to document.* which will create FieldType instances.

I don't think we should use a builder API here; I think either
big-ctor-takes-all-settings and so all fields are final, or what we
have today (.freeze()) is better.

There are two things I don't like about the builder pattern: setter
chaining and the object overhead of hard immutability.

On setter chaining:

  * It's two ways to do the same thing (chaining or not); generally an
    API (and a PL) should offer one (obvious) way to do things.
    Suddenly we'll see tutorials and articles etc. online, some with
    chaining, some without, and some mixed.

  * Code is less readable w/ chaining: it makes it easy to sneak in
    multiple statements per line, embed them into other statements,
    etc., vs unchained where you always have one statement per line.

  * I don't like .indexed() as a name; I prefer .setIndexed() so it's
    clear you're setting something about the object.

  * It encourages inefficient code, because it's easy to inline new
    X().this().that() when in fact the app really should create &
    reuse FieldType up front.  This is trappy -- the app doesn't
    realize they're creating N+1 objects.
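
To make the "two ways to do the same thing" point concrete, here is a
small sketch. ChainableType is a hypothetical stand-in, not a Lucene
class; its setters return `this` purely to allow chaining.

```java
// Hypothetical chainable type used only for illustration.
final class ChainableType {
  private boolean indexed;
  private boolean stored;

  // Returning `this` is what makes chaining possible.
  ChainableType indexed(boolean v) { this.indexed = v; return this; }
  ChainableType stored(boolean v) { this.stored = v; return this; }

  boolean isIndexed() { return indexed; }
  boolean isStored() { return stored; }
}

final class ChainingDemo {
  // Style 1: chained, several calls packed into one statement.
  static ChainableType chained() {
    return new ChainableType().indexed(true).stored(true);
  }

  // Style 2: unchained, one statement per line, same end result.
  static ChainableType unchained() {
    ChainableType t = new ChainableType();
    t.indexed(true);
    t.stored(true);
    return t;
  }
}
```

Both methods configure the type identically, so callers end up choosing
between them on taste alone. The chained form is also the one that is
easy to paste inline at every call site instead of creating the type
once up front and reusing it.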

I also don't like the hard immutability (every field is final so every
setter returns a new object) since this will mean the typical use is
creating tons of objects per field per doc.  Yes we can have a mutable
builder with a .build() in the end but that's making the API even more
cumbersome.
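
The allocation cost can be illustrated with a sketch in which every
field is final and every setter returns a copy. ImmutableType and its
instance counter are hypothetical and exist only to make the object
count visible.

```java
// Hypothetical hard-immutable type: all fields final, setters return copies.
final class ImmutableType {
  // Counts every instance created, only to make the cost visible.
  static int instances = 0;

  private final boolean indexed;
  private final boolean stored;
  private final boolean omitNorms;

  ImmutableType() {
    this(false, false, false);
  }

  private ImmutableType(boolean indexed, boolean stored, boolean omitNorms) {
    this.indexed = indexed;
    this.stored = stored;
    this.omitNorms = omitNorms;
    instances++;
  }

  // Each "setter" allocates a new object instead of mutating this one.
  ImmutableType indexed(boolean v) { return new ImmutableType(v, stored, omitNorms); }
  ImmutableType stored(boolean v) { return new ImmutableType(indexed, v, omitNorms); }
  ImmutableType omitNorms(boolean v) { return new ImmutableType(indexed, stored, v); }

  boolean isIndexed() { return indexed; }
  boolean isStored() { return stored; }
  boolean isOmitNorms() { return omitNorms; }
}
```

Under these assumptions, new ImmutableType().indexed(true).stored(true)
.omitNorms(true) creates four objects for three settings (N+1 for N
setters). Written inline where each field is built, that cost is paid
again for every field of every document.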

In contrast, the "soft" immutability we have now (freeze) is very
effective, and creates no additional objects: it will prevent you from
altering a FT instance once any Field uses it.  Really the
immutability is a minor detail of the implementation here; we only
need it to prevent this trap.
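
A sketch of the setter+freeze approach, under the assumption that the
field freezes its type when it is constructed. FreezableType and
SketchField are hypothetical names for illustration, not the actual
patch.

```java
// Hypothetical mutable type that can be locked once it is in use.
final class FreezableType {
  private boolean indexed;
  private boolean stored;
  private boolean frozen;

  void setIndexed(boolean v) { checkIfFrozen(); this.indexed = v; }
  void setStored(boolean v) { checkIfFrozen(); this.stored = v; }

  boolean indexed() { return indexed; }
  boolean stored() { return stored; }

  // After this call every setter throws; no copy of the object is made.
  void freeze() { this.frozen = true; }

  private void checkIfFrozen() {
    if (frozen) {
      throw new IllegalStateException("this type is frozen and cannot be changed");
    }
  }
}

// Hypothetical field: holds the value and shares the (now frozen) type.
final class SketchField {
  final String name;
  final String value;
  final FreezableType type;

  SketchField(String name, String value, FreezableType type) {
    type.freeze(); // first use by any field locks the type
    this.name = name;
    this.value = value;
    this.type = type;
  }
}
```

One FreezableType can be configured up front and shared by any number of
fields. A later setIndexed(false) on the shared instance fails loudly
instead of silently changing fields that were already created.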

Generally we should try to keep Lucene's core APIs as
plain/simple/straightforward as possible.  Someone can always later
layer on a builder API on top of the simpler setter+freeze or
all-properties-to-ctor API, but not vice versa (efficiently anyway).
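
As an illustration of that layering, a builder can be written as a thin
wrapper over a setter+freeze type. Both PlainType and PlainTypeBuilder
below are hypothetical names; the sketch only assumes the core type
offers setters and a freeze() method.

```java
// Hypothetical core type: plain setters plus freeze().
final class PlainType {
  private boolean indexed;
  private boolean stored;
  private boolean frozen;

  void setIndexed(boolean v) { ensureMutable(); this.indexed = v; }
  void setStored(boolean v) { ensureMutable(); this.stored = v; }
  void freeze() { this.frozen = true; }

  boolean indexed() { return indexed; }
  boolean stored() { return stored; }
  boolean frozen() { return frozen; }

  private void ensureMutable() {
    if (frozen) {
      throw new IllegalStateException("type is frozen");
    }
  }
}

// Hypothetical builder layered on top; the core type does not know about it.
final class PlainTypeBuilder {
  private boolean indexed;
  private boolean stored;

  PlainTypeBuilder indexed(boolean v) { this.indexed = v; return this; }
  PlainTypeBuilder stored(boolean v) { this.stored = v; return this; }

  // Creates one core object, applies the collected settings, then freezes it.
  PlainType build() {
    PlainType t = new PlainType();
    t.setIndexed(indexed);
    t.setStored(stored);
    t.freeze();
    return t;
  }
}
```

Going the other way is harder: if the core type were hard-immutable with
copy-on-set, a plain setter API on top of it could not avoid the extra
allocations.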


> Separately specify a field's type
> ---------------------------------
>
>                 Key: LUCENE-2308
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2308
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: core/index
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>              Labels: gsoc2011, lucene-gsoc-11, mentor
>             Fix For: 4.0
>
>         Attachments: LUCENE-2308-10.patch, LUCENE-2308-11.patch, LUCENE-2308-12.patch,
> LUCENE-2308-13.patch, LUCENE-2308-14.patch, LUCENE-2308-15.patch, LUCENE-2308-16.patch, LUCENE-2308-17.patch,
> LUCENE-2308-18.patch, LUCENE-2308-19.patch, LUCENE-2308-2.patch, LUCENE-2308-20.patch, LUCENE-2308-21.patch,
> LUCENE-2308-3.patch, LUCENE-2308-4.patch, LUCENE-2308-5.patch, LUCENE-2308-6.patch, LUCENE-2308-7.patch,
> LUCENE-2308-8.patch, LUCENE-2308-9.patch, LUCENE-2308-branch.patch, LUCENE-2308-final.patch,
> LUCENE-2308-ltc.patch, LUCENE-2308-merge-1.patch, LUCENE-2308-merge-2.patch, LUCENE-2308-merge-3.patch,
> LUCENE-2308.branchdiffs, LUCENE-2308.branchdiffs.moved, LUCENE-2308.patch, LUCENE-2308.patch,
> LUCENE-2308.patch, LUCENE-2308.patch, LUCENE-2308.patch
>
>
> This came up from discussions on IRC.  I'm summarizing here...
> Today when you make a Field to add to a document you can set things
> like indexed or not, stored or not, analyzed or not, details like
> omitTfAP, omitNorms, index term vectors (separately controlling
> offsets/positions), etc.
> I think we should factor these out into a new class (FieldType?).
> Then you could re-use this FieldType instance across multiple fields.
> The Field instance would still hold the actual value.
> We could then do per-field analyzers by adding a setAnalyzer on the
> FieldType, instead of the separate PerFieldAnalyzerWrapper (likewise
> for per-field codecs (with flex), where we now have
> PerFieldCodecWrapper).
> This would NOT be a schema!  It's just refactoring what we already
> specify today.  EG it's not serialized into the index.
> This has been discussed before, and I know Michael Busch opened a more
> ambitious (I think?) issue.  I think this is a good first baby step.  We could
> consider a hierarchy of FieldType (NumericFieldType, etc.) but maybe hold
> off on that for starters...

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


