lucene-dev mailing list archives

From "Michael McCandless (JIRA)" <>
Subject [jira] [Commented] (LUCENE-2308) Separately specify a field's type
Date Tue, 30 Aug 2011 16:11:38 GMT


Michael McCandless commented on LUCENE-2308:

bq.  I think FT should be immutable?
bq.  I don't like the idea of mutable FieldTypes that are reused across different fields because
I am concerned that somehow the 'wrong configuration' will be applied accidentally.

This is why we have FT.freeze, and why an FT is frozen as soon as it's
used in a Field.  But I agree it'd be even better if we had true
immutability (all fields in FT are final).
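
For readers unfamiliar with the freeze approach, here is a minimal sketch
of the idea. The names below (FreezableFieldType, FrozenOnUseField,
checkIfFrozen) are hypothetical and are not taken from the patch; the
sketch only assumes that setters reject changes once the type is frozen,
and that constructing a field freezes its type.

```java
// Hypothetical sketch only; names are illustrative, not the patch's code.
class FreezableFieldType {
    private boolean indexed;
    private boolean stored;
    private boolean frozen;

    void setIndexed(boolean value) {
        checkIfFrozen();
        this.indexed = value;
    }

    void setStored(boolean value) {
        checkIfFrozen();
        this.stored = value;
    }

    boolean indexed() { return indexed; }
    boolean stored() { return stored; }
    boolean isFrozen() { return frozen; }

    // After this call every setter throws.
    void freeze() { frozen = true; }

    private void checkIfFrozen() {
        if (frozen) {
            throw new IllegalStateException("FieldType is frozen and cannot be changed");
        }
    }
}

class FrozenOnUseField {
    final String name;
    final String value;
    final FreezableFieldType type;

    FrozenOnUseField(String name, String value, FreezableFieldType type) {
        // Assumption: using a type in a field freezes it, so later
        // mutations cannot silently affect fields that share it.
        type.freeze();
        this.name = name;
        this.value = value;
        this.type = type;
    }
}
```

The weakness is that the check happens at runtime; with all-final fields
the compiler would rule out mutation entirely.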

bq. I think FieldType should be a simple immutable class with a single ctor that takes the
minimal stuff that we (core lucene) need.
bq. It can still be concrete, but then you have to specify everything. Then, things like TextField/StringField
are sugar APIs for common configurations.

This is a neat idea!

Another plus is this is a single place where we can check consistency
of the settings (eg you cannot enable term vectors if indexed is
false).
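
A rough sketch of what such a single-ctor immutable type might look like
follows. The class name ImmutableFieldType and its exact parameter list
are assumptions for illustration, and the one rule checked (term vectors
require an indexed field) is my reading of the consistency example here.

```java
// Hypothetical sketch only; not the actual FieldType from the patch.
final class ImmutableFieldType {
    final boolean indexed;
    final boolean stored;
    final boolean tokenized;
    final boolean omitNorms;
    final boolean storeTermVectors;

    ImmutableFieldType(boolean indexed, boolean stored, boolean tokenized,
                       boolean omitNorms, boolean storeTermVectors) {
        // Single place to validate combinations of settings.
        // Assumed rule: term vectors only make sense for indexed fields.
        if (storeTermVectors && !indexed) {
            throw new IllegalArgumentException(
                "cannot store term vectors for a field that is not indexed");
        }
        this.indexed = indexed;
        this.stored = stored;
        this.tokenized = tokenized;
        this.omitNorms = omitNorms;
        this.storeTermVectors = storeTermVectors;
    }
}
```

Because every field is final and set once in the ctor, an instance can be
shared across fields without any risk of later reconfiguration.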

So this would mean we'd have alternate ctors to the sugar classes for
the common cases, like maybe:
   new StringField(name, value)
   new StoredStringField(name, value)

StringField would always omitNorms, not tokenize, index DOCS_ONLY.

For TextField maybe:
   new TextField(name, value)
   new TextField(name, value, omitNorms)
   new TextField(name, value, omitNorms, indexTVPos, indexTVOffsets)
   new StoredTextField(name, value)
   new StoredTextField(name, value, omitNorms)
   new StoredTextField(name, value, omitNorms, indexTVPos, indexTVOffsets)
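
To make the "sugar" idea concrete, here is a small sketch of how such
classes could wrap a fixed type. Everything named Sketch* is hypothetical.
The StringField settings follow the description above (omitNorms, not
tokenized, DOCS_ONLY); the TextField default of indexing positions is an
assumption, and the term-vector and Stored* variants are left out for
brevity.

```java
// Hypothetical sketch only; names and defaults are illustrative.
enum SketchIndexOptions { DOCS_ONLY, DOCS_AND_FREQS_AND_POSITIONS }

final class SketchType {
    final boolean stored;
    final boolean tokenized;
    final boolean omitNorms;
    final SketchIndexOptions indexOptions;

    SketchType(boolean stored, boolean tokenized, boolean omitNorms,
               SketchIndexOptions indexOptions) {
        this.stored = stored;
        this.tokenized = tokenized;
        this.omitNorms = omitNorms;
        this.indexOptions = indexOptions;
    }
}

class SketchField {
    final String name;
    final String value;
    final SketchType type;

    SketchField(String name, String value, SketchType type) {
        this.name = name;
        this.value = value;
        this.type = type;
    }
}

// Always omits norms, is not tokenized, and indexes DOCS_ONLY.
final class SketchStringField extends SketchField {
    private static final SketchType TYPE =
        new SketchType(false, false, true, SketchIndexOptions.DOCS_ONLY);

    SketchStringField(String name, String value) {
        super(name, value, TYPE);
    }
}

// Tokenized text; the caller may choose whether norms are omitted.
final class SketchTextField extends SketchField {
    SketchTextField(String name, String value) {
        this(name, value, false);
    }

    SketchTextField(String name, String value, boolean omitNorms) {
        // Assumed default: index docs, freqs and positions.
        super(name, value, new SketchType(false, true, omitNorms,
            SketchIndexOptions.DOCS_AND_FREQS_AND_POSITIONS));
    }
}
```

The common cases then never touch the type class directly, while the
full ctor stays available for expert use.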

Expert usage would always have the option of invoking FT directly with
all options.  Even more expert usage can bypass the "userspace"
FieldType/Field/Document entirely and code directly to IndexableField.

bq. I think BinaryField should be able to index as binary?

I agree!  Not sure on the details of how we'd do that though... Today
this field is only stored byte[].

> Separately specify a field's type
> ---------------------------------
>                 Key: LUCENE-2308
>                 URL:
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: core/index
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>              Labels: gsoc2011, lucene-gsoc-11, mentor
>             Fix For: 4.0
>         Attachments: LUCENE-2308-10.patch, LUCENE-2308-11.patch, LUCENE-2308-12.patch,
LUCENE-2308-13.patch, LUCENE-2308-14.patch, LUCENE-2308-15.patch, LUCENE-2308-16.patch, LUCENE-2308-17.patch,
LUCENE-2308-18.patch, LUCENE-2308-19.patch, LUCENE-2308-2.patch, LUCENE-2308-20.patch, LUCENE-2308-21.patch,
LUCENE-2308-3.patch, LUCENE-2308-4.patch, LUCENE-2308-5.patch, LUCENE-2308-6.patch, LUCENE-2308-7.patch,
LUCENE-2308-8.patch, LUCENE-2308-9.patch, LUCENE-2308-branch.patch, LUCENE-2308-final.patch,
LUCENE-2308-ltc.patch, LUCENE-2308-merge-1.patch, LUCENE-2308-merge-2.patch, LUCENE-2308-merge-3.patch,
LUCENE-2308.branchdiffs, LUCENE-2308.branchdiffs.moved, LUCENE-2308.patch, LUCENE-2308.patch,
LUCENE-2308.patch, LUCENE-2308.patch, LUCENE-2308.patch
> This came up from discussions on IRC.  I'm summarizing here...
> Today when you make a Field to add to a document you can set things like
> indexed or not, stored or not, analyzed or not, details like omitTfAP,
> omitNorms, index term vectors (separately controlling
> offsets/positions), etc.
> I think we should factor these out into a new class (FieldType?).
> Then you could re-use this FieldType instance across multiple fields.
> The Field instance would still hold the actual value.
> We could then do per-field analyzers by adding a setAnalyzer on the
> FieldType, instead of the separate PerFieldAnalyzerWrapper (likewise
> for per-field codecs (with flex), where we now have
> PerFieldCodecWrapper).
> This would NOT be a schema!  It's just refactoring what we already
> specify today.  EG it's not serialized into the index.
> This has been discussed before, and I know Michael Busch opened a more
> ambitious (I think?) issue.  I think this is a good first baby step.  We could
> consider a hierarchy of FieldType (NumericFieldType, etc.) but maybe hold
> off on that for starters...

This message is automatically generated by JIRA.