lucene-dev mailing list archives

From "Michael McCandless (JIRA)" <>
Subject [jira] [Commented] (LUCENE-2308) Separately specify a field's type
Date Sun, 26 Jun 2011 14:40:47 GMT


Michael McCandless commented on LUCENE-2308:

Great -- everything compiles and all core tests pass ("ant test-core")
with this patch, including TestDemo, which is the only test cutover so
far to the new Field/Document API!


  * For the new Document (oal.document2.Document), I think we should
    remove setBoost and instead require that apps multiply the
    document boost into each field's boost themselves.

  * Can you remove all methods from the new Document class, and add
    back only what we need for the tests, as we cutover?  (Ie, vs
    copying everything from Document.)  We'll need to iterate on this
    API later, to fix problems w/ the existing Document class.  EG I
    don't like that lookup-by-name is secretly O(N) cost, that
    multi-valued fields are "awkward", and I'm not sure we should have
    "sugar" methods like String getField(String name).

  * The TestDemo cutover already exposes a danger w/ the new API: it's
    modifying TextField.DEFAULT_TYPE.  I think somehow we need to make
    these default types read-only?  EG maybe each has a "frozen" bit,
    and we throw IllegalStateException if you try to change anything
    once it's frozen?
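To make the "frozen" idea concrete, here is a minimal sketch. SketchFieldType, SketchTextField and FrozenTypeDemo are hypothetical names invented for illustration, not classes from the patch, and only two of the settings are modeled. It assumes every setter checks the frozen bit first, and that copying a frozen type yields an unfrozen copy that a test can modify safely.

```java
// Hypothetical sketch of a "frozen" bit; not the actual classes from the patch.
final class SketchFieldType {
    private boolean indexed;
    private boolean stored;
    private boolean frozen;

    SketchFieldType() {}

    // Copy constructor: copies the settings but NOT the frozen bit,
    // so the copy can still be changed.
    SketchFieldType(SketchFieldType other) {
        this.indexed = other.indexed;
        this.stored = other.stored;
    }

    // Every setter calls this first, so a frozen instance rejects all changes.
    private void checkIfFrozen() {
        if (frozen) {
            throw new IllegalStateException("this FieldType is frozen and cannot be changed");
        }
    }

    void setIndexed(boolean v) { checkIfFrozen(); indexed = v; }
    void setStored(boolean v) { checkIfFrozen(); stored = v; }

    // One-way switch: there is deliberately no unfreeze().
    void freeze() { frozen = true; }

    boolean indexed() { return indexed; }
    boolean stored() { return stored; }
}

// Stand-in for a field class that exposes a shared default type.
final class SketchTextField {
    static final SketchFieldType DEFAULT_TYPE = new SketchFieldType();
    static {
        DEFAULT_TYPE.setIndexed(true);
        DEFAULT_TYPE.freeze(); // shared default is read-only from here on
    }
}

final class FrozenTypeDemo {
    public static void main(String[] args) {
        // A test that wants a stored variant copies the default instead of mutating it.
        SketchFieldType custom = new SketchFieldType(SketchTextField.DEFAULT_TYPE);
        custom.setStored(true);

        try {
            SketchTextField.DEFAULT_TYPE.setStored(true); // the mistake TestDemo makes
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

With something like this, the TestDemo mistake would fail fast instead of silently changing the default for every other user of that type.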

I think the next step is to cutover more and more tests...

> Separately specify a field's type
> ---------------------------------
>                 Key: LUCENE-2308
>                 URL:
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: core/index
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>              Labels: gsoc2011, lucene-gsoc-11, mentor
>             Fix For: 4.0
>         Attachments: LUCENE-2308-2.patch, LUCENE-2308-3.patch, LUCENE-2308-4.patch, LUCENE-2308-4.patch,
>                      LUCENE-2308.patch, LUCENE-2308.patch
> This came up from discussions on IRC.  I'm summarizing here...
> Today when you make a Field to add to a document you can set things
> like indexed or not, stored or not, analyzed or not, details like
> omitTfAP, omitNorms, index term vectors (separately controlling
> offsets/positions), etc.
> I think we should factor these out into a new class (FieldType?).
> Then you could re-use this FieldType instance across multiple fields.
> The Field instance would still hold the actual value.
> We could then do per-field analyzers by adding a setAnalyzer on the
> FieldType, instead of the separate PerFieldAnalyzerWrapper (likewise
> for per-field codecs (with flex), where we now have
> PerFieldCodecWrapper).
> This would NOT be a schema!  It's just refactoring what we already
> specify today.  EG it's not serialized into the index.
> This has been discussed before, and I know Michael Busch opened a more
> ambitious (I think?) issue.  I think this is a good first baby step.  We could
> consider a hierarchy of FieldType (NumericFieldType, etc.) but maybe hold
> off on that for starters...
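A rough sketch of the refactoring described in the issue, for illustration only: DemoFieldType, DemoField and SharedTypeDemo are hypothetical names, not the API from any attached patch, and only three of the settings are modeled. The index-time settings live in one type object that several fields share, while each field holds just its name and value.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical names throughout; this is not Lucene's API.
// Holds the "how to index" settings that Field carries per instance today.
final class DemoFieldType {
    final boolean indexed;
    final boolean stored;
    final boolean omitNorms;

    DemoFieldType(boolean indexed, boolean stored, boolean omitNorms) {
        this.indexed = indexed;
        this.stored = stored;
        this.omitNorms = omitNorms;
    }
}

// The field keeps only its name, its value, and a reference to a type.
final class DemoField {
    final String name;
    final String value;
    final DemoFieldType type;

    DemoField(String name, String value, DemoFieldType type) {
        this.name = name;
        this.value = value;
        this.type = type;
    }
}

final class SharedTypeDemo {
    // Builds a two-field "document" whose fields share one type instance.
    static List<DemoField> buildDoc() {
        DemoFieldType textType = new DemoFieldType(true, false, false);
        List<DemoField> doc = new ArrayList<>();
        doc.add(new DemoField("title", "first value", textType));
        doc.add(new DemoField("body", "second value", textType));
        return doc;
    }

    public static void main(String[] args) {
        for (DemoField f : buildDoc()) {
            System.out.println(f.name + " indexed=" + f.type.indexed + " stored=" + f.type.stored);
        }
    }
}
```

Nothing here is a schema: the type object exists only in application memory and is never written to the index, which matches the intent stated above.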
