lucene-dev mailing list archives

From "Michael McCandless (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-2308) Separately specify a field's type
Date Fri, 12 Mar 2010 21:11:27 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12844684#action_12844684
] 

Michael McCandless commented on LUCENE-2308:
--------------------------------------------

Hmm, one challenge with making FieldType immutable is that we don't want
a zillion ctors over time.  Also, creating a FieldType with args like
new FieldType(true, false, false) isn't really readable.

It would be nice if we could do something similar to IndexWriterConfig
(LUCENE-2294), where you use incremental ctor/setters to set up the
configuration but then once it's used ("bound" to a Field), it's
immutable.
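
To make the "configure incrementally, then freeze once bound" idea concrete,
here is a minimal sketch.  Every name in it (FieldTypeSketch, FieldSketch,
freeze, the individual setters) is hypothetical and chosen for illustration
only; it is not the existing Lucene API, and a real patch may well look
different.

```java
// Hypothetical sketch only: these classes and methods are NOT the Lucene API.
final class FieldTypeSketch {
    private boolean indexed;
    private boolean stored;
    private boolean omitNorms;
    private boolean frozen; // set once the type is bound to a field

    // Chained setters avoid both a pile of ctors and unreadable
    // calls such as new FieldType(true, false, false).
    FieldTypeSketch setIndexed(boolean v) { checkMutable(); indexed = v; return this; }
    FieldTypeSketch setStored(boolean v) { checkMutable(); stored = v; return this; }
    FieldTypeSketch setOmitNorms(boolean v) { checkMutable(); omitNorms = v; return this; }

    boolean isIndexed() { return indexed; }
    boolean isStored() { return stored; }
    boolean isOmitNorms() { return omitNorms; }
    boolean isFrozen() { return frozen; }

    // Called when a field starts using this type; later setter calls fail.
    void freeze() { frozen = true; }

    private void checkMutable() {
        if (frozen) {
            throw new IllegalStateException("FieldType is already bound to a Field");
        }
    }
}

// Hypothetical field that holds the value and shares a (now frozen) type.
final class FieldSketch {
    final String name;
    final String value;
    final FieldTypeSketch type;

    FieldSketch(String name, String value, FieldTypeSketch type) {
        type.freeze(); // binding makes the configuration immutable
        this.name = name;
        this.value = value;
        this.type = type;
    }
}
```

With this shape, new FieldTypeSketch().setIndexed(true).setStored(false) reads
clearly at the call site, the same instance can be shared by many fields, and
any setter call after the first binding throws IllegalStateException.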

I'm torn on naming: yes, search-oriented names like "matchOnly" are
tempting, but then we really should tease apart termFreq and positions
(they are stuck together now with omitTFAP).  And the two are not
fully independent as Marvin noted -- so maybe we use a cryptic enum
(DOCS, DOCS_TERM_FREQ, DOCS_TERM_FREQ_POSITIONS)?  If we can only find
better names...

I'm not sure we can/should find better index-time names.  What is
stored in the index is relatively independent from how/whether
searches make use of it.  EG if you store termFreq (but not positions)
you can still do match only searching, or, you can do full scoring of
the query.  You can't use positional queries.
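
As a rough illustration of how an index-time setting relates to what searches
can do with it, the sketch below uses the three enum values floated above.
The enum type name (PostingsDetail) and the supports* methods are hypothetical,
and the DOCS row reflects my assumption that with no term freqs stored,
scoring cannot use the real per-document term frequency.

```java
// Hypothetical sketch: the enum name and the supports* methods are made up
// for illustration; only the three constant names come from the discussion.
enum PostingsDetail {
    DOCS,
    DOCS_TERM_FREQ,
    DOCS_TERM_FREQ_POSITIONS;

    // Any level records which docs contain the term, so matching always works.
    boolean supportsMatchOnly() {
        return true;
    }

    // Scoring by the real term frequency needs termFreq in the index
    // (assumption for DOCS: without it, tf is not available to the scorer).
    boolean supportsTermFreqScoring() {
        return this != DOCS;
    }

    // Positional queries (phrases, spans) need positions in the index.
    boolean supportsPositionalQueries() {
        return this == DOCS_TERM_FREQ_POSITIONS;
    }
}
```

The levels are nested rather than independent, which is why a single enum
fits better than two separate booleans for termFreq and positions.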


> Separately specify a field's type
> ---------------------------------
>
>                 Key: LUCENE-2308
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2308
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>            Reporter: Michael McCandless
>
> This came up from discussions on IRC.  I'm summarizing here...
> Today when you make a Field to add to a document you can set things
> like indexed or not, stored or not, analyzed or not, details like omitTfAP,
> omitNorms, index term vectors (separately controlling
> offsets/positions), etc.
> I think we should factor these out into a new class (FieldType?).
> Then you could re-use this FieldType instance across multiple fields.
> The Field instance would still hold the actual value.
> We could then do per-field analyzers by adding a setAnalyzer on the
> FieldType, instead of the separate PerFieldAnalyzerWrapper (likewise
> for per-field codecs (with flex), where we now have
> PerFieldCodecWrapper).
> This would NOT be a schema!  It's just refactoring what we already
> specify today.  EG it's not serialized into the index.
> This has been discussed before, and I know Michael Busch opened a more
> ambitious (I think?) issue.  I think this is a good first baby step.  We could
> consider a hierarchy of FieldType (NumericFieldType, etc.) but maybe hold
> off on that for starters...

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

