lucene-dev mailing list archives

From "Uwe Schindler (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-1590) Stored-only fields automatically enable norms and tf when added to document
Date Wed, 08 Apr 2009 21:56:13 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-1590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12697238#action_12697238 ]

Uwe Schindler commented on LUCENE-1590:
---------------------------------------

bq. Since FieldInfos is per-segment, one challenge is how Multi*Reader should work. Should
it simply merge on-the-fly? (i.e. present a single FieldInfo that merges the fields with the
same name across all segments)

Maybe it could merge using the already existing FieldInfos/FieldInfo methods.

A new issue for this would be good; after thinking about it a little more, I may open one.
But in general it should be combined with the Document/Fields redesign.

bq. This sounds like a good stop-gap measure, but I'd rather put our energy towards exposing
the schema, decoupling "retrieved" Fields from indexed fields, etc.

Yes, and it will not work. I think we should leave the patch as it is, and maybe remove the
omitTf and omitNorms update for binary fields. Binary fields are "special".

> Stored-only fields automatically enable norms and tf when added to document
> ---------------------------------------------------------------------------
>
>                 Key: LUCENE-1590
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1590
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Index
>    Affects Versions: 2.4, 2.4.1, 2.9
>            Reporter: Uwe Schindler
>            Assignee: Michael McCandless
>             Fix For: 2.9
>
>         Attachments: LUCENE-1590.patch, LUCENE-1590.patch, LUCENE-1590.patch
>
>
> While updating my internal components to the new TrieAPI, I noticed the following:
> I index a lot of numeric fields with trie encoding, omitting norms and term frequency.
> This works great. Luke shows that both are omitted.
> I sometimes also want to have the components of the field stored, using the same field
> name. So I add the field to the document a second time, but stored only (as the Field
> c'tor using a TokenStream cannot additionally store the field). As it is stored only, I
> thought that I could leave out the explicit setting of omitNorms and
> omitTermFreqAndPositions. After adding the stored-only-without-omits field, Luke shows
> all fields with norms enabled. I am not sure if the norms/tf were really added to the
> index, but Luke shows a value for the norms and FieldInfo has it enabled.
> In my opinion, this is not intuitive: o.a.l.document.Field should switch both omit*
> options on when storing fields only (and also disable other indexing-only options).
> Alternatively, the internal FieldInfo.update(boolean isIndexed, boolean storeTermVector,
> boolean storePositionWithTermVector, boolean storeOffsetWithTermVector, boolean omitNorms,
> boolean storePayloads, boolean omitTermFreqAndPositions) should only change the omit* and
> other options if the isIndexed parameter (not this.isIndexed) is also true, and otherwise
> leave them as they are.
> In principle, when adding a stored-only field, any indexing-specific options should not
> be changed in FieldInfo. If the field was indexed with norms before, norms should stay
> enabled (but this would be the default as it is).
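
The guard proposed in the description can be illustrated with a minimal, self-contained
sketch. FieldFlagsSketch is a hypothetical stand-in and not Lucene's actual FieldInfo; the
method names and the rule that an omit flag is cleared as soon as one instance of the field
does not omit are simplifying assumptions, chosen only to reproduce the reported symptom.

```java
// Hypothetical, simplified stand-in for per-field index metadata.
// This is NOT org.apache.lucene.index.FieldInfo; it only models three flags.
class FieldFlagsSketch {
    boolean isIndexed;
    boolean omitNorms;
    boolean omitTermFreqAndPositions;

    // Flags taken from the first instance of the field added to a document.
    FieldFlagsSketch(boolean isIndexed, boolean omitNorms, boolean omitTf) {
        this.isIndexed = isIndexed;
        this.omitNorms = omitNorms;
        this.omitTermFreqAndPositions = omitTf;
    }

    // Reported behaviour (assumed model): every later instance of the field,
    // indexed or not, can clear the omit flags.
    void updateUnguarded(boolean isIndexed, boolean omitNorms, boolean omitTf) {
        this.isIndexed = this.isIndexed || isIndexed; // once indexed, stays indexed
        this.omitNorms = this.omitNorms && omitNorms;
        this.omitTermFreqAndPositions = this.omitTermFreqAndPositions && omitTf;
    }

    // Proposed behaviour: indexing-only options change only when the incoming
    // instance is itself indexed (the isIndexed parameter, not this.isIndexed).
    void updateGuarded(boolean isIndexed, boolean omitNorms, boolean omitTf) {
        this.isIndexed = this.isIndexed || isIndexed;
        if (isIndexed) {
            this.omitNorms = this.omitNorms && omitNorms;
            this.omitTermFreqAndPositions = this.omitTermFreqAndPositions && omitTf;
        }
    }
}
```

In this sketch, a field first added as indexed with both omit flags set and then added again
as stored-only with default flags loses both omit settings under updateUnguarded, but keeps
them under updateGuarded.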


