lucene-dev mailing list archives

From "Michael McCandless (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-1590) Stored-only fields automatically enable norms and tf when added to document
Date Wed, 08 Apr 2009 12:02:12 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-1590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12696994#action_12696994 ]

Michael McCandless commented on LUCENE-1590:
--------------------------------------------


Patch looks good!  All tests pass.  That was trickier than I expected;
thanks Uwe.  I plan to commit in a day or two.

Good catch on all the places in FieldsReader where we fail to carry
over OTFAP (omitTermFreqAndPositions) from the FieldInfo to the Field
instance on the loaded document.
It's yet another example of how having the loaded Document "seem like"
the indexed document causes problems.
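To make the FieldsReader problem concrete, here is a minimal sketch under stated assumptions: FieldInfoModel, LoadedField and FieldsReaderModel are hypothetical stand-ins, not Lucene's FieldInfo, Field or FieldsReader, and only omitNorms and omitTermFreqAndPositions are modeled. A load path that forgets to copy OTFAP leaves the loaded field reporting the default (term freqs and positions enabled), even though the segment's FieldInfo says they were omitted.

```java
// Hypothetical, simplified model of the carry-over bug; these are NOT
// Lucene's real FieldInfo, Field or FieldsReader classes.
final class FieldInfoModel {
    final String name;
    final boolean omitNorms;
    final boolean omitTermFreqAndPositions;

    FieldInfoModel(String name, boolean omitNorms, boolean omitTermFreqAndPositions) {
        this.name = name;
        this.omitNorms = omitNorms;
        this.omitTermFreqAndPositions = omitTermFreqAndPositions;
    }
}

final class LoadedField {
    final String name;
    final String value;
    boolean omitNorms;                // defaults to false, like a freshly built field
    boolean omitTermFreqAndPositions; // defaults to false

    LoadedField(String name, String value) {
        this.name = name;
        this.value = value;
    }
}

final class FieldsReaderModel {
    // Buggy load path: copies omitNorms but forgets OTFAP, so the loaded
    // field claims term freqs/positions are enabled.
    static LoadedField loadWithoutCarryover(FieldInfoModel fi, String storedValue) {
        LoadedField f = new LoadedField(fi.name, storedValue);
        f.omitNorms = fi.omitNorms;
        return f;
    }

    // Fixed load path: copies both index-time flags from the FieldInfo.
    static LoadedField loadWithCarryover(FieldInfoModel fi, String storedValue) {
        LoadedField f = loadWithoutCarryover(fi, storedValue);
        f.omitTermFreqAndPositions = fi.omitTermFreqAndPositions;
        return f;
    }

    public static void main(String[] args) {
        FieldInfoModel fi = new FieldInfoModel("price", true, true);
        System.out.println(loadWithoutCarryover(fi, "42").omitTermFreqAndPositions); // false
        System.out.println(loadWithCarryover(fi, "42").omitTermFreqAndPositions);    // true
    }
}
```

In the real reader there are several load paths, so the fix has to be applied to each of them; the model has only one.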

In the ideal future (I think?), the fields on a "loaded" Document
would make no effort to convey these index-time options like
omitNorms, OTFAP, etc., because those settings are "semi-global"
(absorbed into the FieldInfos for the current segment).  And something
like boost, which the API lets you access on a loaded doc, is always
wrong since we cannot recreate that (it's not stored, directly, in the
index).

At indexing time, all the ifs scattered around that conditionalize the
defaults on whether the field is indexed are also spooky.  It's as if
we should have a separate class (IndexedField)
that privately carries these values.  Then a StoredField wouldn't even
have them.  But that approach breaks down because we'd also want an
IndexedAndStoredField.

Or... perhaps we move all the indexing-specific settings out of
Field.java and into Field.Index.  After all, these details really
describe tweaks on how Lucene will do its indexing, so they don't
really belong in the main Field.java class.
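A rough design sketch of keeping indexing-specific settings in their own object, in the spirit of the Field.Index suggestion: IndexOptions and SketchField are hypothetical names, not Lucene classes, and only two options are modeled. A stored-only field carries no index-time options at all, so there is nothing for it to get wrong.

```java
import java.util.Objects;

// Hypothetical design sketch: IndexOptions and SketchField are illustrative
// names, not Lucene classes.
final class IndexOptions {
    final boolean omitNorms;
    final boolean omitTermFreqAndPositions;

    IndexOptions(boolean omitNorms, boolean omitTermFreqAndPositions) {
        this.omitNorms = omitNorms;
        this.omitTermFreqAndPositions = omitTermFreqAndPositions;
    }
}

final class SketchField {
    final String name;
    final String value;
    final boolean stored;
    final IndexOptions indexing; // null means "not indexed": no norms/tf settings exist

    private SketchField(String name, String value, boolean stored, IndexOptions indexing) {
        this.name = name;
        this.value = value;
        this.stored = stored;
        this.indexing = indexing;
    }

    // A stored-only field simply has no index-time options.
    static SketchField storedOnly(String name, String value) {
        return new SketchField(name, value, true, null);
    }

    // An indexed field must say how it is indexed; storing is an orthogonal flag.
    static SketchField indexed(String name, String value, IndexOptions opts, boolean alsoStored) {
        return new SketchField(name, value, alsoStored, Objects.requireNonNull(opts, "opts"));
    }

    boolean isIndexed() {
        return indexing != null;
    }

    public static void main(String[] args) {
        SketchField s = storedOnly("price", "42");
        SketchField i = indexed("price", "42", new IndexOptions(true, true), false);
        System.out.println(s.isIndexed());       // false
        System.out.println(i.indexing.omitNorms); // true
    }
}
```

Because storing is a flag and the indexing options are an optional object, the indexed-and-stored case needs no separate IndexedAndStoredField class, which is where the subclass approach breaks down.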


> Stored-only fields automatically enable norms and tf when added to document
> ---------------------------------------------------------------------------
>
>                 Key: LUCENE-1590
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1590
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Index
>    Affects Versions: 2.4, 2.4.1, 2.9
>            Reporter: Uwe Schindler
>            Assignee: Michael McCandless
>             Fix For: 2.9
>
>         Attachments: LUCENE-1590.patch, LUCENE-1590.patch, LUCENE-1590.patch
>
>
> While updating my internal components to the new TrieAPI, I noticed the following:
> I index a lot of numeric fields with trie encoding, omitting norms and term frequency.
> This works great; Luke shows that both are omitted.
> Sometimes I also want to store the components of the field under the same field name, so I
> additionally add the field to the document again, but stored only (the Field c'tor that
> takes a TokenStream cannot also store the field). As it is stored only, I thought I could
> leave out explicitly setting omitNorms and omitTermFreqAndPositions.
> After adding the stored-only field without the omits, Luke shows all fields with norms enabled.
> I am not sure whether the norms/tf were really added to the index, but Luke shows a value for
> the norms and FieldInfo has norms enabled.
> In my opinion, this is not intuitive: o.a.l.document.Field should switch both omit*
> options on when a field is only stored (and also disable other indexing-only options). Alternatively,
> the internal FieldInfo.update(boolean isIndexed, boolean storeTermVector, boolean storePositionWithTermVector,
> boolean storeOffsetWithTermVector, boolean omitNorms, boolean storePayloads, boolean omitTermFreqAndPositions)
> should only change the omit* and other options if the isIndexed parameter (not this.isIndexed)
> is also true, and otherwise leave them as they are.
> In principle, when adding a stored-only field, no indexing-specific options should
> be changed in FieldInfo. If the field was indexed with norms before, norms should stay enabled
> (but this would be the default anyway).
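The alternative fix proposed in the description can be sketched with a hypothetical, cut-down FieldInfoSketch that models only isIndexed and omitNorms; it is not Lucene's FieldInfo, and the real update() takes more flags. The rule that norms stay enabled once an indexed instance keeps them follows the last paragraph of the description; letting the first indexed instance define the options outright, when the field was previously seen only as stored, is an assumption of this sketch.

```java
// Hypothetical, cut-down model of per-segment field options; NOT Lucene's
// FieldInfo. Only isIndexed and omitNorms are modeled.
final class FieldInfoSketch {
    boolean isIndexed;
    boolean omitNorms;

    FieldInfoSketch(boolean isIndexed, boolean omitNorms) {
        this.isIndexed = isIndexed;
        this.omitNorms = omitNorms;
    }

    // Modeled reported behavior: every added instance merges its flags, so a
    // stored-only instance carrying the default omitNorms == false turns norms on.
    void updateUnguarded(boolean isIndexed, boolean omitNorms) {
        if (isIndexed) {
            this.isIndexed = true;  // once indexed, always indexed
        }
        if (this.omitNorms != omitNorms) {
            this.omitNorms = false; // once norms are kept, they stay kept
        }
    }

    // Proposed behavior: only an instance whose own isIndexed parameter is
    // true (not this.isIndexed) may change index-time options.
    void updateGuarded(boolean isIndexed, boolean omitNorms) {
        if (!isIndexed) {
            return;                 // stored-only: leave index-time options as they are
        }
        if (!this.isIndexed) {
            // Assumption: the first indexed instance defines the options outright.
            this.isIndexed = true;
            this.omitNorms = omitNorms;
            return;
        }
        if (this.omitNorms != omitNorms) {
            this.omitNorms = false; // norms stay enabled once any indexed instance keeps them
        }
    }

    public static void main(String[] args) {
        FieldInfoSketch before = new FieldInfoSketch(true, true); // trie field, norms omitted
        before.updateUnguarded(false, false);                      // add stored-only copy
        System.out.println(before.omitNorms);                      // false: norms switched on

        FieldInfoSketch after = new FieldInfoSketch(true, true);
        after.updateGuarded(false, false);
        System.out.println(after.omitNorms);                       // true: untouched
    }
}
```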

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

