lucene-dev mailing list archives

From "Uwe Schindler (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-1590) Stored-only fields automatically enable norms and tf when added to document
Date Wed, 08 Apr 2009 14:01:14 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-1590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12697024#action_12697024 ]

Uwe Schindler commented on LUCENE-1590:
---------------------------------------

{quote}
Patch looks good! All tests pass. That was trickier than I expected;
thanks Uwe. I plan to commit in a day or two.
{quote}

The only tricky part was the FieldsReader. The original bug was fixed in a few lines (FieldInfo
ctor and update()).
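
For readers without the patch at hand, here is a rough sketch of the guard that the issue
description below proposes for update(): index-time options change only when the incoming
field is itself indexed. FieldInfoSketch is a hypothetical, simplified stand-in (three flags
only), not the real FieldInfo and not the attached patch; the defaults in the constructor and
the merge rules inside the guard are assumptions made for illustration.

```java
// Hypothetical, simplified stand-in for Lucene's internal FieldInfo.
// NOT the real class and NOT the committed patch; only three flags are modelled.
final class FieldInfoSketch {
    boolean isIndexed;
    boolean omitNorms;
    boolean omitTermFreqAndPositions;

    FieldInfoSketch(boolean isIndexed, boolean omitNorms, boolean omitTermFreqAndPositions) {
        this.isIndexed = isIndexed;
        if (isIndexed) {
            this.omitNorms = omitNorms;
            this.omitTermFreqAndPositions = omitTermFreqAndPositions;
        } else {
            // Assumption: a stored-only field gets neutral defaults, so that a
            // later indexed instance of the same field decides the options.
            this.omitNorms = true;
            this.omitTermFreqAndPositions = false;
        }
    }

    void update(boolean isIndexed, boolean omitNorms, boolean omitTermFreqAndPositions) {
        if (!isIndexed) {
            // Stored-only instance: leave all index-time options untouched.
            return;
        }
        this.isIndexed = true;
        // Assumed merge rules: norms stay on once any indexed instance wants
        // them; tf/positions stay omitted once any indexed instance omits them.
        this.omitNorms &= omitNorms;
        this.omitTermFreqAndPositions |= omitTermFreqAndPositions;
    }
}
```

With this guard, adding a stored-only instance of a field that was indexed with both omit*
options set leaves those options as they were.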

{quote}
It's a good catch, all the places in FieldsReader where we fail to
carryover OTFAP from FieldInfo --> Field instance on the document.
It's yet another example of how having the loaded Document "seem like"
the indexed document causes problems.
{quote}

I am still not happy with the new FieldsReader because it cannot replicate all indexing info
(although it now covers almost everything). I know this does not affect functionality (as only
the stored contents can be retrieved). In principle the Field instances should have *no*
indexing options. Luke would then display nothing anymore, but for this case it would really
be better to make the Field infos "public", so somebody could enumerate all fields and then
test which options were used during indexing. Mixing this with retrieval of stored fields is
not good.

One case is still not implemented correctly in FieldsReader: binary stored fields have a
special if-clause there. The binary field is loaded as stored only; currently only omitTf
and omitNorms are set (I added this). But e.g. INDEX is always false, and so on. In principle,
for completeness, all options from FieldInfo should be replicated here.
It would be better for FieldsReader to have a central method like
copyFieldOptions(FieldInfo, Fieldable) that copies all options from FieldInfo to the Fieldable
(without looking at the stored contents). The other if-cases should only initialize the stored
parts and type. I think I will give it a try.
This information is now more important: if somebody in the past stored the string contents
compressed, he must now use a binary field and do the compression himself. In this case, Luke
would not display any indexing options anymore. This is not bad, but inconsistent.
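
A rough sketch of what such a central copy method could look like. FieldInfoStandIn,
LoadedFieldStandIn and FieldsReaderSketch are hypothetical stand-ins invented for this
illustration (the real FieldInfo and Fieldable carry more options and different signatures),
and loadBinary()/loadString() only mimic the per-type branches in FieldsReader.

```java
// Hypothetical stand-in for FieldInfo; only a subset of options is modelled.
final class FieldInfoStandIn {
    final boolean isIndexed;
    final boolean storeTermVector;
    final boolean omitNorms;
    final boolean omitTermFreqAndPositions;

    FieldInfoStandIn(boolean isIndexed, boolean storeTermVector,
                     boolean omitNorms, boolean omitTermFreqAndPositions) {
        this.isIndexed = isIndexed;
        this.storeTermVector = storeTermVector;
        this.omitNorms = omitNorms;
        this.omitTermFreqAndPositions = omitTermFreqAndPositions;
    }
}

// Hypothetical stand-in for a loaded Fieldable.
final class LoadedFieldStandIn {
    // Index-time options: written only by copyFieldOptions().
    boolean indexed;
    boolean termVectorStored;
    boolean omitNorms;
    boolean omitTermFreqAndPositions;
    // Stored parts and type: written only by the per-type branches.
    boolean binary;
    byte[] binaryValue;
    String stringValue;
}

final class FieldsReaderSketch {
    // Copies every index-time option in one place, without looking at the
    // stored contents, so no branch can forget one of them.
    static void copyFieldOptions(FieldInfoStandIn fi, LoadedFieldStandIn f) {
        f.indexed = fi.isIndexed;
        f.termVectorStored = fi.storeTermVector;
        f.omitNorms = fi.omitNorms;
        f.omitTermFreqAndPositions = fi.omitTermFreqAndPositions;
    }

    // Binary branch: initializes stored parts and type, then delegates.
    static LoadedFieldStandIn loadBinary(FieldInfoStandIn fi, byte[] data) {
        LoadedFieldStandIn f = new LoadedFieldStandIn();
        f.binary = true;
        f.binaryValue = data;
        copyFieldOptions(fi, f);
        return f;
    }

    // String branch: same pattern, different stored type.
    static LoadedFieldStandIn loadString(FieldInfoStandIn fi, String value) {
        LoadedFieldStandIn f = new LoadedFieldStandIn();
        f.binary = false;
        f.stringValue = value;
        copyFieldOptions(fi, f);
        return f;
    }
}
```

In this arrangement a binary field loaded for an indexed FieldInfo reports the same indexing
options as a string field would, which is the consistency Luke would need.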

So the better approach is to make the Field properties public not at the document level, but
at the IndexReader level.

> Stored-only fields automatically enable norms and tf when added to document
> ---------------------------------------------------------------------------
>
>                 Key: LUCENE-1590
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1590
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Index
>    Affects Versions: 2.4, 2.4.1, 2.9
>            Reporter: Uwe Schindler
>            Assignee: Michael McCandless
>             Fix For: 2.9
>
>         Attachments: LUCENE-1590.patch, LUCENE-1590.patch, LUCENE-1590.patch
>
>
> While updating my internal components to the new TrieAPI, I have seen the following:
> I index a lot of numeric fields with trie encoding, omitting norms and term frequency.
> This works great. Luke shows that both are omitted.
> I sometimes also want to have the components of the field stored under the same field
> name, so I additionally add the field to the document again, but stored only (as the Field
> c'tor using a TokenStream cannot additionally store the field). As it is stored only, I
> thought that I could leave out explicitly setting omitNorms and omitTermFreqAndPositions.
> After adding the stored-only-without-omits field, Luke shows all fields with norms enabled.
> I am not sure if the norms/tf were really added to the index, but Luke shows a value for
> the norms and FieldInfo has it enabled.
> In my opinion, this is not intuitive; o.a.l.document.Field should switch both omit*
> options on when storing fields only (and also disable other indexing-only options).
> Alternatively, the internal FieldInfo.update(boolean isIndexed, boolean storeTermVector,
> boolean storePositionWithTermVector, boolean storeOffsetWithTermVector, boolean omitNorms,
> boolean storePayloads, boolean omitTermFreqAndPositions) should only change the omit* and
> other options if the isIndexed parameter (not this.isIndexed) is also true, and otherwise
> leave them as they are.
> In principle, when adding a stored-only field, any indexing-specific options should not
> be changed in FieldInfo. If the field was indexed with norms before, norms should stay
> enabled (but this would be the default as it is).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

