lucene-dev mailing list archives

From Doron Cohen <cdor...@gmail.com>
Subject Re: Confusing norms
Date Thu, 26 May 2011 12:34:04 GMT
Thanks, my mistake, I see it now:

> LUCENE-2846: omitNorms now behaves like omitTermFrequencyAndPositions, if you
> omitNorms(true) for field "a" for 1000 documents, but then add a document with
> omitNorms(false) for field "a", all documents for field "a" will have no norms.
> Previously, Lucene would fill the first 1000 documents with "fake norms" from
> Similarity.getDefault(). (Robert Muir, Mike Mccandless)

I somehow misread the code comment "remains off for life": I expected the
old behavior and read "off" as "not omitted", whereas "off" here actually
means "omitted". All clear now, thanks!
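
To make the flipped semantics concrete, here is a minimal sketch of the
"sticky" flag. FieldInfoSketch is a hypothetical stand-in, not Lucene's
FieldInfo; it models only the omitNorms flag and assumes update() is called
once for each further instance of the field:

```java
// Hypothetical stand-in for FieldInfo; only the omitNorms flag is modeled.
final class FieldInfoSketch {
    private boolean omitNorms;

    FieldInfoSketch(boolean omitNorms) {
        this.omitNorms = omitNorms;
    }

    // Merge in the setting of another instance of the same field.
    // Mirrors the trunk snippet: any disagreement leaves norms omitted.
    void update(boolean omitNorms) {
        if (this.omitNorms != omitNorms) {
            this.omitNorms = true; // once omitted, norms stay omitted
        }
    }

    boolean omitNorms() {
        return omitNorms;
    }

    public static void main(String[] args) {
        // First instance indexed with norms (like Index.ANALYZED) ...
        FieldInfoSketch c = new FieldInfoSketch(false);
        // ... second instance without norms (like Index.NOT_ANALYZED_NO_NORMS).
        c.update(true);
        System.out.println("omitNorms after mixed adds: " + c.omitNorms());
    }
}
```

With this logic the order of the two adds does not matter: a single instance
that omits norms is enough to leave the field without norms, which matches
what Shai's test observed.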

Doron

On Thu, May 26, 2011 at 3:26 PM, Shai Erera <serera@gmail.com> wrote:

> Sorry Doron, I opened LUCENE-3146 to track this and forgot to update this
> thread.
>
> Mike already commented that this is expected behavior in 4.0 (the semantics
> were flipped). However, we still need to fix some javadocs, and there seems
> to be another problem: an app may succeed in calling setNorm, only for that
> norm to be discarded on the next merge.
>
> Shai
>
>
> On Thu, May 26, 2011 at 3:11 PM, Doron Cohen <cdoronc@gmail.com> wrote:
>
>> Yes, I see this too in trunk r1127436 and it seems to be a bug.
>> If you uncomment the line that adds the field with NO_NORMS the file is
>> there as expected.
>>
>> I think I know where the bug is:
>> FieldInfo.update() has the wrong logic here:
>>
>> {code}
>>       if (this.omitNorms != omitNorms) {
>>         // if one require omitNorms at least once, it remains off for life
>>         this.omitNorms = true;
>>       }
>> {code}
>>
>> It should of course be changed to set false in this case.
>>
>> Doron
>>
>>
>> On Thu, May 26, 2011 at 11:32 AM, Shai Erera <serera@gmail.com> wrote:
>>
>>> Hi
>>>
>>> I wrote the following test:
>>>
>>> {code}
>>>   public void testConfusingNorms() throws Exception {
>>>     Directory dir = newDirectory();
>>>     LogMergePolicy lmp = newLogMergePolicy(false);
>>>     IndexWriterConfig conf = newIndexWriterConfig(TEST_VERSION_CURRENT,
>>>         new MockAnalyzer(random)).setMergePolicy(lmp);
>>>     IndexWriter w = new IndexWriter(dir, conf);
>>>     Document doc = new Document();
>>>     doc.add(new Field("c", "some text", Store.YES, Index.ANALYZED));
>>>     w.addDocument(doc);
>>>     doc = new Document();
>>>     doc.add(new Field("c", "delete", Store.NO,
>>>         Index.NOT_ANALYZED_NO_NORMS));
>>>     w.addDocument(doc);
>>>     w.close();
>>>
>>>     IndexReader r = IndexReader.open(dir, false);
>>>     r.setNorm(0, "c", (byte) 1);
>>>     r.close();
>>>
>>>     // Look for the sep norms file
>>>     boolean found = false;
>>>     for (String s : dir.listAll()) {
>>>       if (IndexFileNames.isSeparateNormsFile(s)) {
>>>         found = true;
>>>         break;
>>>       }
>>>     }
>>>     assertTrue("separate norms file not found", found);
>>>
>>>     dir.close();
>>>   }
>>> {code}
>>>
>>> You will also need to add that method to IndexFileNames (not committed
>>> yet):
>>> {code}
>>>   /**
>>>    * Returns true if the given filename ends with the separate norms file
>>>    * pattern: {@code SEPARATE_NORMS_EXTENSION + "[0-9]+"}.
>>>    */
>>>   public static boolean isSeparateNormsFile(String filename) {
>>>     int idx = filename.lastIndexOf('.');
>>>     if (idx == -1) return false;
>>>     String ext = filename.substring(idx + 1);
>>>     return Pattern.matches(SEPARATE_NORMS_EXTENSION + "[0-9]+", ext);
>>>   }
>>> {code}
>>>
>>> The test adds two documents with a field "c": one analyzed with norms,
>>> and one not analyzed and without norms. According to the javadoc of
>>> "NOT_ANALYZED_NO_NORMS":
>>>
>>>> Note that once you index a given field *with* norms enabled, disabling
>>>> norms will have no effect.
>>>> In other words, for this to have the above described effect on a field,
>>>> all instances of that field
>>>> must be indexed with NOT_ANALYZED_NO_NORMS from the beginning.
>>>>
>>>
>>> I'd expect that since I add one instance of the field w/ norms enabled,
>>> then norms will exist for that field, however that's not the case.
>>>
>>> The IndexReader call that sets the norm does nothing, because
>>> SegmentReader.doSetNorm thinks this is not an indexed field (or,
>>> assuming the documentation is wrong, a field w/o norms):
>>>
>>>   protected void doSetNorm(int doc, String field, byte value)
>>>       throws IOException {
>>>     SegmentNorms norm = norms.get(field);
>>>     if (norm == null)                             // not an indexed field
>>>       return;
>>>
>>> The same test runs fine on 3x, so I assume there is a bug in the code
>>> somewhere only on trunk?
>>>
>>> Shai
>>>
>>
>>
>
