lucene-dev mailing list archives

From Shai Erera <ser...@gmail.com>
Subject Re: Confusing norms
Date Thu, 26 May 2011 12:26:50 GMT
Sorry Doron, I opened LUCENE-3146 to track this and forgot to update this
thread.

Mike already commented that this is expected behavior in 4.0 (the semantics
were flipped). However, we still need to fix some javadocs, and there seems to
be another problem: an app may succeed in calling setNorm, only for that norm
to be discarded on the next merge.
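
To make the "flipped" semantics concrete, here is a minimal, self-contained sketch. FieldFlags is a hypothetical stand-in, not Lucene's FieldInfo; the 3x rule is inferred from the NOT_ANALYZED_NO_NORMS javadoc quoted below, and the trunk rule from the FieldInfo.update() snippet quoted below.

```java
// Hypothetical stand-in for per-field metadata; NOT Lucene's FieldInfo.
final class FieldFlags {
    boolean omitNorms;

    FieldFlags(boolean omitNorms) {
        this.omitNorms = omitNorms;
    }

    // Rule implied by the 3x javadoc: once any instance of the field is
    // indexed with norms, norms stay enabled for that field.
    void updateNormsWin(boolean omitNorms) {
        if (this.omitNorms != omitNorms) {
            this.omitNorms = false;
        }
    }

    // Rule in the trunk snippet: once any instance of the field omits
    // norms, norms stay omitted for that field.
    void updateOmitWins(boolean omitNorms) {
        if (this.omitNorms != omitNorms) {
            this.omitNorms = true;
        }
    }
}

class NormsSemanticsDemo {
    public static void main(String[] args) {
        // Same sequence as the test: first with norms, then without.
        FieldFlags older = new FieldFlags(false);
        older.updateNormsWin(true);
        System.out.println("norms-win rule, omitNorms = " + older.omitNorms);

        FieldFlags trunk = new FieldFlags(false);
        trunk.updateOmitWins(true);
        System.out.println("omit-wins rule, omitNorms = " + trunk.omitNorms);
    }
}
```

Under the omit-wins rule the field in the test ends up without norms, which would explain why setNorm has nothing to write.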

Shai

On Thu, May 26, 2011 at 3:11 PM, Doron Cohen <cdoronc@gmail.com> wrote:

> Yes I see this too in trunk r1127436 and it seems a bug.
> If you uncomment the line that adds the field with NO_NORMS the file is
> there as expected.
>
> I think I know where the bug is:
> FieldInfo.update() has the wrong logic here:
>
> {code}
>       if (this.omitNorms != omitNorms) {
>         // if one require omitNorms at least once, it remains off for life
>         this.omitNorms = true;
>       }
> {code}
>
> It should of course be changed to set false in this case.
>
> Doron
>
>
> On Thu, May 26, 2011 at 11:32 AM, Shai Erera <serera@gmail.com> wrote:
>
>> Hi
>>
>> I wrote the following test:
>>
>> {code}
>>   public void testConfusingNorms() throws Exception {
>>     Directory dir = newDirectory();
>>     LogMergePolicy lmp = newLogMergePolicy(false);
>>     IndexWriterConfig conf = newIndexWriterConfig(TEST_VERSION_CURRENT,
>>         new MockAnalyzer(random)).setMergePolicy(lmp);
>>     IndexWriter w = new IndexWriter(dir, conf);
>>     Document doc = new Document();
>>     doc.add(new Field("c", "some text", Store.YES, Index.ANALYZED));
>>     w.addDocument(doc);
>>     doc = new Document();
>>     doc.add(new Field("c", "delete", Store.NO,
>>         Index.NOT_ANALYZED_NO_NORMS));
>>     w.addDocument(doc);
>>     w.close();
>>
>>     IndexReader r = IndexReader.open(dir, false);
>>     r.setNorm(0, "c", (byte) 1);
>>     r.close();
>>
>>     // Look for the sep norms file
>>     boolean found = false;
>>     for (String s : dir.listAll()) {
>>       if (IndexFileNames.isSeparateNormsFile(s)) {
>>         found = true;
>>         break;
>>       }
>>     }
>>     assertTrue("separate norms file not found", found);
>>
>>     dir.close();
>>   }
>> {code}
>>
>> You will also need to add the following method to IndexFileNames (not
>> committed yet):
>> {code}
>>   /**
>>    * Returns true if the given filename ends with the separate norms file
>>    * pattern: {@code SEPARATE_NORMS_EXTENSION + "[0-9]+"}.
>>    */
>>   public static boolean isSeparateNormsFile(String filename) {
>>     int idx = filename.lastIndexOf('.');
>>     if (idx == -1) return false;
>>     String ext = filename.substring(idx + 1);
>>     return Pattern.matches(SEPARATE_NORMS_EXTENSION + "[0-9]+", ext);
>>   }
>> {code}
>>
>> The test adds two documents with a field "c": one analyzed (with norms), and
>> one not analyzed and without norms. According to the javadoc of
>> "NOT_ANALYZED_NO_NORMS":
>>
>>> Note that once you index a given field *with* norms enabled, disabling
>>> norms will have no effect. In other words, for this to have the above
>>> described effect on a field, all instances of that field must be indexed
>>> with NOT_ANALYZED_NO_NORMS from the beginning.
>>>
>>
>> Since I add one instance of the field with norms enabled, I'd expect norms
>> to exist for that field; however, that's not the case.
>>
>> The code which sets the norms through IndexReader does not do anything,
>> because SegmentReader.doSetNorm thinks this is not an indexed field (or,
>> assuming the documentation is wrong, a field w/o norms):
>>
>>   protected void doSetNorm(int doc, String field, byte value)
>>       throws IOException {
>>     SegmentNorms norm = norms.get(field);
>>     if (norm == null)                             // not an indexed field
>>       return;
>>
>> The same test passes on 3x, so I assume the bug exists only on trunk?
>>
>> Shai
>>
>
>

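For anyone who wants to try the isSeparateNormsFile check quoted above outside of Lucene, here is a standalone sketch of the same matching logic. The class name and the sample file names are made up for illustration, and the value "s" for SEPARATE_NORMS_EXTENSION is an assumption about the constant in IndexFileNames.

```java
import java.util.regex.Pattern;

// Standalone sketch; the real method would live in IndexFileNames.
final class SeparateNormsNames {
    // Assumption: mirrors IndexFileNames.SEPARATE_NORMS_EXTENSION.
    static final String SEPARATE_NORMS_EXTENSION = "s";

    // Compiled once instead of calling Pattern.matches on every file name.
    private static final Pattern SEP_NORMS =
        Pattern.compile(SEPARATE_NORMS_EXTENSION + "[0-9]+");

    // True if the extension is the separate-norms prefix followed by digits.
    static boolean isSeparateNormsFile(String filename) {
        int idx = filename.lastIndexOf('.');
        if (idx == -1) {
            return false; // no extension at all
        }
        return SEP_NORMS.matcher(filename.substring(idx + 1)).matches();
    }
}

class SeparateNormsNamesDemo {
    public static void main(String[] args) {
        // Illustrative file names only, not taken from a real index.
        for (String name : new String[] {"_0_1.s0", "_0.nrm", "segments_2"}) {
            System.out.println(name + " -> "
                + SeparateNormsNames.isSeparateNormsFile(name));
        }
    }
}
```

The only difference from the quoted method is the precompiled Pattern, which avoids recompiling the regex for each entry returned by dir.listAll().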