lucene-dev mailing list archives

From Doron Cohen <cdor...@gmail.com>
Subject Re: Confusing norms
Date Thu, 26 May 2011 12:11:02 GMT
Yes, I see this too in trunk r1127436, and it seems to be a bug.
If you comment out the line that adds the field with NO_NORMS, the file is
there as expected.

I think I know where the bug is:
FieldInfo.update() has the wrong logic here:

{code}
      if (this.omitNorms != omitNorms) {
        // if one require omitNorms at least once, it remains off for life
        this.omitNorms = true;
      }
{code}

It should of course set this.omitNorms to false in this case, so that norms
are kept once any instance of the field has been indexed with them.
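
A minimal sketch of the intended merge rule, for illustration only. FieldInfoSketch is a hypothetical stand-in that models just the omitNorms flag; it is not the actual Lucene FieldInfo class, and the real update() takes more arguments.

```java
// Hypothetical stand-in for the norms-related part of FieldInfo (not Lucene code).
class FieldInfoSketch {
    boolean omitNorms;

    FieldInfoSketch(boolean omitNorms) {
        this.omitNorms = omitNorms;
    }

    // Merge the flag from another instance of the same field.
    // Rule from the NOT_ANALYZED_NO_NORMS documentation: once a field has been
    // indexed with norms, disabling norms later has no effect.
    void update(boolean omitNorms) {
        if (this.omitNorms != omitNorms) {
            // The two instances disagree, so at least one of them wants norms.
            this.omitNorms = false;
        }
    }

    public static void main(String[] args) {
        FieldInfoSketch c = new FieldInfoSketch(false); // first instance: norms enabled
        c.update(true);                                 // second instance: no norms
        System.out.println("omitNorms = " + c.omitNorms); // prints "omitNorms = false"
    }
}
```

With this rule the flag only stays true when every instance of the field omits norms, which matches the scenario in the test below.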

Doron

On Thu, May 26, 2011 at 11:32 AM, Shai Erera <serera@gmail.com> wrote:

> Hi
>
> I wrote the following test:
>
> {code}
>   public void testConfusingNorms() throws Exception {
>     Directory dir = newDirectory();
>     LogMergePolicy lmp = newLogMergePolicy(false);
>     IndexWriterConfig conf = newIndexWriterConfig(TEST_VERSION_CURRENT,
>         new MockAnalyzer(random)).setMergePolicy(lmp);
>     IndexWriter w = new IndexWriter(dir, conf);
>     Document doc = new Document();
>     doc.add(new Field("c", "some text", Store.YES, Index.ANALYZED));
>     w.addDocument(doc);
>     doc = new Document();
>     doc.add(new Field("c", "delete", Store.NO,
>         Index.NOT_ANALYZED_NO_NORMS));
>     w.addDocument(doc);
>     w.close();
>
>     IndexReader r = IndexReader.open(dir, false);
>     r.setNorm(0, "c", (byte) 1);
>     r.close();
>
>     // Look for the sep norms file
>     boolean found = false;
>     for (String s : dir.listAll()) {
>       if (IndexFileNames.isSeparateNormsFile(s)) {
>         found = true;
>         break;
>       }
>     }
>     assertTrue("separate norms file not found", found);
>
>     dir.close();
>   }
> {code}
>
> You will also need to add this method to IndexFileNames (not committed
> yet):
> {code}
>   /**
>    * Returns true if the given filename ends with the separate norms file
>    * pattern: {@code SEPARATE_NORMS_EXTENSION + "[0-9]+"}.
>    */
>   public static boolean isSeparateNormsFile(String filename) {
>     int idx = filename.lastIndexOf('.');
>     if (idx == -1) return false;
>     String ext = filename.substring(idx + 1);
>     return Pattern.matches(SEPARATE_NORMS_EXTENSION + "[0-9]+", ext);
>   }
> {code}
>
> The test adds two documents with a field "c": one analyzed (with norms) and
> one not analyzed and without norms. According to the documentation of
> "NOT_ANALYZED_NO_NORMS":
>
>> Note that once you index a given field *with* norms enabled, disabling
>> norms will have no effect.
>> In other words, for this to have the above described effect on a field,
>> all instances of that field
>> must be indexed with NOT_ANALYZED_NO_NORMS from the beginning.
>>
>
> I'd expect that, since I add one instance of the field w/ norms enabled,
> norms would exist for that field; however, that's not the case.
>
> The IndexReader call that sets the norm does nothing, because
> SegmentReader.doSetNorm thinks this is not an indexed field (or, assuming
> the documentation is wrong, a field w/o norms):
>
>   protected void doSetNorm(int doc, String field, byte value)
>       throws IOException {
>     SegmentNorms norm = norms.get(field);
>     if (norm == null)                             // not an indexed field
>       return;
>
> The same test runs fine on 3x, so I assume there is a bug in the code
> somewhere only on trunk?
>
> Shai
>
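
The isSeparateNormsFile helper quoted above can be tried standalone. The sketch below is only an illustration: SeparateNormsSketch is a hypothetical class, the value "s" for SEPARATE_NORMS_EXTENSION is an assumption about IndexFileNames, and the file names are made-up examples.

```java
import java.util.regex.Pattern;

// Hypothetical standalone copy of the quoted helper (not part of Lucene).
class SeparateNormsSketch {
    // Assumption: the separate norms extension is "s", so files end in ".s<N>".
    static final String SEPARATE_NORMS_EXTENSION = "s";

    // Returns true if the text after the last '.' is the extension followed by digits.
    static boolean isSeparateNormsFile(String filename) {
        int idx = filename.lastIndexOf('.');
        if (idx == -1) return false; // no extension at all
        String ext = filename.substring(idx + 1);
        return Pattern.matches(SEPARATE_NORMS_EXTENSION + "[0-9]+", ext);
    }

    public static void main(String[] args) {
        // Made-up file names, used only to exercise the pattern.
        String[] names = {"_0_1.s0", "_0.nrm", "segments_2", "_0.fdt"};
        for (String name : names) {
            System.out.println(name + " -> " + isSeparateNormsFile(name));
        }
    }
}
```

Under that assumption only the first name matches, since its extension "s0" is the only one of the form "s" plus digits.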
