lucene-java-user mailing list archives

From Joel Barry <jmb...@gmail.com>
Subject questions on PerFieldSimilarityWrapper
Date Wed, 07 Nov 2012 22:23:48 GMT
Hi folks,

I have a question about PerFieldSimilarityWrapper. It seems that it is
not possible to get per-field behavior from queryNorm() and coord()...

The documentation for PerFieldSimilarityWrapper (Lucene 4.0) says:

  Subclasses should implement get(String) to return an appropriate
  Similarity (for example, using field-specific parameter values) for
  the field.

This leads the user to believe that *only* get() needs to be
overridden. However, I've found that I must override queryNorm() as
well. Otherwise the base Similarity.queryNorm(), which always returns 1,
is called (because PerFieldSimilarityWrapper extends Similarity) instead
of the version in the Similarity returned by get().
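The delegation gap described above can be reproduced without Lucene. This is a minimal plain-Java model, assuming the Lucene 4.0 behavior that the base Similarity returns 1 from queryNorm() and coord(), while DefaultSimilarity returns 1/sqrt(sumOfSquaredWeights) and overlap/maxOverlap. The class names (ModelSimilarity, ModelDefaultSimilarity, ModelPerFieldWrapper, QueryNormModel) are hypothetical stand-ins, not Lucene classes.

```java
// Hypothetical plain-Java model of the Lucene 4.0 class layout (NOT the Lucene API).
abstract class ModelSimilarity {
    // Query-level defaults in the base class: both are neutral (1.0f).
    public float queryNorm(float valueForNormalization) { return 1f; }
    public float coord(int overlap, int maxOverlap) { return 1f; }
}

// Mimics DefaultSimilarity's query-level formulas (assumed from Lucene 4.0).
class ModelDefaultSimilarity extends ModelSimilarity {
    @Override
    public float queryNorm(float sumOfSquaredWeights) {
        return (float) (1.0 / Math.sqrt(sumOfSquaredWeights));
    }

    @Override
    public float coord(int overlap, int maxOverlap) {
        return overlap / (float) maxOverlap;
    }
}

// Mimics the wrapper: per-field methods (omitted) would call get(field),
// but queryNorm() and coord() are simply inherited from ModelSimilarity.
abstract class ModelPerFieldWrapper extends ModelSimilarity {
    public abstract ModelSimilarity get(String field);
}

class QueryNormModel {
    // A wrapper that only overrides get(), like MyPerFieldSimilarity1.
    static ModelPerFieldWrapper onlyGet() {
        return new ModelPerFieldWrapper() {
            @Override
            public ModelSimilarity get(String field) {
                return new ModelDefaultSimilarity();
            }
        };
    }

    static float wrapperQueryNorm(float v) { return onlyGet().queryNorm(v); }
    static float wrapperCoord(int overlap, int max) { return onlyGet().coord(overlap, max); }
    static float defaultQueryNorm(float v) { return new ModelDefaultSimilarity().queryNorm(v); }

    public static void main(String[] args) {
        System.out.println(wrapperQueryNorm(4f)); // 1.0: base default, get() is never consulted
        System.out.println(defaultQueryNorm(4f)); // 0.5: what the delegate would have returned
        System.out.println(wrapperCoord(1, 2));   // 1.0 instead of 0.5
    }
}
```

In this model, overriding only get() leaves both query-level factors at their neutral defaults, which is the behavior observed with MyPerFieldSimilarity1.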

The test cases in Lucene always seem to override queryNorm() (and
coord() too), but I don't see any tests for the per-field behavior of
these methods. Indeed, there seems to be no way to get the field name
from inside them, and that's the problem: I'd like to have per-field
behavior for queryNorm() and coord().
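Because queryNorm() and coord() are query-level factors with no field argument, one possible workaround, usable only when the whole query targets a single known field, is to fix that field when the similarity is built and delegate the query-level factors to that field's similarity. The MyPerFieldSimilarity2 class in the example code approximates this with its hard-coded "dummy" name. The sketch below models the idea in plain Java; QueryLevelFactors, TfIdfFactors, NeutralFactors and PinnedPerFieldFactors are hypothetical names, not Lucene API, and the approach does not help for queries that span several fields.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the two query-level hooks; not a Lucene interface.
interface QueryLevelFactors {
    float queryNorm(float valueForNormalization);
    float coord(int overlap, int maxOverlap);
}

// Mimics DefaultSimilarity's formulas (assumed from Lucene 4.0).
class TfIdfFactors implements QueryLevelFactors {
    public float queryNorm(float sumOfSquaredWeights) {
        return (float) (1.0 / Math.sqrt(sumOfSquaredWeights));
    }

    public float coord(int overlap, int maxOverlap) {
        return overlap / (float) maxOverlap;
    }
}

// Mimics the base Similarity defaults: both factors are neutral.
class NeutralFactors implements QueryLevelFactors {
    public float queryNorm(float valueForNormalization) { return 1f; }
    public float coord(int overlap, int maxOverlap) { return 1f; }
}

// Per-field lookup "pinned" to the one field the query searches.
class PinnedPerFieldFactors implements QueryLevelFactors {
    private final Map<String, QueryLevelFactors> byField;
    private final QueryLevelFactors fallback;
    private final String queryField;

    PinnedPerFieldFactors(Map<String, QueryLevelFactors> byField,
                          QueryLevelFactors fallback, String queryField) {
        this.byField = byField;
        this.fallback = fallback;
        this.queryField = queryField;
    }

    // Analogous to PerFieldSimilarityWrapper.get(String).
    QueryLevelFactors get(String field) {
        return byField.getOrDefault(field, fallback);
    }

    // No field argument is available here, so use the field fixed at construction.
    public float queryNorm(float valueForNormalization) {
        return get(queryField).queryNorm(valueForNormalization);
    }

    public float coord(int overlap, int maxOverlap) {
        return get(queryField).coord(overlap, maxOverlap);
    }

    // Example setup: "title" uses TF-IDF factors, every other field is neutral.
    static PinnedPerFieldFactors demo(String queryField) {
        Map<String, QueryLevelFactors> byField = new HashMap<>();
        byField.put("title", new TfIdfFactors());
        return new PinnedPerFieldFactors(byField, new NeutralFactors(), queryField);
    }

    public static void main(String[] args) {
        System.out.println(demo("title").queryNorm(4f)); // 0.5
        System.out.println(demo("body").queryNorm(4f));  // 1.0
    }
}
```

The design choice here is to move the field decision from call time (where it is unavailable) to construction time, at the cost of building one similarity per query field.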

Below is some code to illustrate the issue:

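import java.io.IOException;

import org.apache.lucene.analysis.core.WhitespaceAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field.Store;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.search.similarities.DefaultSimilarity;
import org.apache.lucene.search.similarities.PerFieldSimilarityWrapper;
import org.apache.lucene.search.similarities.Similarity;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

// Overrides only get(), as the javadoc suggests.
class MyPerFieldSimilarity1 extends PerFieldSimilarityWrapper {
    @Override
    public Similarity get(String name) {
        return new DefaultSimilarity();
    }
}

// Also overrides queryNorm(), delegating to an arbitrary field's Similarity.
class MyPerFieldSimilarity2 extends PerFieldSimilarityWrapper {
    @Override
    public Similarity get(String name) {
        return new DefaultSimilarity();
    }

    @Override
    public float queryNorm(float valueForNormalization) {
        // Notice that I don't have access to the real field name here...
        return get("dummy").queryNorm(valueForNormalization);
    }
}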

public class PerFieldSimilarityWrapperTest {
    private float runTest(Similarity similarity) throws IOException {
        IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_40,
                new WhitespaceAnalyzer(Version.LUCENE_40));
        config.setSimilarity(similarity);
        Directory dir = new RAMDirectory();
        IndexWriter writer = new IndexWriter(dir, config);
        Document doc = new Document();
        String fieldName = "some_field";
        doc.add(new TextField(fieldName, "some text", Store.YES));
        writer.addDocument(doc);
        writer.commit();
        writer.close();

        IndexReader reader = DirectoryReader.open(dir);
        IndexSearcher searcher = new IndexSearcher(reader);
        searcher.setSimilarity(similarity);
        TermQuery query = new TermQuery(new Term(fieldName, "text"));
        TopDocs topDocs = searcher.search(query, 1);
        float score = topDocs.scoreDocs[0].score;
        reader.close();
        return score;
    }

    public static void main(String[] args) throws IOException {
        PerFieldSimilarityWrapperTest that = new PerFieldSimilarityWrapperTest();
        System.out.println(that.runTest(new DefaultSimilarity()));
        System.out.println(that.runTest(new MyPerFieldSimilarity1()));
        System.out.println(that.runTest(new MyPerFieldSimilarity2()));
    }
}

Running this produces:

0.19178301
0.058849156
0.19178301
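These three numbers are consistent with the missing queryNorm() delegation. Assuming the Lucene 4.0 DefaultSimilarity formulas (idf = 1 + ln(numDocs / (docFreq + 1)), tf = sqrt(freq), queryNorm = 1 / sqrt(sumOfSquaredWeights)) and that the length norm 1/sqrt(2) of the two-token field is stored in one byte and decoded as 0.625, the single-document index gives idf = 1 + ln(1/2) ≈ 0.3069. With the real queryNorm the score reduces to tf * idf * norm ≈ 0.1918; with the base Similarity's queryNorm of 1 it becomes tf * idf^2 * norm ≈ 0.0588. No coord() factor appears because the query is a single TermQuery. The hypothetical ScoreArithmetic class below only redoes that arithmetic; it agrees with the printed values to about six decimal places, though its float rounding order may differ slightly from Lucene's.

```java
// Hand recomputation of the printed scores under the assumptions stated above.
class ScoreArithmetic {
    // Assumed decoded length norm for the two-token field "some text".
    static final float FIELD_NORM = 0.625f;

    // DefaultSimilarity-style idf (assumed Lucene 4.0 formula).
    static float idf(long docFreq, long numDocs) {
        return (float) (Math.log(numDocs / (double) (docFreq + 1)) + 1.0);
    }

    // score = (idf * queryNorm) * tf * idf * fieldNorm, with query boost 1.
    static float score(float queryNorm, float idf) {
        float tf = (float) Math.sqrt(1); // "text" occurs once in the document
        float queryWeight = idf * queryNorm;
        return queryWeight * tf * idf * FIELD_NORM;
    }

    // DefaultSimilarity / MyPerFieldSimilarity2: queryNorm = 1 / sqrt(idf^2).
    static float withDefaultQueryNorm() {
        float idf = idf(1, 1);
        float queryNorm = (float) (1.0 / Math.sqrt(idf * idf));
        return score(queryNorm, idf);
    }

    // MyPerFieldSimilarity1: the base Similarity.queryNorm() returns 1.
    static float withBaseQueryNorm() {
        return score(1f, idf(1, 1));
    }

    public static void main(String[] args) {
        System.out.println(withDefaultQueryNorm()); // approximately 0.191783
        System.out.println(withBaseQueryNorm());    // approximately 0.058849
    }
}
```

The ratio of the first two outputs is therefore 1/idf (0.19178301 / 0.058849156 ≈ 3.26 ≈ 1 / 0.3069), exactly the factor that DefaultSimilarity's queryNorm() would have divided out.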

Am I overlooking something here or is this a bug?

Thanks,
- Joel

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

