lucene-dev mailing list archives

From "Joel Barry (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (LUCENE-4559) PerFieldSimilarityWrapper issue with queryNorm() and coord()
Date Wed, 14 Nov 2012 19:12:12 GMT

     [ https://issues.apache.org/jira/browse/LUCENE-4559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joel Barry updated LUCENE-4559:
-------------------------------

    Summary: PerFieldSimilarityWrapper issue with queryNorm() and coord()  (was: PerFieldSimilarityWrapper)
    
> PerFieldSimilarityWrapper issue with queryNorm() and coord()
> ------------------------------------------------------------
>
>                 Key: LUCENE-4559
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4559
>             Project: Lucene - Core
>          Issue Type: Improvement
>    Affects Versions: 4.0
>            Reporter: Joel Barry
>            Priority: Minor
>
> This issue requests that the documentation be clarified for the current
> behavior of queryNorm() and coord() on PerFieldSimilarityWrapper, and
> that support be added for the use case described below.
> The documentation for PerFieldSimilarityWrapper (Lucene 4.0) says:
> {noformat}
>   Subclasses should implement get(String) to return an appropriate
>   Similarity (for example, using field-specific parameter values) for
>   the field.
> {noformat}
> This is misleading because of how queryNorm() and coord() behave.
> The Similarity returned from get() is never consulted for these
> methods.  Instead, the PerFieldSimilarityWrapper subclass's own
> methods are called.  I understand that this is because these methods
> apply to the query as a whole rather than per field.  However,
> consider the following.  A PerFieldSimilarityWrapper with no per-field
> behavior (it just returns DefaultSimilarity from get()) behaves
> differently from DefaultSimilarity itself:
> {noformat}
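> import static org.junit.Assert.assertEquals;
>
> import java.io.IOException;
> import org.apache.lucene.analysis.core.WhitespaceAnalyzer;
> import org.apache.lucene.document.Document;
> import org.apache.lucene.document.Field.Store;
> import org.apache.lucene.document.TextField;
> import org.apache.lucene.index.DirectoryReader;
> import org.apache.lucene.index.IndexReader;
> import org.apache.lucene.index.IndexWriter;
> import org.apache.lucene.index.IndexWriterConfig;
> import org.apache.lucene.index.Term;
> import org.apache.lucene.search.IndexSearcher;
> import org.apache.lucene.search.TermQuery;
> import org.apache.lucene.search.TopDocs;
> import org.apache.lucene.search.similarities.DefaultSimilarity;
> import org.apache.lucene.search.similarities.PerFieldSimilarityWrapper;
> import org.apache.lucene.search.similarities.Similarity;
> import org.apache.lucene.store.Directory;
> import org.apache.lucene.store.RAMDirectory;
> import org.apache.lucene.util.Version;
> import org.junit.Test;
>
> class MyPerFieldSimilarity1 extends PerFieldSimilarityWrapper {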
>     @Override
>     public Similarity get(String name) {
>         return new DefaultSimilarity();
>     }
> }
> public class PerFieldSimilarityWrapperTest {    
>     private float runQuery(Similarity similarity) throws IOException {
>         IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_40, new WhitespaceAnalyzer(Version.LUCENE_40));
>         config.setSimilarity(similarity);
>         Directory dir = new RAMDirectory();
>         IndexWriter writer = new IndexWriter(dir, config);
>         Document doc = new Document();
>         doc.add(new TextField("A-field", "first", Store.YES));
>         writer.addDocument(doc);
>         writer.commit();
>         
>         IndexReader reader = DirectoryReader.open(dir);
>         IndexSearcher searcher = new IndexSearcher(reader);
>         searcher.setSimilarity(similarity);
>         TermQuery query = new TermQuery(new Term("A-field", "first"));
>         TopDocs topDocs = searcher.search(query, 1);
>         return topDocs.scoreDocs[0].score;
>     }
>     
>     @Test
>     public void testSimple() throws Exception {
>         float score1 = runQuery(new DefaultSimilarity());
>         float score2 = runQuery(new MyPerFieldSimilarity1());
>         assertEquals(score1, score2, 0.0001);
>         // Fails with java.lang.AssertionError:
>         //   expected:<0.3068528175354004> but was:<0.09415864944458008>
>     }
> }
> {noformat}
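
The two scores in the failing assertion can be reproduced by hand, which makes it clear that queryNorm() alone accounts for the gap. The sketch below is plain Java with no Lucene dependency. The class and method names are hypothetical. It assumes the DefaultSimilarity formulas idf = 1 + ln(numDocs / (docFreq + 1)) and queryNorm = 1 / sqrt(sumOfSquaredWeights), that the base-class queryNorm() used by the wrapper returns 1, and that tf and the field norm are both 1 for this one-term document.

```java
// Sketch only: hypothetical names, plain-Java arithmetic, no Lucene classes.
final class QueryNormArithmetic {

    // Assumed DefaultSimilarity idf formula: 1 + ln(numDocs / (docFreq + 1)).
    static float idf(long docFreq, long numDocs) {
        return (float) (Math.log(numDocs / (double) (docFreq + 1)) + 1.0);
    }

    // Assumed DefaultSimilarity queryNorm formula: 1 / sqrt(sumOfSquaredWeights).
    static float defaultQueryNorm(float sumOfSquaredWeights) {
        return (float) (1.0 / Math.sqrt(sumOfSquaredWeights));
    }

    // Single-term score = (idf * queryNorm) * (tf * idf * fieldNorm),
    // with tf = 1 and fieldNorm = 1 assumed for the one-term test document.
    static float score(float idf, float queryNorm) {
        return idf * queryNorm * idf;
    }

    public static void main(String[] args) {
        float idf = idf(1, 1); // one document, and the term occurs in it

        // DefaultSimilarity: queryNorm cancels one idf factor, so score is about idf.
        System.out.println(score(idf, defaultQueryNorm(idf * idf)));

        // Wrapper without overrides: queryNorm is 1, so score is about idf squared.
        System.out.println(score(idf, 1.0f));
    }
}
```

Under these assumptions the first value comes out close to 0.30685 and the second close to 0.09416, matching the expected and actual values in the assertion error.
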
> One workaround is to override both methods and forward to the
> Similarity returned by get(), e.g.:
> {noformat}
> class MyPerFieldSimilarity1 extends PerFieldSimilarityWrapper {
>     @Override
>     public Similarity get(String name) {
>         return new DefaultSimilarity();
>     }
>     @Override
>     public float coord(int overlap, int maxOverlap) {
>         return get("dummy").coord(overlap, maxOverlap);
>     }
>     @Override
>     public float queryNorm(float valueForNormalization) {
>         return get("dummy").queryNorm(valueForNormalization);
>     }
> }
> {noformat}
> However, coord() and queryNorm() are not passed a field name, so they
> cannot pick a per-field Similarity; hence the placeholder "dummy"
> argument.
> Suppose an application arranges documents so that there are two
> distinct field groupings:
> {noformat}
> Document:
>   A-field1
>   A-field2
>   A-field3
>   B-field1
>   B-field2
>   B-field3
> {noformat}
> The application creates queries that use the A fields or the B
> fields, but never both in the same query.  It therefore seems
> reasonable that PerFieldSimilarityWrapper should provide a way for
> queryNorm() and coord() to be chosen per set of fields.  This
> cannot be done with the current implementation.
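
One possible shape for the requested behavior, given that the application always knows which field group a query targets, is a wrapper that is told its group when it is constructed. The sketch below is a plain-Java model rather than Lucene code. Every type in it (SimModel, DefaultModel, NeutralModel, Grouped) is a hypothetical stand-in, and only coord() and queryNorm() are modeled. DefaultModel assumes the DefaultSimilarity formulas; NeutralModel assumes both methods return 1.

```java
// Sketch only: hypothetical stand-in types, not the Lucene Similarity API.
final class GroupedSimilaritySketch {

    // Models just the two query-level methods discussed in this issue.
    interface SimModel {
        float coord(int overlap, int maxOverlap);
        float queryNorm(float sumOfSquaredWeights);
    }

    // Assumed DefaultSimilarity behavior.
    static final class DefaultModel implements SimModel {
        public float coord(int overlap, int maxOverlap) {
            return overlap / (float) maxOverlap;
        }
        public float queryNorm(float sumOfSquaredWeights) {
            return (float) (1.0 / Math.sqrt(sumOfSquaredWeights));
        }
    }

    // A similarity whose query-level factors are both 1.
    static final class NeutralModel implements SimModel {
        public float coord(int overlap, int maxOverlap) { return 1f; }
        public float queryNorm(float sumOfSquaredWeights) { return 1f; }
    }

    // Picks a per-field similarity by field-name prefix and forwards the
    // query-level calls to the similarity of the group fixed at construction.
    static final class Grouped implements SimModel {
        private final java.util.Map<String, SimModel> byPrefix;
        private final SimModel queryLevel;

        Grouped(java.util.Map<String, SimModel> byPrefix, String queryGroupPrefix) {
            this.byPrefix = byPrefix;
            this.queryLevel = byPrefix.get(queryGroupPrefix);
            if (this.queryLevel == null) {
                throw new IllegalArgumentException("unknown group: " + queryGroupPrefix);
            }
        }

        // Per-field lookup, e.g. "A-field2" resolves through the "A-" entry.
        SimModel get(String field) {
            for (java.util.Map.Entry<String, SimModel> e : byPrefix.entrySet()) {
                if (field.startsWith(e.getKey())) {
                    return e.getValue();
                }
            }
            throw new IllegalArgumentException("no group for field: " + field);
        }

        public float coord(int overlap, int maxOverlap) {
            return queryLevel.coord(overlap, maxOverlap);
        }

        public float queryNorm(float sumOfSquaredWeights) {
            return queryLevel.queryNorm(sumOfSquaredWeights);
        }
    }
}
```

In a real application the equivalent wrapper would presumably be installed with IndexSearcher.setSimilarity() before running a query against the corresponding group. That still leaves the caller responsible for the choice, which is the limitation this issue describes.
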

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org