lucene-solr-user mailing list archives

From Eustache Felenc <eustache.fel...@idilia.com>
Subject Boolean Query Scorer Over-weighting Query Terms With Synonyms
Date Tue, 19 Mar 2013 14:16:28 GMT
Hi,

I don't understand why the scorer sums the weights of the OR clauses. 
It seems to me that this skews the query score toward the term that 
has more alternatives. To me it would make more sense to take the max 
of the weights of a query term's alternatives.

Here is an example:
In the Solr admin interface I ran: gucci (handbag OR purse OR pocketbook)
With debug enabled I can see that the parsed query is as expected: 
"parsedquery":"text:gucci (text:handbag text:purse text:pocketbook)"
The explain field shows that the scorer computes (simplifying a bit 
here): weight(gucci) + sum( weight(handbag) + weight(purse) + 
weight(pocketbook))
The consequence is that a result containing handbag, purse and 
pocketbook gets a higher score than a result containing gucci and 
handbag. I think this is counter-intuitive. To me the OR means those 
terms are equivalent, not that they are more important. Besides, if I 
wanted to make a term more important, I could use query term boosting 
to do that independently.
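
To make the difference concrete, here is a toy sketch (not Lucene code) 
comparing the summing behaviour I observe with the max behaviour I would 
expect. The weights are hypothetical and all set to 1.0; real scores also 
depend on tf, idf, norms and coord, which I ignore here.

```python
# Toy model only: hypothetical per-term weights, not real Lucene scores.
WEIGHTS = {"gucci": 1.0, "handbag": 1.0, "purse": 1.0, "pocketbook": 1.0}
SYNONYMS = ("handbag", "purse", "pocketbook")


def brand_score(doc_terms):
    """Weight contributed by the 'gucci' clause, if the document matches it."""
    return WEIGHTS["gucci"] if "gucci" in doc_terms else 0.0


def score_sum(doc_terms):
    """Observed behaviour: every matching OR alternative adds its weight."""
    group = sum(WEIGHTS[t] for t in SYNONYMS if t in doc_terms)
    return brand_score(doc_terms) + group


def score_max(doc_terms):
    """Expected behaviour: the OR group counts once, at its best alternative."""
    group = max((WEIGHTS[t] for t in SYNONYMS if t in doc_terms), default=0.0)
    return brand_score(doc_terms) + group


doc_a = {"gucci", "handbag"}                 # matches both query concepts
doc_b = {"handbag", "purse", "pocketbook"}   # matches only the synonym group

print(score_sum(doc_a), score_sum(doc_b))  # 2.0 3.0 -> doc_b ranks first
print(score_max(doc_a), score_max(doc_b))  # 2.0 1.0 -> doc_a ranks first
```

With the sum, the document that never mentions gucci wins; with the max, 
the document matching both concepts wins.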

I also experimented with edismax and it shows similar behaviour.

My questions are: am I missing something? Is there a way to have an OR 
clause that preserves the relative "importance" of query terms (note 
that playing with mm in edismax does not solve the issue)?

Thanks !


