lucene-solr-user mailing list archives

From Chris Hostetter <>
Subject Re: Multiple boost queries on a specific field
Date Mon, 20 Jul 2015 22:12:15 GMT

: /?q=&wt=json&defType=dismax&q.alt=*:*&bq=provider:A^2.0/
: My first results have provider A.

: ?q=&wt=json&defType=dismax&q.alt=*:*&bq=provider:B^1.5
: My first results have provider B. Good!

: /?q=&wt=json&defType=dismax&q.alt=*:*&bq=provider:(A^2.0 B^1.5)/
: Then my first results have provider B. It's not logical.

Why is that not logical?

If you provide us with the details from your schema about the 
provider field, and the debug=true output from your query showing the 
score explanations for the top doc of that query (and for the first 
"provider A" doc so we can compare) then we might be able to help explain 
why a "B" doc shows up before an "A" doc -- but you haven't provided 
nearly enough info for anything other than a wild guess.  The best wild 
guess is that it has to do with either the IDF of those two terms, or the 
lengthNorm of the "provider" field for the various docs.

Most likely "bq" isn't even remotely what you want however, since it's an 
*additive* boost, and will be affected by the overall queryNorm of the 
query it's a part of -- so even if you get things dialled in just like you 
want them with a "*:*" query, you might find yourself with totally 
different results once you start using a "real" query.
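
To illustrate the additive vs. multiplicative difference, here is a toy 
Python sketch.  The base scores are invented and the model deliberately 
ignores queryNorm, IDF and lengthNorm; it only shows that the ordering 
produced by an additive bonus can flip when the main query's scores change 
scale, while a multiplicative factor keeps the same ordering.

```python
# Toy model only: base scores are invented, and real Lucene scoring
# (queryNorm, IDF, lengthNorm) is deliberately left out.

def rank(docs, combine):
    """Return doc names ordered by combine(base_score, boost), best first."""
    ordered = sorted(docs, key=lambda d: combine(d[1], d[2]), reverse=True)
    return [name for name, _base, _boost in ordered]

def scaled(k):
    """Two hypothetical docs whose base scores keep the ratio 1.0 : 1.5."""
    return [("A", 1.0 * k, 2.0), ("B", 1.5 * k, 1.5)]

def add(score, boost):
    return score + boost   # bq-style: bonus added to the main score

def mul(score, boost):
    return score * boost   # boost-style: main score multiplied by the factor

additive_small = rank(scaled(0.5), add)   # A: 2.5,  B: 2.25
additive_large = rank(scaled(4.0), add)   # A: 6.0,  B: 7.5
mult_small = rank(scaled(0.5), mul)       # A: 1.0,  B: 1.125
mult_large = rank(scaled(4.0), mul)       # A: 8.0,  B: 9.0
```

In this toy setup the additive ranking puts A first when base scores are 
small and B first when they are large, whereas the multiplicative ranking 
is the same at both scales.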

Assuming every document has at most 1 "provider" then what would probably 
work best for you is to use (edismax with) something like this...

boost=max(prod(2.0, termfreq(provider,'A')),
          prod(1.5, termfreq(provider,'B')),
          prod(..., termfreq(provider,...)))

...or if you want to stick with dismax rather than edismax, then instead 
wrap the "boost" QParser around your dismax query...

  q={!boost b=$boost v=$qq defType=dismax}
  qq=...whatever your normal dismax query is...
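
As a rough sketch of how a client might assemble those parameters, here is 
a small Python example using only the standard library.  The host, core 
name ("collection1") and query text are hypothetical placeholders, and the 
boost function is cut down to the two providers named in this thread.

```python
from urllib.parse import urlencode, parse_qs

# Boost function limited to the two providers named in this thread.
BOOST_FUNC = (
    "max(prod(2.0,termfreq(provider,'A')),"
    "prod(1.5,termfreq(provider,'B')))"
)

def build_params(user_query):
    """Return request params that wrap a dismax query in {!boost}."""
    return {
        "q": "{!boost b=$boost v=$qq defType=dismax}",
        "qq": user_query,      # the normal dismax query text
        "boost": BOOST_FUNC,   # dereferenced by b=$boost
        "wt": "json",
    }

# Hypothetical host, core name and query text, for illustration only.
SOLR_SELECT_URL = "http://localhost:8983/solr/collection1/select"
query_string = urlencode(build_params("hypothetical search terms"))
request_url = SOLR_SELECT_URL + "?" + query_string
```

urlencode takes care of escaping the braces, dollar signs and quotes, so 
the local-params syntax reaches Solr intact.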

What that will give you (in either case) is a *multiplicative* boost by 
each of those values depending on which of those terms exists in the 
provider field -- the "prod" function multiplies each value by "1" if the 
corresponding provider string is in the field once, or "0" if that provider 
isn't in the field (hence the assumption of "at most 1 provider") and then 
the max function just picks the largest one.
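
The arithmetic can be modeled in a few lines of Python.  This is only a 
stand-in for the function query, not Solr code: termfreq here is a plain 
list count, and the weights dict holds just the two providers from this 
thread.

```python
def termfreq(field_values, term):
    """Stand-in for termfreq(): how often `term` occurs in the field."""
    return field_values.count(term)

def provider_boost(providers, weights):
    """Model of max(prod(weight, termfreq(provider, name)), ...)."""
    return max(w * termfreq(providers, name) for name, w in weights.items())

WEIGHTS = {"A": 2.0, "B": 1.5}

boost_a = provider_boost(["A"], WEIGHTS)      # 2.0
boost_b = provider_boost(["B"], WEIGHTS)      # 1.5
boost_none = provider_boost(["C"], WEIGHTS)   # 0.0
```

If the function behaves as modeled here, a doc whose provider is not in 
the list gets a factor of 0, which would zero out its score, so that case 
is worth checking against your data.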

Depending on the specifics of your use case, you could alternatively 
use sum(...) instead of max if some docs are from multiple providers.

But the details of *why* you are currently getting the results you are 
getting, and what you consider illogical about them, are a huge factor in 
giving you good advice to move forward.
