lucene-solr-user mailing list archives

From Elaine Cario <etca...@gmail.com>
Subject Re: difference in behavior of term boosting between Solr 6 and Solr 7
Date Wed, 23 Jan 2019 12:26:29 GMT
I predicted some colleague would come to me 2 minutes after I sent this
with some finding - I was wrong, it was a few hours! It seems there was a
change in a custom similarity class (I think because of an API change in
Solr), which caused the query boost to not be applied.  We're looking at
this angle, so please ignore this for now.
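
For anyone reading the explain output quoted below, here is a minimal sketch of the arithmetic it reports. It is only an illustration built from the factors shown in the explain (boost, idf, tfNorm); the function names are hypothetical and it does not reproduce the actual WKSimilarity code.

```python
# Sketch of the score arithmetic shown in the quoted explain output.
# Assumption: the score is the product of the listed factors, and the
# only difference between the two versions is whether boost is included.

def tf_norm(freq, k1a=1.2, k1b=1.2):
    # "tfNorm, computed as (freq * (k1a + 1)) / (freq + k1b)"
    return (freq * (k1a + 1)) / (freq + k1b)

def phrase_score(freq, boost=None, idf=1.0):
    # idf is 1.0 for phrases in the explain; boost=None models the
    # Solr 7 output, where no boost factor appears at all.
    score = idf * tf_norm(freq)
    if boost is not None:
        score *= boost
    return score

boosts = {"one": 1.0, "two": 2.0, "three": 3.0}

# Solr 6 style: the boost is part of the product.
with_boost = {term: phrase_score(1.0, boost=b) for term, b in boosts.items()}

# Solr 7 style as observed: the boost is missing from the product.
without_boost = {term: phrase_score(1.0) for term in boosts}

ranked = sorted(with_boost, key=with_boost.get, reverse=True)
print(ranked)         # order driven entirely by the boosts
print(without_boost)  # every document ties on the same score
```

With phraseFreq=1.0 and k1a = k1b, tfNorm works out to 1.0, so the boost is the only thing that can separate the documents. Once it drops out, all three tie and the result order is arbitrary.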

On Tue, Jan 22, 2019 at 11:16 AM Elaine Cario <etcario@gmail.com> wrote:

> We're preparing to upgrade from Solr 6.4.2 to Solr 7.6.0, and found an
> inconsistency in scoring. It appears that term boosts in the query are not
> applied in Solr 7.
>
> The query itself is identical against both versions (unimportant params
> removed):
>
> <str name="q">("one"^1) OR ("two"^2) OR ("three"^3)</str>
> <str name="defType">edismax</str>
> <str name="qf">max_term</str>
> <str name="q.op">AND</str>
> <str name="fq">dictionary_id:"WKUS-TAL-DEPLURALIZATION-THESAURUS"</str>
> <str name="rows">100</str>
> <str name="wt">xml</str>
> <str name="debugQuery">on</str>
> </lst>
>
> 3 documents are returned in both versions. In Solr 6 the docs are returned
> in order of the boosts (three, two, one), as the boost accounts for the
> entirety of the score, while in Solr 7 they are returned in arbitrary
> order, as all the scores are 1.0.
>
> Looking at the debug and explains, in Solr 6 the boost is multiplied into
> the rest of the score:
>
> <lst name="debug">
> <str name="rawquerystring">("one"^1) OR ("two"^2) OR ("three"^3)</str>
> <str name="querystring">("one"^1) OR ("two"^2) OR ("three"^3)</str>
> <str name="parsedquery">(+(DisjunctionMaxQuery((max_term:"aaaa one
> zzzz"))^1.0 DisjunctionMaxQuery((max_term:"aaaa two zzzz"))^2.0
> DisjunctionMaxQuery((max_term:"aaaa three zzzz"))^3.0))/no_coord</str>
> <str name="parsedquery_toString">+(((max_term:"aaaa one zzzz"))^1.0
> ((max_term:"aaaa two zzzz"))^2.0 ((max_term:"aaaa three zzzz"))^3.0)</str>
> <lst name="explain">
> <str name="WKUS-TAL-DEPLURALIZATION-THESAURUS_three">
> 3.0 = sum of:
>   3.0 = weight(max_term:"aaaa three zzzz" in 658) [WKSimilarity], result
> of:
>     3.0 = score(doc=658,freq=1.0 = phraseFreq=1.0
> ), product of:
>       3.0 = boost
>       1.0 = idf(), for phrases, always set to 1
>       1.0 = tfNorm, computed as (freq * (k1a + 1)) / (freq + k1b)
> [WKSimilarity] from:
>         1.0 = phraseFreq=1.0
>         1.2 = k1a
>         1.2 = k1b
>         0.0 = b (norms omitted for field)
> </str>
>
> But in Solr 7, the boost is not there at all:
>
> <lst name="debug">
> <str name="rawquerystring">("one"^1) OR ("two"^2) OR ("three"^3)</str>
> <str name="querystring">("one"^1) OR ("two"^2) OR ("three"^3)</str>
> <str name="parsedquery">+((+DisjunctionMaxQuery((max_term:"aaaa one
> zzzz"))^1.0) (+DisjunctionMaxQuery((max_term:"aaaa two zzzz"))^2.0)
> (+DisjunctionMaxQuery((max_term:"aaaa three zzzz"))^3.0))</str>
> <str name="parsedquery_toString">+((+((max_term:"aaaa one zzzz"))^1.0)
> (+((max_term:"aaaa two zzzz"))^2.0) (+((max_term:"aaaa three
> zzzz"))^3.0))</str>
> <lst name="explain">
> <str name="WKUS-TAL-DEPLURALIZATION-THESAURUS_two">
> 1.0 = sum of:
>   1.0 = weight(max_term:"aaaa two zzzz" in 436) [WKSimilarity], result of:
>     1.0 = score(doc=436,freq=1.0 = phraseFreq=1.0
> ), product of:
>       1.0 = idf(), for phrases, always set to 1
>       1.0 = tfNorm, computed as (freq * (k1a + 1)) / (freq + k1b)
> [WKSimilarity] from:
>         1.0 = phraseFreq=1.0
>         1.2 = k1a
>         1.2 = k1b
>         0.0 = b (norms omitted for field)
> </str>
>
> I also noted a subtle difference in the parsedquery between the 2
> versions. I'm not sure if that is causing the boost to drop out in Solr 7:
>
> SOLR 6:  +(((max_term:"aaaa one zzzz"))^1.0 ((max_term:"aaaa two
> zzzz"))^2.0 ((max_term:"aaaa three zzzz"))^3.0)
> SOLR 7:  +((+((max_term:"aaaa one zzzz"))^1.0) (+((max_term:"aaaa two
> zzzz"))^2.0) (+((max_term:"aaaa three zzzz"))^3.0))
>
> For our use case, I think we can work around it using a constant score
> query, but it would be good to know whether this is a bug or expected
> behavior, or whether we're missing something in the query to get the boost
> to work again.
>
> Thanks!
