lucene-solr-user mailing list archives

From Anshum Gupta <ans...@anshumgupta.net>
Subject Re: /select results different between 5.4 and 6.1
Date Fri, 19 Aug 2016 22:22:45 GMT
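Yes, that is expected. The default similarity changed from TF-IDF
(ClassicSimilarity) to BM25 in 6.0, which is why the 6.x explain output
shows k1, b, and avgFieldLength instead of fieldNorm.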
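
As a rough check, the scores in the two debug dumps quoted below can be
reproduced by hand. This is only a sketch: it assumes the formulas Lucene
documents for ClassicSimilarity (tf = sqrt(freq), idf = 1 + ln(maxDocs /
(docFreq + 1))) and for BM25Similarity in 6.x (with the (k1 + 1) factor in
the numerator). The function names are made up for illustration and are not
part of any Solr or Lucene API.

```python
import math


def classic_score(freq, doc_freq, max_docs, field_norm):
    """Single-term TF-IDF score (no boosts), assuming ClassicSimilarity's formulas."""
    tf = math.sqrt(freq)
    idf = 1.0 + math.log(max_docs / (doc_freq + 1.0))
    return tf * idf * field_norm


def bm25_score(freq, doc_freq, doc_count, field_length, avg_field_length,
               k1=1.2, b=0.75):
    """Single-term BM25 score, assuming the Lucene 6.x form of the formula."""
    idf = math.log(1.0 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))
    # Length normalization: long fields are penalized relative to the average.
    length_norm = 1.0 - b + b * field_length / avg_field_length
    tf_norm = freq * (k1 + 1.0) / (freq + k1 * length_norm)
    return idf * tf_norm


# Inputs copied from the explain output quoted in this thread.
print(classic_score(1.0, 281, 110738, 0.625))           # 5.4 reported 4.3581347
print(bm25_score(50.0, 281, 34151, 4096.0, 941.3421))   # 6.x reported 9.735645
print(bm25_score(2.0, 281, 34151, 4.0, 941.3421))       # 6.x reported 9.164393
```

If the computed values agree with the reported ones, the difference in
scores comes from the similarity formula rather than from anything in the
/select configuration.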

On Fri, Aug 19, 2016 at 3:00 PM John Bickerstaff <john@johnbickerstaff.com>
wrote:

> Bump!
>
> TL;DR Question: Are scores (and debug output) *expected* to be different
> between 5.4 and 6.1?
>
> On Thu, Aug 18, 2016 at 2:44 PM, John Bickerstaff <john@johnbickerstaff.com>
> wrote:
>
> > Hi all,
> >
> > TL;DR:
> > Is it expected that the /select endpoint would produce different
> > scores/result order between versions 5.4 and 6.1?
> >
> >
> > (I'm aware that it's certainly possible I've done something different to
> > these environments, although at this point I can't see any difference in
> > configs etc., and I used a very simple search against /select to test
> > this.)
> >
> > ====== Detail ==========
> >
> > I'm currently seeing different scoring and different result order when I
> > compare Solr results in the Admin console for a 5.4 and 6.1 environment.
> >
> > I'm using the /select endpoint to try to avoid any difference in
> > configuration.  To the best of my knowledge (and reading) I haven't ever
> > modified the xml for that endpoint.
> >
> > As I was looking into it, I saw that the debug output looks quite
> > different in 6.1...
> >
> > Any advice, including "You must have broken it yourself, that's
> > impossible" is much appreciated.
> >
> >
> >
> > Here's debug from the "old" 5.4 SolrCloud environment.  The ids are a
> > pain to read, but not only am I getting different scores, I'm getting
> > different docs (or docs in a clearly different order).
> >
> > "debug": {
> >   "rawquerystring": "chiari",
> >   "querystring": "chiari",
> >   "parsedquery": "text:chiari",
> >   "parsedquery_toString": "text:chiari",
> >   "explain": {
> >     "d9644f86-5fe2-4a9f-8517-545e2cde0b64": "\n4.3581347 = weight(text:chiari in 26783) [ClassicSimilarity], result of:\n 4.3581347 = fieldWeight in 26783, product of:\n 1.0 = tf(freq=1.0), with freq of:\n 1.0 = termFreq=1.0\n 6.9730153 = idf(docFreq=281, maxDocs=110738)\n 0.625 = fieldNorm(doc=26783)\n",
> >     "1347f707-6fdd-4864-b9dd-6d3e7cc32bf5": "\n4.3581347 = weight(text:chiari in 26792) [ClassicSimilarity], result of:\n 4.3581347 = fieldWeight in 26792, product of:\n 1.0 = tf(freq=1.0), with freq of:\n 1.0 = termFreq=1.0\n 6.9730153 = idf(docFreq=281, maxDocs=110738)\n 0.625 = fieldNorm(doc=26792)\n",
> >     "d01c32ad-e29d-4b65-9930-f8a6844a2613": "\n4.3581347 = weight(text:chiari in 27028) [ClassicSimilarity], result of:\n 4.3581347 = fieldWeight in 27028, product of:\n 1.0 = tf(freq=1.0), with freq of:\n 1.0 = termFreq=1.0\n 6.9730153 = idf(docFreq=281, maxDocs=110738)\n 0.625 = fieldNorm(doc=27028)\n",
> >     "0c5a4be7-1162-4b1a-ab83-4b48a690fc3a": "\n4.3581347 = weight(text:chiari in 27029) [ClassicSimilarity], result of:\n 4.3581347 = fieldWeight in 27029, product of:\n 1.0 = tf(freq=1.0), with freq of:\n 1.0 = termFreq=1.0\n 6.9730153 = idf(docFreq=281, maxDocs=110738)\n 0.625 = fieldNorm(doc=27029)\n",
> >     "e1cb441d-9d60-482d-956b-3fbc964a17c1": "\n4.3581347 = weight(text:chiari in 27042) [ClassicSimilarity], result of:\n 4.3581347 = fieldWeight in 27042, product of:\n 1.0 = tf(freq=1.0), with freq of:\n 1.0 = termFreq=1.0\n 6.9730153 = idf(docFreq=281, maxDocs=110738)\n 0.625 = fieldNorm(doc=27042)\n",
> >     "f87951f1-e163-4f17-a628-904b9df0c609": "\n4.3581347 = weight(text:chiari in 27043) [ClassicSimilarity], result of:\n 4.3581347 = fieldWeight in 27043, product of:\n 1.0 = tf(freq=1.0), with freq of:\n 1.0 = termFreq=1.0\n 6.9730153 = idf(docFreq=281, maxDocs=110738)\n 0.625 = fieldNorm(doc=27043)\n",
> >     "caaa7ca1-34cb-44a8-8dd9-12c909db8c2d": "\n4.3581347 = weight(text:chiari in 27044) [ClassicSimilarity], result of:\n 4.3581347 = fieldWeight in 27044, product of:\n 1.0 = tf(freq=1.0), with freq of:\n 1.0 = termFreq=1.0\n 6.9730153 = idf(docFreq=281, maxDocs=110738)\n 0.625 = fieldNorm(doc=27044)\n",
> >     "ada7a87e-725a-4533-b72e-3817af4c7179": "\n4.3581347 = weight(text:chiari in 27055) [ClassicSimilarity], result of:\n 4.3581347 = fieldWeight in 27055, product of:\n 1.0 = tf(freq=1.0), with freq of:\n 1.0 = termFreq=1.0\n 6.9730153 = idf(docFreq=281, maxDocs=110738)\n 0.625 = fieldNorm(doc=27055)\n",
> >     "ac6d47fd-9a59-47d6-8cfb-11b34c7ded54": "\n4.3581347 = weight(text:chiari in 27056) [ClassicSimilarity], result of:\n 4.3581347 = fieldWeight in 27056, product of:\n 1.0 = tf(freq=1.0), with freq of:\n 1.0 = termFreq=1.0\n 6.9730153 = idf(docFreq=281, maxDocs=110738)\n 0.625 = fieldNorm(doc=27056)\n",
> >     "4aaa7697-b26a-4bea-ba4e-70d18ea649f0": "\n4.3581347 = weight(text:chiari in 62240) [ClassicSimilarity], result of:\n 4.3581347 = fieldWeight in 62240, product of:\n 1.0 = tf(freq=1.0), with freq of:\n 1.0 = termFreq=1.0\n 6.9730153 = idf(docFreq=281, maxDocs=110738)\n 0.625 = fieldNorm(doc=62240)\n"
> >   },
> >   "QParser": "LuceneQParser",
> >   "timing": { "time": 2, "prepare": { "time": 0, "query": { "time": 0 },
> >
> > ... and here's the same from the Solr Cloud 6.0 environment
> >
> > "debug":{
> >   "rawquerystring":"chiari",
> >   "querystring":"chiari",
> >   "parsedquery":"text:chiari",
> >   "parsedquery_toString":"text:chiari",
> >   "explain":{
> >     "85249c23-ef68-4276-9ef7-48c290033993":"\n9.735645 = weight(text:chiari in 106960) [], result of:\n 9.735645 = score(doc=106960,freq=50.0 = termFreq=50.0\n), product of:\n 4.798444 = idf(docFreq=281, docCount=34151)\n 2.0289173 = tfNorm, computed from:\n 50.0 = termFreq=50.0\n 1.2 = parameter k1\n 0.75 = parameter b\n 941.3421 = avgFieldLength\n 4096.0 = fieldLength\n",
> >     "495b660d-8e8f-4b75-a523-106440468818":"\n9.655164 = weight(text:chiari in 106215) [], result of:\n 9.655164 = score(doc=106215,freq=58.0 = termFreq=58.0\n), product of:\n 4.798444 = idf(docFreq=281, docCount=34151)\n 2.0121448 = tfNorm, computed from:\n 58.0 = termFreq=58.0\n 1.2 = parameter k1\n 0.75 = parameter b\n 941.3421 = avgFieldLength\n 5349.8774 = fieldLength\n",
> >     "841df60a-b83e-4e74-9ad5-463971d5220a":"\n9.613188 = weight(text:chiari in 106214) [], result of:\n 9.613188 = score(doc=106214,freq=74.0 = termFreq=74.0\n), product of:\n 4.798444 = idf(docFreq=281, docCount=34151)\n 2.003397 = tfNorm, computed from:\n 74.0 = termFreq=74.0\n 1.2 = parameter k1\n 0.75 = parameter b\n 941.3421 = avgFieldLength\n 7281.778 = fieldLength\n",
> >     "0a8ab59f-95e3-4fca-adea-5a62d97b4369":"\n9.594478 = weight(text:chiari in 106440) [], result of:\n 9.594478 = score(doc=106440,freq=54.0 = termFreq=54.0\n), product of:\n 4.798444 = idf(docFreq=281, docCount=34151)\n 1.9994978 = tfNorm, computed from:\n 54.0 = termFreq=54.0\n 1.2 = parameter k1\n 0.75 = parameter b\n 941.3421 = avgFieldLength\n 5349.8774 = fieldLength\n",
> >     "15595a34-88c4-42e0-a6b2-9ee8eafdd9e8":"\n9.502294 = weight(text:chiari in 106958) [], result of:\n 9.502294 = score(doc=106958,freq=38.0 = termFreq=38.0\n), product of:\n 4.798444 = idf(docFreq=281, docCount=34151)\n 1.9802866 = tfNorm, computed from:\n 38.0 = termFreq=38.0\n 1.2 = parameter k1\n 0.75 = parameter b\n 941.3421 = avgFieldLength\n 4096.0 = fieldLength\n",
> >     "0acd1f88-395c-434d-9cba-919e7073080c":"\n9.449741 = weight(text:chiari in 106439) [], result of:\n 9.449741 = score(doc=106439,freq=62.0 = termFreq=62.0\n), product of:\n 4.798444 = idf(docFreq=281, docCount=34151)\n 1.9693346 = tfNorm, computed from:\n 62.0 = termFreq=62.0\n 1.2 = parameter k1\n 0.75 = parameter b\n 941.3421 = avgFieldLength\n 7281.778 = fieldLength\n",
> >     "66516297-cf1d-4ee8-847b-a5193420491a":"\n9.284438 = weight(text:chiari in 108786) [], result of:\n 9.284438 = score(doc=108786,freq=53.0 = termFreq=53.0\n), product of:\n 4.798444 = idf(docFreq=281, docCount=34151)\n 1.9348853 = tfNorm, computed from:\n 53.0 = termFreq=53.0\n 1.2 = parameter k1\n 0.75 = parameter b\n 941.3421 = avgFieldLength\n 7281.778 = fieldLength\n",
> >     "0c5a4be7-1162-4b1a-ab83-4b48a690fc3a":"\n9.164393 = weight(text:chiari in 6100) [], result of:\n 9.164393 = score(doc=6100,freq=2.0 = termFreq=2.0\n), product of:\n 4.798444 = idf(docFreq=281, docCount=34151)\n 1.9098678 = tfNorm, computed from:\n 2.0 = termFreq=2.0\n 1.2 = parameter k1\n 0.75 = parameter b\n 941.3421 = avgFieldLength\n 4.0 = fieldLength\n",
> >     "e1cb441d-9d60-482d-956b-3fbc964a17c1":"\n9.164393 = weight(text:chiari in 6113) [], result of:\n 9.164393 = score(doc=6113,freq=2.0 = termFreq=2.0\n), product of:\n 4.798444 = idf(docFreq=281, docCount=34151)\n 1.9098678 = tfNorm, computed from:\n 2.0 = termFreq=2.0\n 1.2 = parameter k1\n 0.75 = parameter b\n 941.3421 = avgFieldLength\n 4.0 = fieldLength\n",
> >     "f87951f1-e163-4f17-a628-904b9df0c609":"\n9.164393 = weight(text:chiari in 6114) [], result of:\n 9.164393 = score(doc=6114,freq=2.0 = termFreq=2.0\n), product of:\n 4.798444 = idf(docFreq=281, docCount=34151)\n 1.9098678 = tfNorm, computed from:\n 2.0 = termFreq=2.0\n 1.2 = parameter k1\n 0.75 = parameter b\n 941.3421 = avgFieldLength\n 4.0 = fieldLength\n"
> >   },
> >   "QParser":"LuceneQParser",
> >   "timing":{ "time":1.0,
> >
>
