lucene-dev mailing list archives

From "Shawn Heisey (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SOLR-5929) Solrj QueryResponse results not presented in proper score order
Date Fri, 28 Mar 2014 14:49:18 GMT

    [ https://issues.apache.org/jira/browse/SOLR-5929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13950802#comment-13950802 ]

Shawn Heisey commented on SOLR-5929:
------------------------------------

Problems should be raised on the mailing list first.  We try to confirm bugs before opening
issues in JIRA.  The Solr "discussion" page mentions this.

http://lucene.apache.org/solr/discussion.html

SolrJ just returns results in the order that Solr delivers them.  It does not do any re-ordering.
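
The reporter's theory about the exponent can be checked without Solr.  The sketch below is
plain Java, not SolrJ code, and the class and method names are made up for illustration.  It
assumes the scores were printed through Java's default float-to-string conversion, which
switches to scientific notation for magnitudes below 0.001, and it shows that both spellings
parse to the same float.

```java
final class ScoreNotationDemo {

    // The two scores from the report, written the way the JSON response prints them.
    static final float LOW = 0.0006184431f;
    static final float HIGH = 0.0012368863f;

    /** Renders a score with Java's default float-to-string conversion. */
    static String render(float score) {
        return Float.toString(score);
    }

    /** True when both spellings of a number parse to the same float value. */
    static boolean sameValue(String a, String b) {
        return Float.compare(Float.parseFloat(a), Float.parseFloat(b)) == 0;
    }

    public static void main(String[] args) {
        // Magnitudes below 1e-3 are printed in scientific notation.
        System.out.println(render(LOW));
        // Magnitudes of 1e-3 and above are printed in plain decimal notation.
        System.out.println(render(HIGH));
        // The notation is display only: both strings denote the same float.
        System.out.println(sameValue("6.184431E-4", "0.0006184431")); // true
        // Compared as numbers, the lower score is still the lower score.
        System.out.println(Float.compare(LOW, HIGH) < 0); // true
    }
}
```

So 6.184431E-4 and 0.0006184431 are the same score written two ways, and the notation by
itself says nothing about how the documents were ordered.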

The first response (formatted JSON) doesn't list a sort parameter.  The second one (toString
from SolrJ) does: sort=inv_desktopPageTitleSort asc,id asc.  The requests are different, which
is why the results come back in a different order.

The source of your confusion here appears to be the document scores.  Your first search does
not have a sort parameter, so the results are ordered by score.  The second has a sort
parameter, so the score is ignored and the sort parameter is the only thing that determines
the order.  The debug output still calculates and reports the score, even though it does not
affect the order.
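
To make the difference concrete, here is a minimal sketch in plain Java (no SolrJ, no server).
The Hit class is hypothetical, and the titleSort values are an assumption: the content of
inv_desktopPageTitleSort is not shown in the report, so the sketch reuses the
inv_desktopPageTitle values.  It orders the same two hits once by descending score, as in the
first request, and once by the sort parameter of the second request.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

final class SortVersusScoreDemo {

    /** Hypothetical stand-in for one search hit; this is not a SolrJ class. */
    static final class Hit {
        final String id;
        final String titleSort; // assumed to mirror inv_desktopPageTitle
        final float score;

        Hit(String id, String titleSort, float score) {
            this.id = id;
            this.titleSort = titleSort;
            this.score = score;
        }
    }

    /** The two hits from the report, with shortened ids. */
    static List<Hit> sampleHits() {
        return Arrays.asList(
                new Hit("multiterm2", "multiterm2", 0.0012368863f),
                new Hit("multiterm1", "multiterm1", 0.0006184431f));
    }

    /** No sort parameter: highest score first. */
    static List<String> idsByScoreDesc(List<Hit> hits) {
        List<Hit> copy = new ArrayList<>(hits);
        copy.sort((a, b) -> Float.compare(b.score, a.score));
        return ids(copy);
    }

    /** sort=inv_desktopPageTitleSort asc,id asc: the score plays no part. */
    static List<String> idsByTitleThenId(List<Hit> hits) {
        List<Hit> copy = new ArrayList<>(hits);
        copy.sort(Comparator.comparing((Hit h) -> h.titleSort).thenComparing(h -> h.id));
        return ids(copy);
    }

    private static List<String> ids(List<Hit> hits) {
        List<String> out = new ArrayList<>();
        for (Hit h : hits) {
            out.add(h.id);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(idsByScoreDesc(sampleHits()));   // [multiterm2, multiterm1]
        System.out.println(idsByTitleThenId(sampleHits())); // [multiterm1, multiterm2]
    }
}
```

Under those assumptions the second ordering puts multiterm1 first even though its score is
lower, which matches what the SolrJ output shows.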


> Solrj QueryResponse results not presented in proper score order
> ---------------------------------------------------------------
>
>                 Key: SOLR-5929
>                 URL: https://issues.apache.org/jira/browse/SOLR-5929
>             Project: Solr
>          Issue Type: Bug
>          Components: clients - java
>    Affects Versions: 4.6.1
>         Environment: Windows 7, Java 7
>            Reporter: Chris Pilsworth
>
> It would appear that the results collection is sorting on the score as a string where there is an exponent.
> When searching for a term that returns two documents, one with a significantly smaller score than the other the results are returned *correctly* from solr directly.
> {code:json}
> {
>     responseHeader: {
>         status: 0,
>         QTime: 69,
>         params: {
>             q: "sausages",
>             indent: "true",
>             fl: "id, inv_text_summary, score",
>             wt: "json",
>             debugQuery: "true"
>         }
>     },
>     response: {
>         numFound: 2,
>         start: 0,
>         maxScore: 0.0012368863,
>         docs: [
>             {
>                 inv_text_summary: "Contrary to popular belief, Lorem sausages sausages
sausages sausagesIpsum is not simply random text. It has roots in a piece of classical Latin
literature from 45 BC, making it over 2000 years old. Richard McClintock, a Latin professor
at Hampden-Sydney College in Virginia, looked up one of the ...",
>                 id: "/content/site-qa/capital/en_gb/home/test_pages/test_page_2/multiterm2",
>                 score: 0.0012368863
>             },
>             {
>                 inv_text_summary: "Contrary to sausages belief, Lorem Ipsum is not simply
random text. It has roots in a piece of classical Latin literature from 45 BC, making it over
2000 years old. Richard McClintock, a Latin professor at Hampden-Sydney College in Virginia,
looked up one of the more obscure Latin words, consecte...",
>                 id: "/content/site-qa/capital/en_gb/home/test_pages/test_page_2/multiterm1",
>                 score: 0.0006184431
>             }
>         ]
>     },
>     debug: {
>         rawquerystring: "sausages",
>         querystring: "sausages",
>         parsedquery: "(+DisjunctionMaxQuery((inv_path:sausages | inv_h1:sausages^8.0
| inv_text_summary:sausages^2.0 | inv_title:sausages^18.0 | inv_h2:sausages^6.0 | inv_h3:sausages^4.0
| inv_text:sausages)~1.0) () () () () () () ())/no_coord",
>         parsedquery_toString: "+(inv_path:sausages | inv_h1:sausages^8.0 | inv_text_summary:sausages^2.0
| inv_title:sausages^18.0 | inv_h2:sausages^6.0 | inv_h3:sausages^4.0 | inv_text:sausages)~1.0
() () () () () () ()",
>         explain: {
>             /content/site-qa/capital/en_gb/home/test_pages/test_page_2/multiterm2: "
0.0012368863 = (MATCH) sum of: 0.0012368863 = (MATCH) max plus 1.0 times others of: 0.0012368863
= (MATCH) weight(inv_text:sausages in 0) [DefaultSimilarity], result of: 0.0012368863 = score(doc=0,freq=4.0
= termFreq=4.0 ), product of: 0.016643414 = queryWeight, product of: 0.5945349 = idf(docFreq=2,
maxDocs=2) 0.027994009 = queryNorm 0.07431686 = fieldWeight in 0, product of: 2.0 = tf(freq=4.0),
with freq of: 4.0 = termFreq=4.0 0.5945349 = idf(docFreq=2, maxDocs=2) 0.0625 = fieldNorm(doc=0)
",
>             /content/site-qa/capital/en_gb/home/test_pages/test_page_2/multiterm1: "
6.184431E-4 = (MATCH) sum of: 6.184431E-4 = (MATCH) max plus 1.0 times others of: 6.184431E-4
= (MATCH) weight(inv_text:sausages in 0) [DefaultSimilarity], result of: 6.184431E-4 = score(doc=0,freq=1.0
= termFreq=1.0 ), product of: 0.016643414 = queryWeight, product of: 0.5945349 = idf(docFreq=2,
maxDocs=2) 0.027994009 = queryNorm 0.03715843 = fieldWeight in 0, product of: 1.0 = tf(freq=1.0),
with freq of: 1.0 = termFreq=1.0 0.5945349 = idf(docFreq=2, maxDocs=2) 0.0625 = fieldNorm(doc=0)
"
>         },
>         QParser: "ExtendedDismaxQParser",
>         altquerystring: null,
>         boost_queries: null,
>         parsed_boost_queries: [
>             
>         ],
>         boostfuncs: null,
>         timing: {
>             time: 69,
>             prepare: {
>                 time: 14,
>                 query: {
>                     time: 14
>                 },
>                 facet: {
>                     time: 0
>                 },
>                 mlt: {
>                     time: 0
>                 },
>                 highlight: {
>                     time: 0
>                 },
>                 stats: {
>                     time: 0
>                 },
>                 debug: {
>                     time: 0
>                 }
>             },
>             process: {
>                 time: 55,
>                 query: {
>                     time: 0
>                 },
>                 facet: {
>                     time: 0
>                 },
>                 mlt: {
>                     time: 0
>                 },
>                 highlight: {
>                     time: 0
>                 },
>                 stats: {
>                     time: 0
>                 },
>                 debug: {
>                     time: 55
>                 }
>             }
>         }
>     }
> }
> {code}
> However, when read through the solrj client, the document with the lower score is presented first.  This could be because the lower score (0.0006184431) is expressed as 6.184431E-4, so a string sort, if one was applied, could explain the order.
> The QueryResponse.toString looks like this.
> {code}
> [responseHeader=[status=0,QTime=2156,params=[q=sausages,hl.useFastVectorHighlighter=true,facet.field=[inv_siteContentType,
inv_siteCategory],qt=/selectmin,hl=true,start=0,sort=inv_desktopPageTitleSort asc,id asc,rows=5,facet=true,wt=javabin,version=2,debugQuery=true]],response=[numFound=2,start=0,maxScore=0.0012368863,docs=[SolrDocument[inv_text_summary=Contrary
to sausages belief, Lorem Ipsum is not simply random text. It has roots in a piece of classical
Latin literature from 45 BC, making it over 2000 years old. Richard McClintock, a Latin professor
at Hampden-Sydney College in Virginia, looked up one of the more obscure Latin words, consecte...,
inv_title=[multiterm1 - Investec Specialist Bank], inv_customContentTypes=[#1:COMMON#2:#3:#4:#5:[]#6:#7:[]#8:[]],
id=/content/site-qa/capital/en_gb/home/test_pages/test_page_2/multiterm1, inv_created=Thu
Mar 27 12:54:34 CET 2014, inv_contentProcessed=Thu Mar 27 12:59:04 CET 2014, inv_desktopPageTitle=multiterm1,
inv_mobilePageTitle=multiterm1, inv_lastModified=Thu Mar 27 12:57:48 CET 2014, score=6.184431E-4],
SolrDocument[inv_text_summary=Contrary to popular belief, Lorem sausages sausages sausages sausagesIpsum
is not simply random text. It has roots in a piece of classical Latin literature from 45 BC,
making it over 2000 years old. Richard McClintock, a Latin professor at Hampden-Sydney College
in Virginia, looked up one of the ..., inv_title=[multiterm2 - Investec Specialist Bank],
inv_customContentTypes=[#1:COMMON#2:#3:#4:#5:[]#6:#7:[]#8:[]], id=/content/site-qa/capital/en_gb/home/test_pages/test_page_2/multiterm2,
inv_created=Thu Mar 27 12:54:42 CET 2014, inv_contentProcessed=Thu Mar 27 13:03:31 CET 2014,
inv_desktopPageTitle=multiterm2, inv_mobilePageTitle=multiterm2, inv_lastModified=Thu Mar
27 13:03:27 CET 2014, score=0.0012368863]]],facet_counts=[facet_queries=[],facet_fields=[inv_siteContentType=[NONE=2],inv_siteCategory=[NONE=2]],facet_dates=[],facet_ranges=[]],highlighting=[/content/site-qa/capital/en_gb/home/test_pages/test_page_2/multiterm1=[inv_text=[Contrary
to <b>sausages</b> belief, Lorem Ipsum is not simply random text. It has roots
in a piece of classical]],/content/site-qa/capital/en_gb/home/test_pages/test_page_2/multiterm2=[inv_text=[Contrary
to popular belief, Lorem <b>sausages</b> <b>sausages</b> <b>sausages</b> <b>sausages</b>Ipsum
is not simply random text. It]]],debug=[rawquerystring=sausages,querystring=sausages,parsedquery=(+DisjunctionMaxQuery((inv_path:sausages
| inv_h1:sausages^8.0 | inv_text_summary:sausages^2.0 | inv_title:sausages^18.0 | inv_h2:sausages^6.0
| inv_h3:sausages^4.0 | inv_text:sausages)~1.0) () () () () () () ())/no_coord,parsedquery_toString=+(inv_path:sausages
| inv_h1:sausages^8.0 | inv_text_summary:sausages^2.0 | inv_title:sausages^18.0 | inv_h2:sausages^6.0
| inv_h3:sausages^4.0 | inv_text:sausages)~1.0 () () () () () () (),explain=
> [/content/site-qa/capital/en_gb/home/test_pages/test_page_2/multiterm1=
> 6.184431E-4 = (MATCH) sum of:
>   6.184431E-4 = (MATCH) max plus 1.0 times others of:
>     6.184431E-4 = (MATCH) weight(inv_text:sausages in 0) [DefaultSimilarity], result
of:
>       6.184431E-4 = score(doc=0,freq=1.0 = termFreq=1.0
> ), product of:
>         0.016643414 = queryWeight, product of:
>           0.5945349 = idf(docFreq=2, maxDocs=2)
>           0.027994009 = queryNorm
>         0.03715843 = fieldWeight in 0, product of:
>           1.0 = tf(freq=1.0), with freq of:
>             1.0 = termFreq=1.0
>           0.5945349 = idf(docFreq=2, maxDocs=2)
>           0.0625 = fieldNorm(doc=0)
> ,/content/site-qa/capital/en_gb/home/test_pages/test_page_2/multiterm2=
> 0.0012368863 = (MATCH) sum of:
>   0.0012368863 = (MATCH) max plus 1.0 times others of:
>     0.0012368863 = (MATCH) weight(inv_text:sausages in 0) [DefaultSimilarity], result
of:
>       0.0012368863 = score(doc=0,freq=4.0 = termFreq=4.0
> ), product of:
>         0.016643414 = queryWeight, product of:
>           0.5945349 = idf(docFreq=2, maxDocs=2)
>           0.027994009 = queryNorm
>         0.07431686 = fieldWeight in 0, product of:
>           2.0 = tf(freq=4.0), with freq of:
>             4.0 = termFreq=4.0
>           0.5945349 = idf(docFreq=2, maxDocs=2)
>           0.0625 = fieldNorm(doc=0)
> ]
> {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

