lucene-solr-user mailing list archives

From Koji Sekiguchi <k...@r.email.ne.jp>
Subject Re: Understanding the Debug explanations for Query Result Scoring/Ranking
Date Thu, 24 Jul 2014 23:36:47 GMT
Hi,

In addition, this might be useful:

Fundamentals of Information Retrieval, Illustration with Apache Lucene
https://www.youtube.com/watch?v=SCsS5ePGmCs

This video is about 40 minutes long, but you can skip ahead to 24:00
to learn about scoring based on the vector space model and how Lucene customizes it.

Koji
-- 
http://soleami.com/blog/comparing-document-classification-functions-of-lucene-and-mahout.html

(2014/07/25 8:00), Uwe Reh wrote:
> Hi,
>
> to get an idea of what all these numbers mean, have a look at http://explain.solr.pl.
> I like this tool, it's great.
>
> Uwe
>
> Am 25.07.2014 00:45, schrieb O. Olson:
>> Hi,
>>
>>     If you add /*&debug=true*/ to the Solr request /(and &wt=xml if your
>> current output is not XML)/, you get a node named "debug" in the resulting
>> XML. It has a child node called "explain", which lists why the results are
>> ranked in a particular order. I'm curious whether there is any
>> documentation on understanding these numbers/results.
>>
>>     I am new to Solr, so I apologize if I am using the wrong terms to
>> describe my problem. I am also aware of
>> http://lucene.apache.org/core/4_9_0/core/org/apache/lucene/search/similarities/TFIDFSimilarity.html
>> though I have not completely understood it.
>>
>>     My problem is trying to understand something like this:
>>
>> 1.5797625 = (MATCH) sum of: 0.4717142 = (MATCH) weight(text:televis in
>> 44109) [DefaultSimilarity], result of: 0.4717142 = score(doc=44109,freq=1.0
>> = termFreq=1.0 ), product of: 0.71447384 = queryWeight, product of:
>> 7.0424104 = idf(docFreq=896, maxDocs=377553) 0.10145303 = queryNorm 0.660226
>> = fieldWeight in 44109, product of: 1.0 = tf(freq=1.0), with freq of: 1.0 =
>> termFreq=1.0 7.0424104 = idf(docFreq=896, maxDocs=377553) 0.09375 =
>> fieldNorm(doc=44109) 1.1080483 = (MATCH) weight(text:tv in 44109)
>> [DefaultSimilarity], result of: 1.1080483 = score(doc=44109,freq=6.0 =
>> termFreq=6.0 ), product of: 0.6996622 = queryWeight, product of: 6.896415 =
>> idf(docFreq=1037, maxDocs=377553) 0.10145303 = queryNorm 1.5836904 =
>> fieldWeight in 44109, product of: 2.4494898 = tf(freq=6.0), with freq of:
>> 6.0 = termFreq=6.0 6.896415 = idf(docFreq=1037, maxDocs=377553) 0.09375 =
>> fieldNorm(doc=44109)
>>
>> *Note:* I have searched for "televisions". My search field is a single
>> catch-all field. The Edismax parser seems to break up my search term into
>> "televis" and "tv".
>>
>> Is there some documentation on how to understand these numbers? They do not
>> seem to be properly delimited. At a minimum, I can understand something
>> like:
>> 1.5797625 =  0.4717142 + 1.1080483
>> and
>> 0.71447384  = 7.0424104 * 0.10145303
>>
>> But I cannot tell whether something like "0.10145303 = queryNorm 0.660226
>> = fieldWeight in 44109" is used in the calculation anywhere. Also, since
>> there were only two terms /("televis" and "tv")/, I could use subtraction to
>> work out that 1.1080483 was the start of a new result.
>>
>> I'd also appreciate it if someone could tell me which class dumps out the
>> above data. If I know it, I can edit that class to make the output a bit
>> more understandable for me.
>>
>> Thank you,
>> O. O.
>>
>> --
>> View this message in context:
>> http://lucene.472066.n3.nabble.com/Understanding-the-Debug-explanations-for-Query-Result-Scoring-Ranking-tp4149137.html
>>
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
>
>
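For reference, the numbers in the quoted explain output can be re-derived by hand. The sketch below is only an illustration in Python, not Lucene code. It assumes the classic DefaultSimilarity formulas described in the TFIDFSimilarity javadoc linked above (tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1))), and the function names are made up for this example. Each term's score is queryWeight * fieldWeight, so queryNorm and fieldWeight both feed into the result. Lucene computes in 32-bit floats, so expect agreement only to a few decimal places.

```python
import math

# Assumed DefaultSimilarity formulas (Lucene 4.x); helper names are hypothetical.
def idf(doc_freq, max_docs):
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def tf(freq):
    return math.sqrt(freq)

def term_score(freq, doc_freq, max_docs, query_norm, field_norm):
    # queryWeight = idf * queryNorm
    query_weight = idf(doc_freq, max_docs) * query_norm
    # fieldWeight = tf * idf * fieldNorm
    field_weight = tf(freq) * idf(doc_freq, max_docs) * field_norm
    # score for one term = queryWeight * fieldWeight
    return query_weight * field_weight

# Values copied from the explain output for doc 44109.
MAX_DOCS = 377553
QUERY_NORM = 0.10145303
FIELD_NORM = 0.09375

televis = term_score(1.0, 896, MAX_DOCS, QUERY_NORM, FIELD_NORM)
tv = term_score(6.0, 1037, MAX_DOCS, QUERY_NORM, FIELD_NORM)
total = televis + tv  # the top-level "sum of" node

print(f"text:televis {televis:.5f}")
print(f"text:tv      {tv:.5f}")
print(f"sum          {total:.5f}")
```

Read this way, the flat output is a tree: the top-level sum has one child per term, and each child is the product of a queryWeight branch (idf, queryNorm) and a fieldWeight branch (tf, idf, fieldNorm).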