Trying to put together an explanation:
0.022172567 = (MATCH) product of:
  0.07760398 = (MATCH) sum of:
    0.02287053 = (MATCH) weight(payload:ces in 550), product of:
      0.32539415 = queryWeight(payload:ces), product of:
        2.2491398 = idf(docFreq=157, maxDocs=551)
        0.14467494 = queryNorm
      0.07028562 = (MATCH) fieldWeight(payload:ces in 550), product of:
        1.0 = tf(termFreq(payload:ces)=1)
        2.2491398 = idf(docFreq=157, maxDocs=551)
        0.03125 = fieldNorm(field=payload, doc=550)
    0.05473345 = (MATCH) weight(payload:deal in 550), product of:
      0.23803486 = queryWeight(payload:deal), product of:
        1.6453081 = idf(docFreq=288, maxDocs=551)
        0.14467494 = queryNorm
      0.2299388 = (MATCH) fieldWeight(payload:deal in 550), product of:
        4.472136 = tf(termFreq(payload:deal)=20)
        1.6453081 = idf(docFreq=288, maxDocs=551)
        0.03125 = fieldNorm(field=payload, doc=550)
  0.2857143 = coord(2/7)
1. tf = term frequency in document = measure of how often a term appears
   in the document
   Implementation: sqrt(freq)
   Implication: the more often a term occurs in a document, the greater
   its score
   Rationale: documents which contain more occurrences of a term are
   generally more relevant
2. idf = inverse document frequency = measure of how often the term
   appears across the index
   Implementation: log(numDocs/(docFreq+1)) + 1, using the natural log,
   where numDocs is the maxDocs value shown in the explain output
   Implication: the more documents a term occurs in, the lower its score
   Rationale: common terms are less important than uncommon ones
3. coord = fraction of the query terms that were found in the document
   Implementation: overlap / maxOverlap
   Implication: a document that contains more of the query terms will
   have a higher score
   Rationale: self-explanatory
4. fieldNorm = lengthNorm multiplied by any index-time boost, computed
   when the document is indexed
   1. lengthNorm = measure of the importance of a term according to the
      total number of terms in the field
      1. Implementation: 1/sqrt(numTerms)
      2. Implication: a term matched in a field with fewer terms has a
         higher score
      3. Rationale: a term in a field with fewer terms is more important
         than one in a field with more terms
   2. boost (index) = boost of the field at index time
      1. If an index-time boost is specified, it is folded into the
         fieldNorm value used in the score.
   3. boost (query) = boost of the term at query time. Note that this one
      is not part of fieldNorm; it is applied in queryWeight instead.
5. queryNorm = normalization factor so that queries can be compared
   1. queryNorm is not related to the relevance of the document, but
      rather tries to make scores from different queries comparable. It is
      implemented as 1/sqrt(sumOfSquaredWeights), and it is the same for
      every term and every document within one query (note the identical
      0.14467494 for both ces and deal).
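To make points 1 to 3 concrete, here is a small plain-Java sketch of the tf, idf and coord formulas. It is my own illustration, not Lucene's Similarity class, and the class and method names are made up. Fed with the inputs from the explain output for doc 550, it should reproduce tf = 1.0 and 4.472136, idf = 2.2491398 and 1.6453081, and coord = 0.2857143 (up to float rounding).

```java
// Plain-Java sketch of three classic Lucene scoring factors.
// Illustrative only: not Lucene's own Similarity class; names are hypothetical.
class FactorSketch {
    // tf: square root of the raw term frequency.
    static float tf(float freq) {
        return (float) Math.sqrt(freq);
    }

    // idf: natural log of numDocs / (docFreq + 1), plus 1.
    static float idf(int docFreq, int numDocs) {
        return (float) (Math.log(numDocs / (double) (docFreq + 1)) + 1.0);
    }

    // coord: fraction of the query clauses that matched the document.
    static float coord(int overlap, int maxOverlap) {
        return overlap / (float) maxOverlap;
    }

    public static void main(String[] args) {
        // Inputs taken from the explain() output for doc 550.
        System.out.println("tf(ces)   = " + tf(1));
        System.out.println("tf(deal)  = " + tf(20));
        System.out.println("idf(ces)  = " + idf(157, 551));
        System.out.println("idf(deal) = " + idf(288, 551));
        System.out.println("coord     = " + coord(2, 7));
    }
}
```

Using the natural log (Math.log) is what makes the idf values line up; a base-10 log would give quite different numbers.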
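Points 4 and 5 can be sketched the same way. Again this is an illustration with hypothetical names, not Lucene code, and the inputs in main() are made up so the arithmetic is easy to follow. Two caveats: classic Lucene stores the norm in a single byte, so real fieldNorm values are coarser than this float math (a field of 1024 terms gives exactly 0.03125 here, but other nearby lengths can end up with the same stored value, so you cannot read the exact length of doc 550 off its fieldNorm). And queryNorm is computed over all 7 query terms, including the 5 that did not match, whose idf values the explain output does not show, so 0.14467494 cannot be recomputed from this output alone.

```java
// Plain-Java sketch of the two normalization factors (hypothetical names).
class NormSketch {
    // lengthNorm = 1/sqrt(numTerms); fieldNorm multiplies in any index-time boost.
    // Lucene additionally encodes this into a single byte, which is NOT modelled here.
    static float fieldNorm(int numTerms, float indexBoost) {
        return (float) (indexBoost / Math.sqrt(numTerms));
    }

    // queryNorm = 1/sqrt(sumOfSquaredWeights), assuming a plain boolean query of
    // term clauses where each clause contributes (idf * queryBoost)^2.
    static float queryNorm(float[] idfs, float[] boosts) {
        double sumOfSquaredWeights = 0.0;
        for (int i = 0; i < idfs.length; i++) {
            double w = (double) idfs[i] * boosts[i];
            sumOfSquaredWeights += w * w;
        }
        return (float) (1.0 / Math.sqrt(sumOfSquaredWeights));
    }

    public static void main(String[] args) {
        // Hypothetical inputs, chosen only to show the arithmetic.
        System.out.println(fieldNorm(4, 1.0f));    // short field: 1/sqrt(4)
        System.out.println(fieldNorm(1024, 1.0f)); // long field: 1/sqrt(1024)
        System.out.println(queryNorm(new float[] {3f, 4f}, new float[] {1f, 1f})); // 1/sqrt(9 + 16)
    }
}
```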
When you search for the query *It is definitely a CES deal that will be over
in Sep or Oct of this year.*, the scoring works as follows:
1. Lucene tries to match each word of your query against each field you
   have specified for searching, e.g. payload in your case.
2. In your example, only ces and deal matched, so only those two terms
   appear in the explanation.
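3. The fraction of query terms that matched also contributes, through
   0.2857143 = coord(2/7): 2 of the 7 query terms were found. (7 is the
   number of terms left after analysis; words such as "it", "is" and "a"
   were presumably dropped as stop words.)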
4. idf(docFreq=157, maxDocs=551) expresses the rarity of the term: docFreq
   is the number of documents that contain the term in that field, and
   maxDocs is the total number of documents in the index.
5. tf(termFreq(payload:ces)=1) is based on the number of times the term
   occurs in the field, 1 in this case. For deal, termFreq=20 gives
   tf = sqrt(20) = 4.472136.
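6. The score for each term match is the product of two parts:
   0.02287053 = (MATCH) weight(payload:ces in 550), product of:
   1. queryWeight = query boost * idf * queryNorm
      0.32539415 = queryWeight(payload:ces), product of:
        1 = boost (The boost in your case seems to be 1, so explain()
            leaves it out; multiplying by 1 does not change the score.)
        2.2491398 = idf(docFreq=157, maxDocs=551)
        0.14467494 = queryNorm
   2. fieldWeight = term frequency * idf * field norm
      0.07028562 = (MATCH) fieldWeight(payload:ces in 550), product of:
        1.0 = tf(termFreq(payload:ces)=1)
        2.2491398 = idf(docFreq=157, maxDocs=551)
        0.03125 = fieldNorm(field=payload, doc=550)
   Because idf appears in both parts, each term's weight effectively
   grows with the square of its idf.
7. Finally, the term weights are summed (0.02287053 + 0.05473345 =
   0.07760398) and the sum is multiplied by coord(2/7) = 0.2857143,
   giving the document score 0.022172567.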
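About the 7 in coord(2/7) in point 3: the query sentence has 17 words, so something must have removed 10 of them. A plausible explanation, assuming the payload field is analyzed with something like StandardAnalyzer and its default English stop word list, is stop word removal. The sketch below is a naive stand-in for an analyzer (lowercase, split on non-alphanumerics, drop stop words); the stop list is an assumption on my part, meant to mirror Lucene's classic English set, and your real analyzer may differ.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Naive analyzer stand-in (hypothetical, not Lucene's tokenizer).
// ASSUMPTION: this stop list mirrors Lucene's classic English stop word set.
class QueryTermsSketch {
    static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList(
        "a", "an", "and", "are", "as", "at", "be", "but", "by", "for", "if",
        "in", "into", "is", "it", "no", "not", "of", "on", "or", "such", "that",
        "the", "their", "then", "there", "these", "they", "this", "to", "was",
        "will", "with"));

    // Lowercase, split on runs of non-alphanumeric characters, drop stop words.
    static List<String> terms(String text) {
        List<String> out = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!token.isEmpty() && !STOP_WORDS.contains(token)) {
                out.add(token);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> t = terms("It is definitely a CES deal that will be over in Sep or Oct of this year.");
        System.out.println(t + " -> " + t.size() + " terms");
    }
}
```

With this assumed stop list it prints `[definitely, ces, deal, over, sep, oct, year] -> 7 terms`, which fits coord(2/7): only ces and deal were found in doc 550.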
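To check the arithmetic end to end, here is a small plain-Java sketch (my own illustration, not Lucene code) that rebuilds the score of doc 550 from the leaf values in the explain output, using weight = queryWeight * fieldWeight, queryWeight = boost * idf * queryNorm, fieldWeight = tf * idf * fieldNorm, and score = coord * sum of term weights. The only assumption beyond the copied values is a query boost of 1.

```java
// Rebuilds the explain() tree for doc 550 from its leaf values (illustrative sketch).
class ScoreSketch {
    // weight(term) = queryWeight * fieldWeight
    static double termWeight(double boost, double idf, double queryNorm,
                             double tf, double fieldNorm) {
        double queryWeight = boost * idf * queryNorm; // query-side factor
        double fieldWeight = tf * idf * fieldNorm;    // document-side factor
        return queryWeight * fieldWeight;
    }

    public static void main(String[] args) {
        double queryNorm = 0.14467494; // copied from the explain output
        double fieldNorm = 0.03125;    // same field and doc for both terms
        double ces  = termWeight(1.0, 2.2491398, queryNorm, Math.sqrt(1), fieldNorm);
        double deal = termWeight(1.0, 1.6453081, queryNorm, Math.sqrt(20), fieldNorm);
        double coord = 2.0 / 7.0;      // 2 of 7 query terms matched
        double score = coord * (ces + deal);
        System.out.printf("ces=%.8f deal=%.8f score=%.9f%n", ces, deal, score);
    }
}
```

The printed values should agree with the explain tree up to rounding in the last digit, since Lucene does this arithmetic in float while the sketch uses double.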
Regards,
Jayendra
On Sat, Aug 7, 2010 at 11:02 AM, Soby Thomas <soby.thomas85@gmail.com> wrote:
> Hello Guys,
>
> I'm trying to understand how the Lucene score is calculated, so I'm using the
> searcher.explain() function. But the output it gives is really confusing for
> me. Below are the details of the query that I gave and the output it gave me:
>
> Query: *It is definitely a CES deal that will be over in Sep or Oct of this
> year.*
>
> *output*: (snipped; identical to the explain output reproduced at the top
> of this reply)
>
> So can someone please help me understand the output, or suggest a link
> that explains it? I will be grateful.
>
> Regards
> Soby
>
