Thanks Jayendra, it was really helpful.
On Sat, Aug 7, 2010 at 6:07 PM, jayendra patil wrote:
> Here is an attempt at an explanation:
>
> 0.022172567 = (MATCH) product of:
>   0.07760398 = (MATCH) sum of:
>     0.02287053 = (MATCH) weight(payload:ces in 550), product of:
>       0.32539415 = queryWeight(payload:ces), product of:
>         2.2491398 = idf(docFreq=157, maxDocs=551)
>         0.14467494 = queryNorm
>       0.07028562 = (MATCH) fieldWeight(payload:ces in 550), product of:
>         1.0 = tf(termFreq(payload:ces)=1)
>         2.2491398 = idf(docFreq=157, maxDocs=551)
>         0.03125 = fieldNorm(field=payload, doc=550)
>     0.05473345 = (MATCH) weight(payload:deal in 550), product of:
>       0.23803486 = queryWeight(payload:deal), product of:
>         1.6453081 = idf(docFreq=288, maxDocs=551)
>         0.14467494 = queryNorm
>       0.2299388 = (MATCH) fieldWeight(payload:deal in 550), product of:
>         4.472136 = tf(termFreq(payload:deal)=20)
>         1.6453081 = idf(docFreq=288, maxDocs=551)
>         0.03125 = fieldNorm(field=payload, doc=550)
>   0.2857143 = coord(2/7)
>
>
> 1. tf = term frequency in the document = a measure of how often a term
>    appears in the document
>      Implementation: sqrt(freq)
>      Implication: the more often a term occurs in a document, the
>      greater its score
>      Rationale: documents which contain more of a term are generally
>      more relevant
> 2. idf = inverse document frequency = a measure of how often the term
>    appears across the index
>      Implementation: log(numDocs/(docFreq+1)) + 1
>      Implication: the more documents a term occurs in, the lower its
>      score
>      Rationale: common terms are less important than uncommon ones
> 3. coord = the fraction of the terms in the query that were found in
>    the document
>      Implementation: overlap / maxOverlap
>      Implication: of the terms in the query, a document that contains
>      more terms will have a higher score
>      Rationale: self-explanatory
> 4. fieldNorm
>      1. lengthNorm = a measure of the importance of a term according to
>         the total number of terms in the field
>           Implementation: 1/sqrt(numTerms)
>           Implication: a term matched in a field with fewer terms gets
>           a higher score
>           Rationale: a term in a field with fewer terms is more
>           important than one in a field with more
>      2. boost (index) = boost of the field at index time
>           If an index-time boost was specified, the fieldNorm value in
>           the score includes it.
>      3. boost (query) = boost of the field at query time
>           (In the explain output this appears under queryWeight rather
>           than in fieldNorm.)
> 5. queryNorm = normalization factor so that queries can be compared
>      queryNorm is not related to the relevance of the document, but
>      rather tries to make scores between different queries comparable.
>      It is implemented as 1/sqrt(sumOfSquaredWeights)
>
>
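As a rough cross-check, the formulas listed above can be written out in a few lines of Python. This is plain arithmetic, not the Lucene API, and the function names are made up for illustration. Fed with the docFreq, maxDocs and termFreq values from the explain output, they should reproduce the tf, idf and coord numbers to within float rounding. The fieldNorm of 0.03125 happens to equal 1/sqrt(1024), but since the stored norm is, as far as I know, rounded to a single byte and may include an index-time boost, the real field length can only be guessed from it.

```python
import math

# Hypothetical helpers mirroring the formulas quoted above (not Lucene calls).

def tf(freq):
    # term frequency factor: sqrt(freq)
    return math.sqrt(freq)

def idf(doc_freq, num_docs):
    # inverse document frequency: log(numDocs/(docFreq+1)) + 1
    return math.log(num_docs / (doc_freq + 1)) + 1

def coord(overlap, max_overlap):
    # fraction of query terms found in the document
    return overlap / max_overlap

def length_norm(num_terms):
    # shorter fields give a larger norm
    return 1 / math.sqrt(num_terms)

def query_norm(sum_of_squared_weights):
    # cannot be recomputed from this output alone: it needs the weights
    # of all 7 query terms, and only 2 of them are shown
    return 1 / math.sqrt(sum_of_squared_weights)

# Values taken from the explain output in this thread
print("tf(ces)   =", tf(1))            # explain shows 1.0
print("tf(deal)  =", tf(20))           # explain shows 4.472136
print("idf(ces)  =", idf(157, 551))    # explain shows 2.2491398
print("idf(deal) =", idf(288, 551))    # explain shows 1.6453081
print("coord     =", coord(2, 7))      # explain shows 0.2857143
```

Small differences in the last digits are expected, since Lucene works with 32-bit floats and Python with 64-bit ones.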
> When you search for the query: *It is definitely a CES deal that
> will be over in Sep or Oct of this year.*
>
> 1. Lucene tries to match each word of the query in each field that you
> have specified to be searched, e.g. payload in your case.
> 2. In your example it found matches only for ces and deal, hence only
> those two terms are displayed.
> 3. The number of matching terms in the field also determines
> 0.2857143 = coord(2/7), i.e. 2 words out of 7.
> 4. idf(docFreq=157, maxDocs=551) expresses the rarity of the term.
> docFreq is the number of documents which have the word in the field, and
> maxDocs is the total number of documents.
> 5. tf(termFreq(payload:ces)=1) reflects the number of times the term
> occurs in the field, e.g. 1 in this case.
> 6. The score for each term match is the product of two parts,
> queryWeight and fieldWeight:
>
> 0.02287053 = (MATCH) weight(payload:ces in 550), product of:
>
> a. queryWeight: field boost, idf and queryNorm
>
>   0.32539415 = queryWeight(payload:ces), product of:
>     1 = boost (the boost in your case seems to be 1 and hence is not
>         listed in the output)
>     2.2491398 = idf(docFreq=157, maxDocs=551)
>     0.14467494 = queryNorm
>
> b. fieldWeight: term frequency, idf and field norm
>
>   0.07028562 = (MATCH) fieldWeight(payload:ces in 550), product of:
>     1.0 = tf(termFreq(payload:ces)=1)
>     2.2491398 = idf(docFreq=157, maxDocs=551)
>     0.03125 = fieldNorm(field=payload, doc=550)
>
>
>
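To see how the pieces multiply together, here is a small Python sketch that rebuilds the final score from the leaf values of the explain output. It is only arithmetic on the numbers printed above, not a call into Lucene, and term_weight is a hypothetical helper name. It assumes a boost of 1, as discussed above.

```python
import math

def term_weight(term_freq, idf, field_norm, query_norm, boost=1.0):
    # queryWeight: boost, idf and queryNorm
    query_weight = boost * idf * query_norm
    # fieldWeight: tf (sqrt of termFreq), idf and fieldNorm
    field_weight = math.sqrt(term_freq) * idf * field_norm
    return query_weight * field_weight

# Leaf values copied from the explain output in this thread
QUERY_NORM = 0.14467494
FIELD_NORM = 0.03125

ces = term_weight(1, 2.2491398, FIELD_NORM, QUERY_NORM)
deal = term_weight(20, 1.6453081, FIELD_NORM, QUERY_NORM)

# "sum of" the matching terms, then "product of" that sum and coord(2/7)
score = (ces + deal) * (2 / 7)

print(f"weight(ces)  = {ces:.8f}")    # explain shows 0.02287053
print(f"weight(deal) = {deal:.8f}")   # explain shows 0.05473345
print(f"score        = {score:.9f}")  # explain shows 0.022172567
```

Note that idf enters each term weight twice, once through queryWeight and once through fieldWeight, which is why rare terms count for so much.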
> Regards,
> Jayendra
>
> On Sat, Aug 7, 2010 at 11:02 AM, Soby Thomas wrote:
>
> > Hello Guys,
> >
> > I am trying to understand how the Lucene score is calculated, so I'm
> > using the searcher.explain() function. But the output it gives is
> > really confusing for me. Below are the details of the query that I gave
> > and the output it gave me.
> >
> > Query: *It is definitely a CES deal that will be over in Sep or Oct of
> > this year.*
> >
> > *output*:
> > 0.022172567 = (MATCH) product of:
> >   0.07760398 = (MATCH) sum of:
> >     0.02287053 = (MATCH) weight(payload:ces in 550), product of:
> >       0.32539415 = queryWeight(payload:ces), product of:
> >         2.2491398 = idf(docFreq=157, maxDocs=551)
> >         0.14467494 = queryNorm
> >       0.07028562 = (MATCH) fieldWeight(payload:ces in 550), product of:
> >         1.0 = tf(termFreq(payload:ces)=1)
> >         2.2491398 = idf(docFreq=157, maxDocs=551)
> >         0.03125 = fieldNorm(field=payload, doc=550)
> >     0.05473345 = (MATCH) weight(payload:deal in 550), product of:
> >       0.23803486 = queryWeight(payload:deal), product of:
> >         1.6453081 = idf(docFreq=288, maxDocs=551)
> >         0.14467494 = queryNorm
> >       0.2299388 = (MATCH) fieldWeight(payload:deal in 550), product of:
> >         4.472136 = tf(termFreq(payload:deal)=20)
> >         1.6453081 = idf(docFreq=288, maxDocs=551)
> >         0.03125 = fieldNorm(field=payload, doc=550)
> >   0.2857143 = coord(2/7)
> >
> > So can someone please help me understand the output, or suggest any
> > link that explains it? I would be grateful.
> >
> > Regards
> > Soby
> >
>