lucene-java-user mailing list archives

From Soby Thomas <soby.thoma...@gmail.com>
Subject Re: Need help in understanding output of searcher.explain() function
Date Sat, 07 Aug 2010 18:38:34 GMT
Thanks Jayendra, that was really helpful.

On Sat, Aug 7, 2010 at 6:07 PM, jayendra patil <jayendra.patil@gmail.com> wrote:

> Here is an attempt at an explanation:
>
> 0.022172567 = (MATCH) product of:
>   0.07760398 = (MATCH) sum of:
>     0.02287053 = (MATCH) weight(payload:ces in 550), product of:
>       0.32539415 = queryWeight(payload:ces), product of:
>         2.2491398 = idf(docFreq=157, maxDocs=551)
>         0.14467494 = queryNorm
>       0.07028562 = (MATCH) fieldWeight(payload:ces in 550), product of:
>         1.0 = tf(termFreq(payload:ces)=1)
>         2.2491398 = idf(docFreq=157, maxDocs=551)
>         0.03125 = fieldNorm(field=payload, doc=550)
>     0.05473345 = (MATCH) weight(payload:deal in 550), product of:
>       0.23803486 = queryWeight(payload:deal), product of:
>         1.6453081 = idf(docFreq=288, maxDocs=551)
>         0.14467494 = queryNorm
>       0.2299388 = (MATCH) fieldWeight(payload:deal in 550), product of:
>         4.472136 = tf(termFreq(payload:deal)=20)
>         1.6453081 = idf(docFreq=288, maxDocs=551)
>         0.03125 = fieldNorm(field=payload, doc=550)
>   0.2857143 = coord(2/7)
>
>
> 1. tf = term frequency in document = measure of how often a term appears
>    in the document
>      Implementation: sqrt(freq)
>      Implication: the more frequently a term occurs in a document, the
>      greater its score
>      Rationale: documents that contain more occurrences of a term are
>      generally more relevant
> 2. idf = inverse document frequency = measure of how often the term
>    appears across the index
>      Implementation: log(numDocs/(docFreq+1)) + 1
>      Implication: the more documents a term occurs in, the lower its
>      score
>      Rationale: common terms are less important than uncommon ones
> 3. coord = number of terms in the query that were found in the document
>      Implementation: overlap / maxOverlap
>      Implication: of the terms in the query, a document that contains more
>      of them will have a higher score
>      Rationale: self-explanatory
> 4. fieldNorm
>    1. lengthNorm = measure of the importance of a term according to the
>       total number of terms in the field
>         Implementation: 1/sqrt(numTerms)
>         Implication: a term matched in a field with fewer terms gets a
>         higher score
>         Rationale: a term in a field with fewer terms is more important
>         than one in a field with more
>    2. boost (index) = boost of the field at index time
>         If an index-time boost was specified, the fieldNorm value in the
>         score already includes it.
> 5. boost (query) = boost of the field at query time. It is applied in
>    queryWeight, not in fieldNorm.
> 6. queryNorm = normalization factor so that queries can be compared
>      queryNorm is not related to the relevance of the document; it tries
>      to make scores from different queries comparable. It is implemented
>      as 1/sqrt(sumOfSquaredWeights)
>
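The factor definitions above can be checked against the numbers in the explain output. The following is a minimal sketch in plain Java, not Lucene code: the class name `SimilaritySketch` and its method names are made up for illustration, and it assumes that `log` in the idf formula is the natural logarithm, which is what reproduces the idf values shown in the output.

```java
// Plain-Java sketch of the scoring factors described above (no Lucene dependency).
// Class and method names are hypothetical; only the formulas come from the thread.
final class SimilaritySketch {

    /** tf: square root of the raw term frequency in the document. */
    static double tf(int freq) {
        return Math.sqrt(freq);
    }

    /** idf: log(numDocs / (docFreq + 1)) + 1, assuming the natural logarithm. */
    static double idf(int docFreq, int numDocs) {
        return Math.log((double) numDocs / (docFreq + 1)) + 1.0;
    }

    /** coord: fraction of the query terms that were found in the document. */
    static double coord(int overlap, int maxOverlap) {
        return (double) overlap / maxOverlap;
    }

    /** lengthNorm: 1 / sqrt(number of terms in the field). */
    static double lengthNorm(int numTerms) {
        return 1.0 / Math.sqrt(numTerms);
    }

    /** queryNorm: 1 / sqrt(sumOfSquaredWeights). */
    static double queryNorm(double sumOfSquaredWeights) {
        return 1.0 / Math.sqrt(sumOfSquaredWeights);
    }

    public static void main(String[] args) {
        // Inputs taken from the explain output in this thread.
        System.out.println("idf(docFreq=157, maxDocs=551) = " + idf(157, 551));
        System.out.println("idf(docFreq=288, maxDocs=551) = " + idf(288, 551));
        System.out.println("tf(termFreq=20)               = " + tf(20));
        System.out.println("coord(2/7)                    = " + coord(2, 7));
    }
}
```

Run as is, the printed values should agree with the explain output (2.2491398, 1.6453081, 4.472136 and 0.2857143) up to rounding. The fieldNorm of 0.03125 cannot be recomputed this way, because the number of terms in the field is not shown in the output and any index-time boost is folded into the same value.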
>
> When you search for the query: *It is definitely a CES deal that
> will be over in Sep or Oct of this year.*
>
> 1. Lucene tries to match each term of the query against each field you
> have specified to be searched, e.g. payload in your case.
> 2. In your example it found matches only for ces and deal, hence only
> those two terms are displayed.
> 3. The number of matching terms also determines
> 0.2857143 = coord(2/7): 2 of the 7 query terms matched. (The sentence has
> more than 7 words; presumably the analyzer dropped stop words such as
> "it", "is" and "a", leaving 7 terms.)
> 4. idf(docFreq=157, maxDocs=551) expresses the rarity of the term. docFreq
> is the number of documents that contain the term in the field, and
> maxDocs is the total number of documents.
> 5. tf(termFreq(payload:ces)=1) reflects the number of times the term
> occurs in the field of this document, e.g. 1 in this case.
> 6. The score for each term match is the product of two parts, queryWeight
> and fieldWeight, shown here for ces:
>
> 0.02287053 = (MATCH) weight(payload:ces in 550), product of:
>
>   queryWeight: boost, idf and queryNorm
>
>   0.32539415 = queryWeight(payload:ces), product of:
>     1 = boost (the boost in your case seems to be 1, so it is not
>         shown in the output)
>     2.2491398 = idf(docFreq=157, maxDocs=551)
>     0.14467494 = queryNorm
>
>   fieldWeight: term frequency, idf and fieldNorm
>
>   0.07028562 = (MATCH) fieldWeight(payload:ces in 550), product of:
>     1.0 = tf(termFreq(payload:ces)=1)
>     2.2491398 = idf(docFreq=157, maxDocs=551)
>     0.03125 = fieldNorm(field=payload, doc=550)
>
>
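To see how the pieces combine, the whole explain tree can be multiplied and summed by hand. The sketch below is plain Java arithmetic, not a Lucene API: the class name `ExplainArithmetic` and its helper methods are hypothetical, the leaf values are copied from the explain output above, and a query boost of 1 is assumed because the output shows no boost line.

```java
// Recomputes the final score from the leaf values of the explain output.
// Names are hypothetical; a query boost of 1.0 is an assumption.
final class ExplainArithmetic {

    static final double QUERY_NORM = 0.14467494;
    static final double FIELD_NORM = 0.03125;   // fieldNorm(field=payload, doc=550)
    static final double COORD = 2.0 / 7.0;      // coord(2/7)

    /** queryWeight = boost * idf * queryNorm */
    static double queryWeight(double boost, double idf, double queryNorm) {
        return boost * idf * queryNorm;
    }

    /** fieldWeight = tf * idf * fieldNorm */
    static double fieldWeight(double tf, double idf, double fieldNorm) {
        return tf * idf * fieldNorm;
    }

    /** weight of one matching term = queryWeight * fieldWeight */
    static double termWeight(double tf, double idf) {
        return queryWeight(1.0, idf, QUERY_NORM) * fieldWeight(tf, idf, FIELD_NORM);
    }

    /** final score = coord * sum of the weights of the matching terms */
    static double score() {
        double ces = termWeight(1.0, 2.2491398);        // payload:ces, termFreq=1
        double deal = termWeight(4.472136, 1.6453081);  // payload:deal, termFreq=20
        return COORD * (ces + deal);
    }

    public static void main(String[] args) {
        System.out.println("weight(payload:ces)  = " + termWeight(1.0, 2.2491398));
        System.out.println("weight(payload:deal) = " + termWeight(4.472136, 1.6453081));
        System.out.println("score                = " + score());
    }
}
```

The three printed values should match 0.02287053, 0.05473345 and 0.022172567 from the explain output to within rounding. Small differences in the last digits are expected, since the output shows single-precision values and this sketch uses `double`.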
>
> Regards,
> Jayendra
>
> On Sat, Aug 7, 2010 at 11:02 AM, Soby Thomas <soby.thomas85@gmail.com> wrote:
>
> > Hello Guys,
> >
> > I am trying to understand how the Lucene score is calculated, so I am
> > using the searcher.explain() function, but the output it gives is really
> > confusing to me. Below are the query I ran and the output I got.
> >
> > Query: *It is definitely a CES deal that will be over in Sep or Oct of
> > this year.*
> >
> > *output*:
> > 0.022172567 = (MATCH) product of:
> >  0.07760398 = (MATCH) sum of:
> >    0.02287053 = (MATCH) weight(payload:ces in 550), product of:
> >      0.32539415 = queryWeight(payload:ces), product of:
> >        2.2491398 = idf(docFreq=157, maxDocs=551)
> >        0.14467494 = queryNorm
> >      0.07028562 = (MATCH) fieldWeight(payload:ces in 550), product of:
> >        1.0 = tf(termFreq(payload:ces)=1)
> >        2.2491398 = idf(docFreq=157, maxDocs=551)
> >        0.03125 = fieldNorm(field=payload, doc=550)
> >    0.05473345 = (MATCH) weight(payload:deal in 550), product of:
> >      0.23803486 = queryWeight(payload:deal), product of:
> >        1.6453081 = idf(docFreq=288, maxDocs=551)
> >        0.14467494 = queryNorm
> >      0.2299388 = (MATCH) fieldWeight(payload:deal in 550), product of:
> >        4.472136 = tf(termFreq(payload:deal)=20)
> >        1.6453081 = idf(docFreq=288, maxDocs=551)
> >        0.03125 = fieldNorm(field=payload, doc=550)
> >  0.2857143 = coord(2/7)
> >
> > Can someone please help me understand the output, or suggest a link
> > that explains it? I would be grateful.
> >
> > Regards
> > Soby
> >
>
