lucene-general mailing list archives

Subject clarification on booleanScorer
Date Mon, 02 Jul 2007 23:38:16 GMT
Hi all,
I would like to clarify my understanding of the way Lucene scores boolean queries, in relation
to the "+" and " " clause attributes (required and optional) as well as the OR and AND operators.
After looking at the BooleanScorer source code, my understanding of the scoring is as follows:
1. OR is translated into " " (optional) and AND is translated into "+" (required) by QueryParser.
 So, is it true that
(t1 t2 t3) AND (t4 t5 t6)  OR  (t7 t8 t9)  is parsed by QueryParser into the following boolean query?
+(t1 t2 t3) +(t4 t5 t6) (t7 t8 t9)
2. Using the default similarity, the score of a document, score(q,d), is the sum of the
tf-idf measures of the terms in q that appear in d.
3. The score of a document w.r.t. a BooleanClause BC, score(BC,d), is the sum of the scores of
the document w.r.t. all sub-clauses of BC.
4. "+" clauses and " " clauses are treated the same way in scoring (i.e. their scorer.score()
values are summed to produce the score of their parent); however, the scores of " " clauses
are added only after all "+" clauses have been matched by the document. If not all "+"
clauses are matched, the document is not retrieved.
         -----C1-----    ----C2-----    ----C3-----      ------C4-------
q = +{+(t1 t2 t3)   +(t4 t5 t6)   (t7 t8 t9)}     {t10 t11 t12}
Assuming a document d matches C1 and C2, then s(q,d) = sum(sum(s(C1,d) + s(C2,d) + s(C3, d)),
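To make points 3 and 4 concrete, here is a minimal sketch in plain Java of the model as I understand it. It does not use the Lucene API and is not how BooleanScorer is actually implemented. The names (BooleanScoreSketch, Node, Bool, anyOf, exampleQuery) are hypothetical, and each term gets a made-up weight of 1.0 in place of a real tf-idf score. It also leaves out other factors of the default similarity, such as coord and query normalization.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.OptionalDouble;
import java.util.Set;

// Hypothetical model of the scoring described above; not Lucene's BooleanScorer.
class BooleanScoreSketch {

    /** Returns a score if the document matches, or empty if it does not. */
    interface Node {
        OptionalDouble score(Set<String> docTerms);
    }

    /** Leaf clause: a term with a fixed, made-up weight standing in for tf-idf. */
    static Node term(String t, double weight) {
        return doc -> doc.contains(t) ? OptionalDouble.of(weight) : OptionalDouble.empty();
    }

    /** Boolean clause holding required ("+") and optional (" ") sub-clauses. */
    static class Bool implements Node {
        private final List<Node> required = new ArrayList<>();
        private final List<Node> optional = new ArrayList<>();

        Bool must(Node n) { required.add(n); return this; }
        Bool should(Node n) { optional.add(n); return this; }

        @Override
        public OptionalDouble score(Set<String> doc) {
            double sum = 0.0;
            // Every "+" sub-clause must match, otherwise the document is rejected.
            for (Node n : required) {
                OptionalDouble s = n.score(doc);
                if (!s.isPresent()) return OptionalDouble.empty();
                sum += s.getAsDouble();
            }
            // Optional sub-clauses only add to the score when they match.
            int matchedOptional = 0;
            for (Node n : optional) {
                OptionalDouble s = n.score(doc);
                if (s.isPresent()) { sum += s.getAsDouble(); matchedOptional++; }
            }
            // With no "+" sub-clauses, at least one optional one has to match.
            if (required.isEmpty() && matchedOptional == 0) return OptionalDouble.empty();
            return OptionalDouble.of(sum);
        }
    }

    /** A group such as (t1 t2 t3): all terms optional, each with weight 1.0. */
    static Bool anyOf(String... terms) {
        Bool b = new Bool();
        for (String t : terms) b.should(term(t, 1.0));
        return b;
    }

    /** q = +{+(t1 t2 t3) +(t4 t5 t6) (t7 t8 t9)} {t10 t11 t12} */
    static Node exampleQuery() {
        Bool inner = new Bool()
            .must(anyOf("t1", "t2", "t3"))     // C1
            .must(anyOf("t4", "t5", "t6"))     // C2
            .should(anyOf("t7", "t8", "t9"));  // C3
        return new Bool().must(inner).should(anyOf("t10", "t11", "t12")); // C4
    }

    public static void main(String[] args) {
        Node q = exampleQuery();
        // Matches C1, C2 and C4 with one term each, so the sum is 1 + 1 + 1.
        System.out.println(q.score(Set.of("t1", "t4", "t10")));
        // C2 is required but unmatched, so the document is not retrieved.
        System.out.println(q.score(Set.of("t1", "t10")));
    }
}
```

In this sketch a document containing only t1 and t10 is rejected because the required clause C2 fails, even though C4 matches.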
Please let me know whether the above is true. If there is something I have misunderstood about
the scoring in BooleanScorer, please let me know.
best regards
