lucene-java-user mailing list archives

From Paul Allan Hill <p...@metajure.com>
Subject Please explain DisjunctionMaxQuery JavaDoc.
Date Wed, 08 Feb 2012 22:42:11 GMT
What the heck is the JavaDoc for DisjunctionMaxQuery saying?

"A query that generates the union of documents produced by its subqueries, and that scores
each document with the maximum score for that document as produced by any subquery, plus a
tie breaking increment for any additional matching subqueries. This is useful when searching
for a word in multiple fields with different boost factors (so that the fields cannot be combined
equivalently into a single search field). We want the primary score to be the one associated
with the highest boost, not the sum of the field scores (as BooleanQuery would give). If the
query is "albino elephant" this ensures that "albino" matching one field and "elephant" matching
another gets a higher score than "albino" matching both fields. To get this result, use both
BooleanQuery and DisjunctionMaxQuery: for each term a DisjunctionMaxQuery searches for it
in each field, while the set of these DisjunctionMaxQuery's is combined into a BooleanQuery.
The tie breaker capability allows results that include the same term in multiple fields to
be judged better than results that include this term in only the best of those multiple fields,
without confusing this with the better case of two different terms in the multiple fields."

"Maximum ... as produced by any subquery", OK, that makes sense. We pick the score that is
the highest.
If you have
DMQ ( Q1, Q2, Q3 )
and the subquery scores are ( 0.1, 0.2, 0.1 ), then Q2 wins and the overall score is 0.2, right?
But then what is the meaning of "any additional matching subqueries"?
Is the description then saying one of the following?

(1)    Running with the idea that something has to tie to involve a tie-breaker, I might say
"If two subqueries are both the maximum of all the subqueries, the score will be the maximum
score increased by the tie breaker increment."
Example: a DMQ with an increment of 0.15 and three subqueries ( Q1, Q2, Q3 ) which score
(0.1, 0.2, 0.2). Because there are two 0.2 scores, the score for this query will be
0.2 + 0.15, or 0.35.
If the scores are (0.1, 0.1, 0.2) the overall score is 0.2, because we had only one maximum.
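
To pin down what I mean by reading (1), here is a minimal sketch in plain Java. It is only my guess at the wording, not Lucene code and not necessarily what DisjunctionMaxQuery does; the class name TieReadingSketch and its score method are made up for illustration.

```java
// Hypothetical sketch of reading (1): the increment applies only when
// two or more subqueries share the maximum score. Not Lucene code.
final class TieReadingSketch {
    static double score(double[] subScores, double increment) {
        // Find the best subquery score (0.0 means "no subquery matched").
        double max = 0.0;
        for (double s : subScores) {
            max = Math.max(max, s);
        }
        // Count how many subqueries produced exactly that best score.
        int atMax = 0;
        for (double s : subScores) {
            if (s == max) {
                atMax++;
            }
        }
        // Add the increment only for a genuine tie at a nonzero maximum.
        return (max > 0.0 && atMax >= 2) ? max + increment : max;
    }
}
```

Under this reading, score(new double[] {0.1, 0.2, 0.2}, 0.15) comes out at roughly 0.35, while score(new double[] {0.1, 0.1, 0.2}, 0.15) stays at 0.2.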

Or, alternately, forgetting the idea that anything is tied within the set of subqueries:


(2)    "If, in addition to the maximum subquery score, there are any other subqueries with nonzero
scores, the overall score is increased by the tiebreaker increment."

Example: Using the same increment of 0.15, if the scores are (0.0, 0.0, 0.2) the result is
score 0.2, but (0.0, 0.1, 0.2) scores 0.35.
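
Reading (2) can be sketched the same way in plain Java. Again, this is just my interpretation of the JavaDoc wording, not Lucene's implementation; AdditionalMatchReadingSketch and its score method are hypothetical names.

```java
// Hypothetical sketch of reading (2): the increment applies whenever any
// subquery other than the best one also matched (nonzero score). Not Lucene code.
final class AdditionalMatchReadingSketch {
    static double score(double[] subScores, double increment) {
        double max = 0.0;
        int matching = 0;
        for (double s : subScores) {
            // A nonzero score is taken to mean "this subquery matched".
            if (s > 0.0) {
                matching++;
            }
            max = Math.max(max, s);
        }
        // Two or more matches means there is at least one "additional" match.
        return matching >= 2 ? max + increment : max;
    }
}
```

Here score(new double[] {0.0, 0.0, 0.2}, 0.15) stays at 0.2, and score(new double[] {0.0, 0.1, 0.2}, 0.15) comes out at roughly 0.35, with no tie among the subquery scores required.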

I'm leaning toward interpretation #2, but "tie breaking ... for any additional matching ..." does
not say that to me, because I don't see any tie.
Once I understand that, I'll ask about how to "use both BooleanQuery and DisjunctionMaxQuery".

-Paul
