lucene-dev mailing list archives

From Paul Elschot <>
Subject Re: BooleanWeight.normalize(float) doesn't normalize prohibited clauses?
Date Thu, 11 May 2006 21:23:32 GMT
On Thursday 11 May 2006 21:51, Chris Hostetter wrote:
> : If class Explanation would have a boolean attribute indicating whether
> : or not there was a match, the Explanation for BooleanQuery could
> : simply use this value from the Explanation of the prohibited clause.
> I've definitely thought about that a lot initially.  But my gut reaction
> was to try and fix the broken explain methods using the current
> limitations of the Explanation class to reduce the size of the patch.
> Unfortunately there are still some cases that can't be solved without that
> information, ie...
>      "w1 w2^0.0"    (testBQ12 from the bug)
>      "+w1^0.0 w2"   (testBQ18 from the bug)
> In the first case, documents which match both terms get their score
> divided in half because the explain method can't tell that the score of 0.0
> is because of the boost, so the coord factor gets applied by mistake.
> In the second case, the explain method assumes a total failure even
> if a document matches both terms because it got a 0.0 score from a
> required clause.
> Other than the (somewhat obscure) cases where a clause has a boost of 0.0,
> I managed to fix all of the BooleanQuery explain bugs by normalizing all
> clauses (even if they are prohibited) and fixing some poor assumptions in
> the explain method itself.
> I'm going to set BooleanQuery aside for a little bit and focus on some of
> the other query classes, but here's what i had in mind for changing the
> Explanation class, if anyone sees any problems please let me know...
> 1) Add the following to Explanation...
>    Boolean match = null;

As for the thoughts question below: this is java-dev, not c-dev :)

>    public void setMatch(boolean b) { match = new Boolean(b); }
>    public Boolean getMatch() { return match; }
>    public boolean isMatch() {
>      return (null != match) ? match.booleanValue() : (0.0f < getValue());
>    }

As long as there is no match, there will be no score, and no score could
also be represented by NaN, so one might by default initialize the score
value to NaN, drop setMatch() and isMatch() above, and have only:

public boolean getMatch() { return ! Float.isNaN(score); }

But I'm not yet sure whether that would work in all cases.
Is it possible/thinkable for a (sub)query to have a score value for a
document, but no match against the same document?
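
A minimal sketch of the NaN idea, for illustration only. NaNExplanation is
a hypothetical stand-in for Explanation reduced to its score value; it is
not the actual Lucene class, and the field is called value here rather
than score.

```java
// Hypothetical stand-in for Lucene's Explanation, reduced to the score value.
class NaNExplanation {
    // NaN means "no score was ever set", which is read as "no match".
    private float value = Float.NaN;

    public void setValue(float value) { this.value = value; }

    public float getValue() { return value; }

    // Any real score counts as a match, including 0.0f caused by a zero boost.
    public boolean getMatch() { return !Float.isNaN(value); }
}
```

With this, a clause boosted by 0.0 that calls setValue(0.0f) still reports
a match, while an explanation on which setValue() was never called does not.
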
> 2) change Explanation.toString and toHtml to have something along the
> lines of ...
>     if (null != match)
>        buffer.append("Definite "+(match.booleanValue()?"":"NON-")+"match");
>     else
>        buffer.append("Assuming Match");
> 3) change all explain implementations in lucene core to call setMatch when
> they call setValue.

That would be avoided by having getMatch() only. Once setValue is called,
getMatch would return true, except when setValue is given a NaN, but
that is probably not done in the current Lucene code.

> 4) change BooleanWeight.explain to call isMatch on the sub-explanations
> when testing prohibited/required clauses.

Or call getMatch(), whichever is implemented. This makes explaining the
score of a BooleanQuery much more natural than it is now.
It might even become practical to use the explain() methods of the scorers
that BooleanScorer2 is using. Only ConjunctionScorer would need
an implementation of explain() in that case.
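
A rough sketch of how a boolean explain could combine its clauses once the
sub-explanations carry match information. Everything here is hypothetical
(MatchExplanation, Occur, Clause, BooleanExplainSketch), and the coord
factor is simplified to matched/total over the non-prohibited clauses; it
is not the actual BooleanWeight.explain code.

```java
import java.util.List;

// Hypothetical explanation with a tri-state match flag (null = unknown).
class MatchExplanation {
    private final float value;
    private final Boolean match;

    MatchExplanation(float value, Boolean match) {
        this.value = value;
        this.match = match;
    }

    float getValue() { return value; }

    // Falls back to "value > 0" when no explicit match flag was set.
    boolean isMatch() {
        return (match != null) ? match.booleanValue() : (0.0f < value);
    }
}

enum Occur { MUST, SHOULD, MUST_NOT }

class Clause {
    final Occur occur;
    final MatchExplanation expl;

    Clause(Occur occur, MatchExplanation expl) {
        this.occur = occur;
        this.expl = expl;
    }
}

class BooleanExplainSketch {
    // Combines clause explanations using isMatch() instead of testing for 0.0f.
    static MatchExplanation explain(List<Clause> clauses) {
        float sum = 0.0f;
        int matched = 0;
        int maxCoord = 0;
        boolean fail = false;
        for (Clause c : clauses) {
            boolean m = c.expl.isMatch();
            if (c.occur == Occur.MUST_NOT) {
                if (m) fail = true;               // prohibited clause matched
                continue;
            }
            maxCoord++;
            if (m) {
                sum += c.expl.getValue();
                matched++;
            } else if (c.occur == Occur.MUST) {
                fail = true;                      // required clause missing
            }
        }
        if (fail || matched == 0) {
            return new MatchExplanation(0.0f, Boolean.FALSE);
        }
        float coord = (float) matched / maxCoord; // simplified coord factor
        return new MatchExplanation(sum * coord, Boolean.TRUE);
    }
}
```

For "w1 w2^0.0" with both terms matching (values 1.0 and 0.0), explicit
match flags give a coord of 2/2, so the value stays 1.0; with the flags
left null the second clause looks like a miss and the value is halved to
0.5, which is the testBQ12 symptom described above.
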
> 5) change all of my Explanation tests to call isMatch.

> ...this would be backwards compatible for any non-core Query classes out
> there, and (as far as i can figure) be no worse than the current behavior
> of testing an explanation.getValue() == 0.0f  (since that's the fallback
> inside of isMatch())

With the implementation above, the current code would have to be
changed for the case when a 0.0f score value is used to indicate no match
in an explanation: in that case no call to setValue() should be done.

> 	thoughts?

null for false: long time no see...

Paul Elschot
