lucene-dev mailing list archives

From Paul Elschot <paul.elsc...@xs4all.nl>
Subject Explaining a filter; Scorer extending Matcher; (was: BooleanWeight.normalize(float) doesn't normalize prohibited clauses?)
Date Sun, 21 May 2006 12:15:45 GMT
On Friday 12 May 2006 00:12, Chris Hostetter wrote:
> 
> : >    Boolean match = null;
> :
> : As for the thoughts question below: this is java-dev, not c-dev :)
> 
...
> 
> ...i'm not trying to use null for false, i'm using null to indicate that
> whether or not a match occurred has not been explicitly specified -- it can
> only be inferred from the "value" of the explanation.  "true" means a
> definitive match, and "false" means a definitive non-match.
> 
> : As long as there is no match, there will be no score, and no score could
> : also be represented by NaN, so one might by default initialize the score
> : value to NaN, drop setMatch() and isMatch() above, and have only:
> :
> : public Boolean getMatch() { return ! Float.isNaN(score); }
> 
> I assume by "score" you mean "value" (Explanations don't have a score
> attribute, just a value attribute).  I don't want to go down the road of
> assuming a match based on some special value of the value -- that's the
> cause of the current problems.  NaN is admittedly a better choice than
> "0.0", but it's still a value that could conceivably come up when scoring
> a document for some as yet non-existent Scorer.
> 
> what i really want is for the Explanation class to precisely model the
> same information as a Scorer returns for each doc...
> 
>    1) if scorer.doc() would ever return X, then the Explanation for that X
>       should have a boolean indicating a "match"
>    2) whatever value is returned by scorer.score() when scorer.doc() is
>       returning X should be what the Explanation for X returns when you
>       call getValue().
> 
> : But I'm not yet sure whether that would work in all cases.
> : Is it possible/thinkable for a (sub)query to have a score value for a
> : document, but no match against the same document?
> 
> I'm not sure if it can ever exist, since you currently can't ask a Scorer
> for the score of a document unless the document matches, but the converse
> is certainly true: a document can match on a query but have a score of 0,
> or less than 0, or NaN ... that's what i'm trying to deal with, i want to
> be able to model all of those cases in an Explanation object.

Having a score value without a match is not normally possible when searching
a query, but for a Filter the converse is actually the normal case: a Filter
may match a document, but it does not provide a score value.

> 
> : That would be avoided by having getMatch() only. Once setMatch is called,
> : getMatch would return false, except when setMatch is given a NaN, but
> : that is probably not done in the current Lucene code.
> 
> Right.  currently *most* Explanations get a 0.0 value set if it's a non
> match (some of them don't work at all for non-matches) .. which is why if
> an explicit match boolean isn't specified, I want the fall back assumption
> to be based on whether the value is 0.0 -- because that's the current
> method for determining a match, and it will work with legacy custom Query
> types people may have written which aren't in the lucene code base.
> 
> : > 4) change BooleanWeight.explain to call isMatch on the sub-explanations
> : > when testing prohibited/required clauses.
> :
> : Or call getMatch(), whichever is implemented. This makes explaining the
> 
> i want to implement both ... getMatch() existing for people that want to
> know the exact state of the match (and will check for null to determine if
> the exact state is unknown) and isMatch for people who want the "best
> guess" behavior ... which is now encapsulated in a method, instead of it
> just being "convention"
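
For illustration, the getMatch()/isMatch() split discussed above could look
roughly like the sketch below. This is not the actual Lucene Explanation
class: the class name MatchExplanation and its constructor are hypothetical,
and the fallback rule (a value of 0.0 means no match) is the legacy
convention described in the quoted text.

```java
// Hypothetical sketch only; not org.apache.lucene.search.Explanation.
class MatchExplanation {
    private final float value;        // what scorer.score() would return
    private final String description;
    private Boolean match;            // null: match state not explicitly specified

    MatchExplanation(float value, String description) {
        this.value = value;
        this.description = description;
    }

    float getValue() { return value; }

    String getDescription() { return description; }

    void setMatch(Boolean match) { this.match = match; }

    /** Exact state: TRUE, FALSE, or null when it was never specified. */
    Boolean getMatch() { return match; }

    /**
     * Best guess: use the explicit flag when present, otherwise fall back
     * to the legacy convention that a value of 0.0 means "no match".
     */
    boolean isMatch() {
        return (match != null) ? match.booleanValue() : value != 0.0f;
    }
}
```

With this shape, a document that matches with a score of 0.0 can be
explained as a match by calling setMatch(Boolean.TRUE), while legacy
explanations that never set the flag keep their current behavior.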

In case Explanation is also to explain what a Filter does, it would need to
have both a match flag and a score value.

At the moment I'm trying to implement filters by refactoring Scorer to have an
abstract superclass Matcher that could also become a superclass for filter
implementations (instead of DocNrSkipper).

This Matcher class has all methods of Scorer that are not using score
values: doc(), next(), skipTo(docNr).
An explain() method is also useful in such a Matcher, but since it has
no score value available, it only knows whether or not a document matches.
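
A rough sketch of the class split described above, under stated
assumptions: the method signatures follow the doc(), next(), skipTo(docNr)
names given here, score() is the only member left in Scorer, and
SortedDocsMatcher is a hypothetical filter-like implementation added only
for illustration. None of this is the actual patch.

```java
import java.io.IOException;

// Proposed superclass: iteration over matching documents, no score values.
abstract class Matcher {
    /** Current document number; valid only after next() or skipTo() returned true. */
    abstract int doc();

    /** Advance to the next matching document; false when there is none. */
    abstract boolean next() throws IOException;

    /** Advance to the first matching document beyond the current one with number >= target. */
    abstract boolean skipTo(int target) throws IOException;
}

// Scorer adds only what needs a score value.
abstract class Scorer extends Matcher {
    abstract float score() throws IOException;
}

// Hypothetical filter-like Matcher over a sorted array of document numbers.
class SortedDocsMatcher extends Matcher {
    private final int[] docs;
    private int pos = -1;

    SortedDocsMatcher(int[] sortedDocs) { this.docs = sortedDocs; }

    int doc() { return docs[pos]; }

    boolean next() {
        pos++;
        return pos < docs.length;
    }

    boolean skipTo(int target) {
        do {
            if (!next()) return false;
        } while (docs[pos] < target);
        return true;
    }
}
```

A filter implemented this way can take part in the same next()/skipTo()
iteration as a Scorer, without ever being asked for a score.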

The implementation also has an interface MatcherProvider (instead of
SkipFilter):
   public Matcher getMatcher(IndexReader reader) throws IOException;
Filter could implement MatcherProvider as an alternative to the
SkipFilter1.patch here:
http://issues.apache.org/jira/browse/LUCENE-328

Any thoughts on whether such a Matcher would be preferable to 
a DocNrSkipper that only has this method:
  int nextDocNr(int docNr)
?
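
To make the comparison concrete, here is a minimal sketch of how a
DocNrSkipper could be wrapped to offer the Matcher-style methods. It
assumes that nextDocNr(docNr) returns the smallest matching document number
that is >= docNr, or -1 when there is none; that contract is an assumption
and is not stated in this message. SkipperMatcher and ArrayDocNrSkipper are
hypothetical names used only for illustration.

```java
interface DocNrSkipper {
    /** Assumed contract: smallest matching doc number >= docNr, or -1 if none. */
    int nextDocNr(int docNr);
}

// Hypothetical adapter: keeps the iteration state that DocNrSkipper lacks.
class SkipperMatcher {
    private final DocNrSkipper skipper;
    private int current = -1;
    private boolean exhausted = false;

    SkipperMatcher(DocNrSkipper skipper) { this.skipper = skipper; }

    /** Current document number; valid only after next() or skipTo() returned true. */
    int doc() { return current; }

    boolean next() { return skipTo(current + 1); }

    boolean skipTo(int target) {
        if (exhausted) return false;
        // Always move beyond the current document, as Scorer.skipTo() does.
        int found = skipper.nextDocNr(Math.max(target, current + 1));
        if (found < 0) {
            exhausted = true;
            return false;
        }
        current = found;
        return true;
    }
}

// Hypothetical skipper over a sorted array, for demonstration only.
class ArrayDocNrSkipper implements DocNrSkipper {
    private final int[] docs;

    ArrayDocNrSkipper(int[] sortedDocs) { this.docs = sortedDocs; }

    public int nextDocNr(int docNr) {
        for (int d : docs) {
            if (d >= docNr) return d;
        }
        return -1;
    }
}
```

Under these assumptions the two designs carry the same information; the
difference is that a Matcher holds the current position itself, while a
DocNrSkipper leaves that state to the caller.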

Regards,
Paul Elschot


