lucene-dev mailing list archives

From Doug Cutting
Subject Re: URL to compare 2 Similarity's ready-- Re: Scoring benchmark evaluation. Was RE: How to proceed with Bug 31841 - MultiSearcher problems with Similarity.docFreq() ?
Date Mon, 31 Jan 2005 22:48:01 GMT
David Spencer wrote:
> But what is right if there are > 2 terms in terms of the phrases - does 
> it have a phrase for every pair of terms like this (ignore fields and 
> boosts and proximity for a sec):
> search for "t1 t2 t3" gives you these phrases in addition to the direct 
> field matches:
> "t1 t2"
> "t2 t3"
> "t1 t3"

What the sloppy phrase scorer does when slop=infinity is find the 
smallest windows containing all three terms and score things based on 
the width of these windows, by summing Similarity.sloppyFreq() over 
them.  That's what I was figuring we'd start with.  We could 
alternatively construct all pairwise queries, but that could get 
expensive.
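As a rough illustration of the window-based scoring described above, here 
is a minimal sketch in plain Java.  It is not Lucene's actual sloppy phrase 
scorer: the class WindowFreqSketch, the method windowFreq() and the 
{position, termIndex} hit layout are all hypothetical, term order is 
ignored, and the distance of a window is assumed to be its width minus the 
width of an exact phrase.  The local sloppyFreq() uses the 1/(distance+1) 
formula of Lucene's default Similarity.

```java
import java.util.Arrays;

final class WindowFreqSketch {

    // Same formula as Lucene's default Similarity.sloppyFreq(): 1 / (distance + 1).
    static float sloppyFreq(int distance) {
        return 1.0f / (distance + 1);
    }

    // hits: hypothetical {position, termIndex} pairs, one per query-term
    // occurrence in the field; numTerms: number of distinct query terms.
    // Sums sloppyFreq() over every minimal window that contains all terms.
    static float windowFreq(int[][] hits, int numTerms) {
        int[][] sorted = hits.clone();
        Arrays.sort(sorted, (a, b) -> Integer.compare(a[0], b[0]));
        int[] count = new int[numTerms]; // occurrences of each term in the window
        int covered = 0;                 // distinct terms currently in the window
        int left = 0;
        float freq = 0.0f;
        for (int right = 0; right < sorted.length; right++) {
            if (count[sorted[right][1]]++ == 0) covered++;
            if (covered < numTerms) continue;
            // Drop leading hits whose term occurs again inside the window.
            while (count[sorted[left][1]] > 1) {
                count[sorted[left][1]]--;
                left++;
            }
            int width = sorted[right][0] - sorted[left][0];
            // Assumption: an exact phrase (width == numTerms - 1) has distance zero.
            int distance = Math.max(0, width - (numTerms - 1));
            freq += sloppyFreq(distance);
            // Release the leftmost hit so the next window starts later.
            count[sorted[left][1]]--;
            covered--;
            left++;
        }
        return freq;
    }
}
```

Under these assumptions a field that is missing any one of the terms 
contributes nothing, which is the limitation discussed next.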

This use of slop may fail to reward a match enough when two of the terms 
frequently occur together as a phrase and the third appears only rarely 
in the text.  Perhaps we should implement a new DensityPhraseQuery that 
does not require all terms, but rewards documents for having more small 
gaps between distinct query terms.  Similarity.sloppyFreq() could be 
called for each gap and the results summed.  So if two of three terms 
occurred five times as a phrase, but the third term didn't occur at all 
in the field, the freq would be 5.0 (since there would be five gaps of 
size zero).  But if all three terms occurred as a phrase five times then 
the freq would be 10.0, since there would be ten gaps of size zero (two 
per occurrence).  Does this make sense?  It would not be hard to 
implement.
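To make the arithmetic concrete, here is a small sketch of the gap-summing 
idea in plain Java.  No DensityPhraseQuery exists yet, so everything here 
is hypothetical: the class DensityFreqSketch, the methods densityFreq() and 
repeatedPhrase(), and in particular the maxGap cutoff, which is an added 
assumption so that the long stretches between separate phrase occurrences 
do not count as gaps.  The local sloppyFreq() uses the 1/(distance+1) 
formula of Lucene's default Similarity.

```java
import java.util.Arrays;

final class DensityFreqSketch {

    // Same formula as Lucene's default Similarity.sloppyFreq(): 1 / (distance + 1).
    static float sloppyFreq(int distance) {
        return 1.0f / (distance + 1);
    }

    // hits: hypothetical {position, termIndex} pairs, one per query-term
    // occurrence in the field.  maxGap is an assumed cutoff: larger gaps
    // are ignored rather than summed.
    static float densityFreq(int[][] hits, int maxGap) {
        int[][] sorted = hits.clone();
        Arrays.sort(sorted, (a, b) -> Integer.compare(a[0], b[0]));
        float freq = 0.0f;
        for (int i = 1; i < sorted.length; i++) {
            int[] prev = sorted[i - 1];
            int[] cur = sorted[i];
            if (prev[1] == cur[1]) continue;  // same term twice: not a gap between distinct terms
            int gap = Math.max(0, cur[0] - prev[0] - 1);  // adjacent positions give gap zero
            if (gap <= maxGap) freq += sloppyFreq(gap);
        }
        return freq;
    }

    // Test helper: `count` occurrences of a `terms`-word phrase, `stride` positions apart.
    static int[][] repeatedPhrase(int terms, int count, int stride) {
        int[][] hits = new int[terms * count][];
        for (int c = 0; c < count; c++) {
            for (int t = 0; t < terms; t++) {
                hits[c * terms + t] = new int[] {c * stride + t, t};
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Two of three terms as a phrase, five times: five gaps of size zero.
        System.out.println(densityFreq(repeatedPhrase(2, 5, 100), 10)); // 5.0
        // All three terms as a phrase, five times: ten gaps of size zero.
        System.out.println(densityFreq(repeatedPhrase(3, 5, 100), 10)); // 10.0
    }
}
```

Without some cutoff like maxGap the gaps between separate occurrences would 
each add a small amount, so the totals would come out slightly above 5.0 
and 10.0.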

>> Do folks agree that this is a good general formulation?  If so, would 
>> someone like to contribute a version of MultiFieldQueryParser that 
>> implements this?  The API should probably be something like:
> I might already have this done, just confirm the above question re > 2 
> terms.

Did I confirm or deny?

