lucene-dev mailing list archives

From David Spencer <dave-lucene-...@tropo.com>
Subject Re: URL to compare 2 Similarity's ready-- Re: Scoring benchmark evaluation. Was RE: How to proceed with Bug 31841 - MultiSearcher problems with Similarity.docFreq() ?
Date Mon, 31 Jan 2005 22:55:03 GMT
Doug Cutting wrote:

> David Spencer wrote:
> 
>> But what is the right behavior for the phrases when there are more 
>> than 2 terms? Does it generate a phrase for every pair of terms, like 
>> this (ignore fields, boosts, and proximity for a sec):
>>
>> search for "t1 t2 t3" gives you these phrases in addition to the 
>> direct field matches:
>>
>> "t1 t2"
>> "t2 t3"
>> "t1 t3"
> 
> 
> What the sloppy phrase scorer does when slop=infinity is find the 
> smallest windows containing all three terms and score them based on 
> the width of these windows, by summing Similarity.sloppyFreq().  That's 
> what I was figuring we'd start with.  We could alternatively construct 
> all pairwise queries, but that could get expensive.
> 
> This use of slop may fail to reward a match enough when two of the terms 
> occur frequently as a phrase and the third appears only rarely in the 
> text.  Perhaps we should implement a new DensityPhraseQuery that does 
> not require all terms but rewards a match more the more small gaps it 
> has between distinct query terms.  Similarity.sloppyFreq() could be 
> called for all gaps and summed.  So if two of three terms occurred five 
> times as a phrase, but the third term didn't occur at all in the field, 
> the freq would be 5.0 (since there would be five gaps of size zero).  
> But if all three terms occurred as a phrase five times then the freq 
> would be 10.0, since there would be ten gaps of size zero (two per 
> occurrence).  Does this make sense?  It would not be hard to implement.
> 
>>> Do folks agree that this is a good general formulation?  If so, would 
>>> someone like to contribute a version of MultiFieldQueryParser that 
>>> implements this?  The API should probably be something like:
>>
>>
>>
>> I might already have this done, just confirm the above question re > 2 
>> terms.
> 
> 
> Did I confirm or deny?

Confirmed! Let me tweak my code and I'll post it for examination.


> 
> Doug
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-dev-help@jakarta.apache.org
> 

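The gap-summing idea Doug describes above can be sketched roughly as follows. This is only an illustration under stated assumptions, not code from the thread and not a Lucene API: the class DensityFreqSketch, the Hit type, the phraseOccurrences helper and the maxGap cutoff are all hypothetical names introduced here. The local sloppyFreq() copies the 1 / (distance + 1) formula used by Lucene's DefaultSimilarity, and the sketch assumes that term order does not matter and that gaps wider than maxGap contribute nothing, which is what lets it reproduce the 5.0 and 10.0 figures exactly.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch of the proposed "density" freq; not part of Lucene.
final class DensityFreqSketch {

    /** One occurrence of a query term in a field: which term, at which token position. */
    static final class Hit {
        final int termIndex;
        final int position;

        Hit(int termIndex, int position) {
            this.termIndex = termIndex;
            this.position = position;
        }
    }

    /** Same formula as DefaultSimilarity.sloppyFreq(int): 1 / (distance + 1). */
    static float sloppyFreq(int distance) {
        return 1.0f / (distance + 1);
    }

    /**
     * Sums sloppyFreq(gap) over every pair of neighbouring hits that belong to
     * distinct query terms. A gap of zero means the two terms are adjacent.
     * Assumption: gaps wider than maxGap are ignored rather than given a tiny weight.
     */
    static float densityFreq(List<Hit> hits, int maxGap) {
        List<Hit> sorted = new ArrayList<>(hits);
        sorted.sort(Comparator.comparingInt((Hit h) -> h.position));

        float freq = 0f;
        for (int i = 1; i < sorted.size(); i++) {
            Hit prev = sorted.get(i - 1);
            Hit cur = sorted.get(i);
            int gap = cur.position - prev.position - 1;
            boolean distinctTerms = prev.termIndex != cur.termIndex;
            if (distinctTerms && gap >= 0 && gap <= maxGap) {
                freq += sloppyFreq(gap);
            }
        }
        return freq;
    }

    /** Builds hits for `occurrences` exact phrases of `termCount` terms, `stride` positions apart. */
    static List<Hit> phraseOccurrences(int termCount, int occurrences, int stride) {
        List<Hit> hits = new ArrayList<>();
        for (int o = 0; o < occurrences; o++) {
            for (int t = 0; t < termCount; t++) {
                hits.add(new Hit(t, o * stride + t));
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Two of three terms occur five times as a phrase: five gaps of size zero.
        System.out.println(densityFreq(phraseOccurrences(2, 5, 100), 10)); // prints 5.0
        // All three terms occur five times as a phrase: ten gaps of size zero.
        System.out.println(densityFreq(phraseOccurrences(3, 5, 100), 10)); // prints 10.0
    }
}
```

Unlike the sloppy phrase scorer, this sketch never requires all query terms to be present, so a field missing one term still gets credit for the adjacent pairs it does contain.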
