David Spencer wrote:
> But what is right if there are > 2 terms in terms of the phrases - does
> it have a phrase for every pair of terms like this (ignore fields and
> boosts and proximity for a sec):
>
> search for "t1 t2 t3" gives you these phrases in addition to the direct
> field matches:
>
> "t1 t2"
> "t2 t3"
> "t1 t3"
What the sloppy phrase scorer does when slop=infinity is find the
smallest windows containing all three terms and score the document based
on the width of these windows, by summing Similarity.sloppyFreq() over
them. That's what I was figuring we'd start with. Alternatively, we could
construct all pairwise queries, but that could get expensive.
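
To make the window idea concrete, here is a rough, self-contained sketch. It is not Lucene's scorer. The class name WindowFreqSketch, the method windowFreq(), and the choice to measure a window's slop as (width - number of terms) are assumptions made for illustration. sloppyFreq() mirrors the default 1/(distance+1) formula, and each position is assumed to hold at most one query term.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration only; not part of Lucene.
final class WindowFreqSketch {

    // Same shape as the default Similarity.sloppyFreq(): 1 / (distance + 1).
    static float sloppyFreq(int distance) {
        return 1.0f / (distance + 1);
    }

    // positionsByTerm[t] holds the token positions of query term t in one field.
    // Returns the sum of sloppyFreq() over every minimal window that contains
    // all query terms (a window is minimal if no smaller window inside it does).
    static float windowFreq(int[][] positionsByTerm) {
        int numTerms = positionsByTerm.length;

        // Merge all occurrences into one list of {position, term}, sorted by position.
        List<int[]> occ = new ArrayList<>();
        for (int t = 0; t < numTerms; t++) {
            for (int p : positionsByTerm[t]) {
                occ.add(new int[] {p, t});
            }
        }
        occ.sort(Comparator.comparingInt((int[] o) -> o[0]));

        int[] count = new int[numTerms]; // occurrences of each term inside the window
        int covered = 0;                 // number of distinct terms inside the window
        int left = 0;
        float freq = 0f;

        for (int right = 0; right < occ.size(); right++) {
            int term = occ.get(right)[1];
            if (count[term]++ == 0) {
                covered++;
            }
            // Drop leading occurrences that are duplicated later in the window.
            while (count[occ.get(left)[1]] > 1) {
                count[occ.get(left)[1]]--;
                left++;
            }
            // The window is minimal when neither end can be removed.
            if (covered == numTerms && count[term] == 1) {
                int width = occ.get(right)[0] - occ.get(left)[0] + 1;
                freq += sloppyFreq(Math.max(0, width - numTerms));
            }
        }
        return freq;
    }
}
```

Note that a field missing any one of the terms gets a freq of zero under this scheme, since no window can contain all of them.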
This use of slop may not reward a match enough when two of the terms
frequently occur together as a phrase and the third appears only rarely
in the text. Perhaps we should implement a new DensityPhraseQuery that
does not require all terms but rewards documents that have more small
gaps between distinct query terms. Similarity.sloppyFreq() could be
called for every gap and the results summed. So if two of three terms
occurred five times as a phrase, but the third term didn't occur at all
in the field, the freq would be 5.0 (since there would be five gaps of
size zero). But if all three terms occurred as a phrase five times, then
the freq would be 10.0, since there would be ten gaps of size zero. Does
this make sense? It would not be hard to implement.
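
A minimal sketch of that gap-summing freq, to check the arithmetic. Nothing here exists in Lucene: the class DensityFreqSketch, the method densityFreq(), and the maxGap cutoff are hypothetical. The cutoff is an added assumption so that the large gaps between separate phrase occurrences can be ignored. sloppyFreq() again mirrors the default 1/(distance+1).

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration only; DensityPhraseQuery does not exist in Lucene.
final class DensityFreqSketch {

    // Same shape as the default Similarity.sloppyFreq(): 1 / (distance + 1).
    static float sloppyFreq(int distance) {
        return 1.0f / (distance + 1);
    }

    // positionsByTerm[t] holds the token positions of query term t in one field;
    // a term that does not occur simply has an empty array.
    // maxGap is an assumed cutoff: gaps wider than this contribute nothing.
    static float densityFreq(int[][] positionsByTerm, int maxGap) {
        // Merge all occurrences into one list of {position, term}, sorted by position.
        List<int[]> occ = new ArrayList<>();
        for (int t = 0; t < positionsByTerm.length; t++) {
            for (int p : positionsByTerm[t]) {
                occ.add(new int[] {p, t});
            }
        }
        occ.sort(Comparator.comparingInt((int[] o) -> o[0]));

        float freq = 0f;
        for (int i = 1; i < occ.size(); i++) {
            int[] prev = occ.get(i - 1);
            int[] cur = occ.get(i);
            if (prev[1] == cur[1]) {
                continue; // only gaps between distinct query terms count
            }
            int gap = cur[0] - prev[0] - 1; // tokens strictly between the two terms
            if (gap < 0 || gap > maxGap) {
                continue; // same position, or too far apart to reward
            }
            freq += sloppyFreq(gap);
        }
        return freq;
    }
}
```

With maxGap = 0, two terms occurring five times as a phrase (say at positions 0,1 and 10,11 and so on) yield 5.0, and three terms occurring five times as a phrase yield 10.0. With a larger maxGap the gaps between successive phrase occurrences would each add a small amount on top of that.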
>> Do folks agree that this is a good general formulation? If so, would
>> someone like to contribute a version of MultiFieldQueryParser that
>> implements this? The API should probably be something like:
>
>
> I might already have this done, just confirm the above question re > 2
> terms.
Did I confirm or deny?
Doug

