lucene-java-user mailing list archives

From Chris Hostetter
Subject Re: alternative scoring algorithm for PhraseQuery
Date Wed, 07 Mar 2007 18:23:27 GMT

: Query: a b c d
: Doc A: a b c d => sloppyFreq(0) * coord(4, 4) = 1
: Doc B: a b c => sloppyFreq(0) * coord(3, 4) = 0,75
: Doc A would score higher. I guess that might be a valid solution.
: There is a drawback though, i.e. sloppyFreq(1) * coord(4, 4) = 0,5
: So a perfect match with one insertion would score less than a 3 of 4
: match with no slop.

but now you've put the control in the hands of the client: they can choose
a Similarity based on what is more important to them. if matching more
clauses is important, they can have a strict coord function; if matching
with less slop is more important, they can have a strict sloppyFreq method.
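
a rough sketch of that trade-off, using the numbers from the quoted example.
it is standalone Java and does not touch Lucene: the ScoringPolicy interface,
the two policy constants and the score() helper are made-up names for
illustration only. the "default-like" formulas (1 / (distance + 1) and
overlap / maxOverlap) are an assumption chosen to match the figures quoted
above, and cubing the coord fraction is just one arbitrary way to make coord
"strict".

```java
// Standalone illustration; hypothetical names, no Lucene dependency.
final class PhraseScoringSketch {

    /** Hypothetical stand-in for the two Similarity hooks under discussion. */
    interface ScoringPolicy {
        float sloppyFreq(int distance);
        float coord(int overlap, int maxOverlap);
    }

    /** Assumed baseline: reproduces the 1, 0.75 and 0.5 figures quoted above. */
    static final ScoringPolicy DEFAULT_LIKE = new ScoringPolicy() {
        public float sloppyFreq(int distance) {
            return 1.0f / (distance + 1);
        }
        public float coord(int overlap, int maxOverlap) {
            return overlap / (float) maxOverlap;
        }
    };

    /** Same sloppyFreq, but missing clauses are penalized much harder. */
    static final ScoringPolicy STRICT_COORD = new ScoringPolicy() {
        public float sloppyFreq(int distance) {
            return 1.0f / (distance + 1);
        }
        public float coord(int overlap, int maxOverlap) {
            float fraction = overlap / (float) maxOverlap;
            return fraction * fraction * fraction; // arbitrary choice of exponent
        }
    };

    /** Combined factor: sloppyFreq(distance) * coord(matched, total). */
    static float score(ScoringPolicy policy, int distance, int matched, int total) {
        return policy.sloppyFreq(distance) * policy.coord(matched, total);
    }
}
```

with DEFAULT_LIKE, a 3 of 4 match with no slop (0.75) outranks a 4 of 4 match
with one insertion (0.5). with STRICT_COORD the order flips, because the 3 of 4
match drops to about 0.42 while the full match stays at 0.5.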

: don't know the inner workings of SpanQueries, but what you describe
: sounds a lot like what the PhraseQuery does as well (i.e. calculate max
: distance between last and first term, and use that with sloppyFreq()).

correct. the big advantage of Span queries is that while a SpanNearQuery
is roughly equivalent to a PhraseQuery, a PhraseQuery can only contain
Terms, while a SpanNearQuery can contain other spans ... so a SpanNearQuery
for "a b c d" can function even if "a" is a complicated sub query
(like "x OR y OR (p near q but not with z between them)")

