lucene-java-user mailing list archives

From Paul Elschot <paul.elsc...@xs4all.nl>
Subject Re: alternative scoring algorithm for PhraseQuery
Date Wed, 07 Mar 2007 21:49:41 GMT
On Wednesday 07 March 2007 18:12, Philipp Nanz wrote:
> Thanks for your answers. Your input is really appreciated :-)
> 
> @Paul Elschot:
> Thanks for the hint. I guess I could use coord() to penalize missing
> terms like this:
> 
> Query: a b c d
> Doc A: a b c d => sloppyFreq(0) * coord(4, 4) = 1
> Doc B: a b c => sloppyFreq(0) * coord(3, 4) = 0.75
> 
> Doc A would score higher. I guess that might be a valid solution.
> 
> There is a drawback though, i.e. sloppyFreq(1) * coord(4, 4) = 0.5
> 
> So a perfect match with one insertion would score less than a 3 of 4
> match with no slop.

Your examples are based on DefaultSimilarity.
If your Scorer takes a Similarity, you can leave the tradeoff between these
two factors to the user of your query by letting them provide the Similarity
at query time.
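
To make the tradeoff concrete, here is a minimal sketch that does not depend
on Lucene. The SlopCoord interface and the MISSING_TERMS_COST_MORE weights are
hypothetical, made up for illustration only; DEFAULT_LIKE uses the same
formulas as DefaultSimilarity (1 / (distance + 1) and overlap / maxOverlap),
which is where the 0.75 and 0.5 quoted above come from.

```java
public class PhraseScoreSketch {

    /** Hypothetical stand-in for the two Similarity factors discussed here. */
    interface SlopCoord {
        float sloppyFreq(int distance);
        float coord(int overlap, int maxOverlap);
    }

    /** Same formulas as DefaultSimilarity for these two factors. */
    static final SlopCoord DEFAULT_LIKE = new SlopCoord() {
        public float sloppyFreq(int distance) {
            return 1.0f / (distance + 1);
        }
        public float coord(int overlap, int maxOverlap) {
            return overlap / (float) maxOverlap;
        }
    };

    /** Assumed alternative: slop is penalized less, missing terms more. */
    static final SlopCoord MISSING_TERMS_COST_MORE = new SlopCoord() {
        public float sloppyFreq(int distance) {
            return 1.0f / (1.0f + 0.25f * distance);
        }
        public float coord(int overlap, int maxOverlap) {
            float c = overlap / (float) maxOverlap;
            return c * c; // squaring makes each missing term more expensive
        }
    };

    /** Product of the two factors, as in the examples in this thread. */
    static float score(SlopCoord sim, int distance, int matched, int queryTerms) {
        return sim.sloppyFreq(distance) * sim.coord(matched, queryTerms);
    }

    public static void main(String[] args) {
        // Case 1: all 4 terms matched with one insertion (distance 1).
        // Case 2: 3 of 4 terms matched with no slop (distance 0).
        System.out.println("default-like: "
                + score(DEFAULT_LIKE, 1, 4, 4) + " vs "
                + score(DEFAULT_LIKE, 0, 3, 4));
        System.out.println("alternative:  "
                + score(MISSING_TERMS_COST_MORE, 1, 4, 4) + " vs "
                + score(MISSING_TERMS_COST_MORE, 0, 3, 4));
    }
}
```

With the default-like formulas the 3 of 4 match wins; with the assumed
alternative the full match with one insertion wins. Which ordering is right
depends on the application, which is why it helps to let the query user pass
in the Similarity.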

> 
> As for spanqueries:
> My implementation is based on the default PhraseQuery with slop > 0. I
> don't know the inner workings of SpanQueries, but what you describe
> sounds a lot like what the PhraseQuery does as well (i.e. calculate max
> distance between last and first term, and use that with sloppyFreq()).
> 
> I chose PhraseQuery as base of my work, because I felt that it would
> offer better performance than firing off a plethora of spanqueries to
> express the same query.
> 
> Long story short: My problem would generalize to spanqueries if
> spanqueries faced the problem of deleted terms. But I guess they
> don't?!

Correct, they don't.

Regards,
Paul Elschot
