lucene-java-user mailing list archives

From Damian Birchler <Damian.Birch...@bsiag.com>
Subject FuzzyQuery minimumSimilarity
Date Mon, 05 Nov 2012 16:00:48 GMT
Hi there

Lucene calculates the similarity between two strings s1 and s2 according to the formula

Similarity = 1 - Levenshtein-Distance(s1,s2)/min(Length(s1),Length(s2))

I would have expected Lucene to divide by the length of the longer string. In particular,
the above formula can - in my understanding - yield a negative similarity, since the Levenshtein
distance can be as large as the length of the longer string.
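
To make the concern concrete, here is a minimal sketch that compares the two normalizations. It is not Lucene's code: the class and method names (FuzzySimilaritySketch, similarityMin, similarityMax) are made up for illustration, the Levenshtein routine is the textbook dynamic-programming version, and both strings are assumed to be non-empty.

```java
public class FuzzySimilaritySketch {

    // Textbook Levenshtein distance (insertions, deletions, substitutions),
    // computed row by row with two rolling arrays.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // distance to the empty prefix of b
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Normalization as described in this message: divide by the shorter length.
    // Assumes both strings are non-empty.
    static float similarityMin(String s1, String s2) {
        return 1.0f - (float) levenshtein(s1, s2) / Math.min(s1.length(), s2.length());
    }

    // Hypothetical alternative: divide by the longer length, which keeps the
    // result in [0, 1] because the distance never exceeds the longer length.
    static float similarityMax(String s1, String s2) {
        return 1.0f - (float) levenshtein(s1, s2) / Math.max(s1.length(), s2.length());
    }

    public static void main(String[] args) {
        // "ab" vs "wxyz": distance 4, shorter length 2, longer length 4.
        System.out.println(similarityMin("ab", "wxyz")); // -1.0
        System.out.println(similarityMax("ab", "wxyz")); // 0.0
    }
}
```

With "ab" and "wxyz" the distance is 4, so dividing by the shorter length gives 1 - 4/2 = -1, while dividing by the longer length gives 1 - 4/4 = 0.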

Why does Lucene calculate the similarity in this way?

Cheers,
Damian

