lucene-java-user mailing list archives

From Damian Birchler <>
Subject FuzzyQuery minimumSimilarity
Date Mon, 05 Nov 2012 16:00:48 GMT
Hi there

Lucene calculates the string similarity between two strings s1 and s2 according to the formula

Similarity = 1 - Levenshtein-Distance(s1,s2)/min(Length(s1),Length(s2))

I would have expected Lucene to divide by the length of the longer string. In particular,
the above formula could, as far as I understand, yield a negative similarity, since the Levenshtein
distance can be as large as the length of the longer string.
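
To make the concern concrete, here is a minimal sketch in plain Java. It is not Lucene's
actual code: the class and method names are made up for illustration, and it assumes the
similarity is computed as 1 - distance/min(length), ignoring any prefix handling Lucene may
apply. It compares dividing by the shorter length with dividing by the longer one.

```java
public class FuzzySimilaritySketch {

    // Classic dynamic-programming Levenshtein distance (insert, delete, substitute),
    // keeping only two rows of the table at a time.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // distance to the empty prefix of b
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Assumed formula: divide by the length of the SHORTER string.
    // Both strings are assumed to be non-empty.
    static float similarityMin(String a, String b) {
        return 1.0f - (float) levenshtein(a, b) / Math.min(a.length(), b.length());
    }

    // Alternative: divide by the length of the LONGER string.
    static float similarityMax(String a, String b) {
        return 1.0f - (float) levenshtein(a, b) / Math.max(a.length(), b.length());
    }

    public static void main(String[] args) {
        // "ab" vs "wxyz": 2 substitutions + 2 insertions = distance 4
        System.out.println(similarityMin("ab", "wxyz")); // prints -1.0
        System.out.println(similarityMax("ab", "wxyz")); // prints 0.0
    }
}
```

With the shorter length in the denominator the value drops below zero as soon as the distance
exceeds that length, whereas dividing by the longer length keeps the result within [0, 1].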

Why does Lucene calculate the similarity in this way?

