Hi there

Lucene calculates the string similarity between two strings s1 and s2 according to the formula

Similarity = 1 - Levenshtein-Distance(s1,s2)/min(Length(s1),Length(s2))

I would have thought Lucene would divide by the length of the longer string. In particular, the above formula could, in my understanding, lead to a negative similarity, since the Levenshtein distance can be as large as the length of the longer string.
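
To make the concern concrete, here is a minimal sketch in Java. It is not Lucene's actual code: the class name SimilaritySketch and its methods are made up for illustration, and it assumes the similarity is taken as one minus the normalized distance (which is what the negative-value concern implies) and that both strings are non-empty.

```java
// Illustrative sketch only; not taken from the Lucene source.
class SimilaritySketch {

    // Textbook dynamic-programming Levenshtein distance using two rows.
    public static int levenshtein(String s1, String s2) {
        int[] prev = new int[s2.length() + 1];
        int[] curr = new int[s2.length() + 1];
        for (int j = 0; j <= s2.length(); j++) {
            prev[j] = j; // distance from the empty prefix of s1
        }
        for (int i = 1; i <= s1.length(); i++) {
            curr[0] = i; // distance to the empty prefix of s2
            for (int j = 1; j <= s2.length(); j++) {
                int cost = s1.charAt(i - 1) == s2.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[s2.length()];
    }

    // Normalizes by the shorter string (assumes non-empty inputs).
    public static double similarityMin(String s1, String s2) {
        return 1.0 - (double) levenshtein(s1, s2)
                / Math.min(s1.length(), s2.length());
    }

    // Normalizes by the longer string (assumes non-empty inputs).
    public static double similarityMax(String s1, String s2) {
        return 1.0 - (double) levenshtein(s1, s2)
                / Math.max(s1.length(), s2.length());
    }

    public static void main(String[] args) {
        // "ab" vs "wxyz": two substitutions plus two insertions, distance 4.
        System.out.println(similarityMin("ab", "wxyz")); // prints -1.0
        System.out.println(similarityMax("ab", "wxyz")); // prints 0.0
    }
}
```

With the shorter length in the denominator the value drops below zero as soon as the distance exceeds the shorter length, whereas dividing by the longer length keeps it between 0 and 1.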

Why does Lucene calculate the similarity in this way?

Cheers,

Damian

