commons-issues mailing list archives

From "Jan Martin Keil (JIRA)" <j...@apache.org>
Subject [jira] [Created] (TEXT-130) JaroWinklerDistance: Wrong results due to precision of transpositions
Date Thu, 02 Aug 2018 20:22:00 GMT
Jan Martin Keil created TEXT-130:
------------------------------------

             Summary: JaroWinklerDistance: Wrong results due to precision of transpositions
                 Key: TEXT-130
                 URL: https://issues.apache.org/jira/browse/TEXT-130
             Project: Commons Text
          Issue Type: Bug
            Reporter: Jan Martin Keil


The method {{JaroWinklerDistance#matches}} returns {{transpositions / 2}} as an integer. However,
{{transpositions}} is not guaranteed to be even. E.g. comparing "aaabcd" and "aaacdb"
results in {{transpositions}} = 3. Therefore the method should return 1.5, not 1. With the
truncated value, the similarity is 0.9611111111111111 instead of the correct 0.9416666666666667.
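
The arithmetic behind these two values can be checked with a small sketch. It assumes the textbook Jaro-Winkler formula with a scaling factor of 0.1 and is not copied from Commons Text. The class and method names are hypothetical. The inputs (6 matches, a common prefix of 3, both lengths 6) follow from the example strings above.

```java
public class TranspositionPrecision {

    // Textbook Jaro-Winkler with scaling factor 0.1 (an assumption, not the library code).
    static double jaroWinkler(int len1, int len2, int matches, double transpositions, int prefix) {
        double m = matches;
        double jaro = (m / len1 + m / len2 + (m - transpositions) / m) / 3;
        return jaro + 0.1 * prefix * (1 - jaro);
    }

    public static void main(String[] args) {
        // "aaabcd" vs "aaacdb": the matched b, c, d appear as c, d, b, so 3 positions differ.
        int halfTranspositions = 3;

        // Integer division truncates 3 / 2 to 1 before the formula sees it.
        double truncated = jaroWinkler(6, 6, 6, halfTranspositions / 2, 3);

        // Casting first keeps the fractional part: 1.5.
        double exact = jaroWinkler(6, 6, 6, (double) halfTranspositions / 2, 3);

        System.out.println("truncated: " + truncated); // approximately 0.9611
        System.out.println("exact:     " + exact);     // approximately 0.9417
    }
}
```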

I recommend that {{JaroWinklerDistance#matches}} return the undivided count as {{halfTranspositions}}
instead of {{transpositions}}, with the cast and division ({{(double) mtp[1] / 2}}) done in
{{JaroWinklerDistance#apply}}.
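
A minimal sketch of how the recommended split could look, assuming a simplified stand-alone implementation rather than the actual Commons Text source. The class name {{JaroWinklerSketch}} is hypothetical, the layout of the returned array is an assumption, and the sketch omits any similarity threshold for the prefix boost.

```java
public class JaroWinklerSketch {

    /** Returns {matches, halfTranspositions, commonPrefixLength}; nothing is divided here. */
    static int[] matches(CharSequence first, CharSequence second) {
        CharSequence longer = first.length() > second.length() ? first : second;
        CharSequence shorter = first.length() > second.length() ? second : first;
        int range = Math.max(longer.length() / 2 - 1, 0);
        boolean[] shorterMatched = new boolean[shorter.length()];
        boolean[] longerMatched = new boolean[longer.length()];

        // Greedily pair each character of the shorter input with an unused one in the window.
        int matches = 0;
        for (int i = 0; i < shorter.length(); i++) {
            int from = Math.max(i - range, 0);
            int to = Math.min(i + range + 1, longer.length());
            for (int j = from; j < to; j++) {
                if (!longerMatched[j] && shorter.charAt(i) == longer.charAt(j)) {
                    shorterMatched[i] = true;
                    longerMatched[j] = true;
                    matches++;
                    break;
                }
            }
        }

        // Walk both matched sequences in order and count positions that differ.
        int halfTranspositions = 0;
        int j = 0;
        for (int i = 0; i < shorter.length(); i++) {
            if (!shorterMatched[i]) {
                continue;
            }
            while (!longerMatched[j]) {
                j++;
            }
            if (shorter.charAt(i) != longer.charAt(j)) {
                halfTranspositions++;
            }
            j++;
        }

        // Common prefix, capped at 4 characters as in the usual Winkler variant.
        int prefix = 0;
        int limit = Math.min(4, shorter.length());
        while (prefix < limit && first.charAt(prefix) == second.charAt(prefix)) {
            prefix++;
        }
        return new int[] {matches, halfTranspositions, prefix};
    }

    static double apply(CharSequence first, CharSequence second) {
        int[] mtp = matches(first, second);
        double m = mtp[0];
        if (m == 0) {
            return 0.0;
        }
        // The cast happens before the division, so an odd count keeps its .5.
        double t = (double) mtp[1] / 2;
        double jaro = (m / first.length() + m / second.length() + (m - t) / m) / 3;
        return jaro + 0.1 * mtp[2] * (1 - jaro);
    }

    public static void main(String[] args) {
        System.out.println(apply("aaabcd", "aaacdb"));
    }
}
```

Keeping {{matches}} in integers and converting only in {{apply}} means the count itself stays exact, and the loss of precision cannot occur before the similarity is computed.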



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
