commons-issues mailing list archives

From yufcuy <...@git.apache.org>
Subject [GitHub] commons-lang issue #189: new impl of LevenshteinDistance
Date Sun, 18 Sep 2016 02:18:00 GMT
Github user yufcuy commented on the issue:

    https://github.com/apache/commons-lang/pull/189
  
    Hello, @britter @kinow   
    The details of the Levenshtein distance can be found at https://en.wikipedia.org/wiki/Levenshtein_distance.
As described there, the algorithm computes a matrix that holds the Levenshtein distances between
all prefixes of the first string and all prefixes of the second. For all i and j, d[i][j]
holds the Levenshtein distance between the first i characters of s and the first j characters
of t. Each cell follows the recurrence d[i][j] = min(d[i-1][j] + 1, d[i][j-1] + 1, d[i-1][j-1] + cost),
where cost is 0 if the ith character of s equals the jth character of t and 1 otherwise.
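    For reference, a minimal sketch of the textbook full-matrix version described above. This is an illustration, not the commons-lang code; the class name FullMatrixLevenshtein is hypothetical.

```java
// Sketch of the textbook full-matrix Levenshtein algorithm.
// The class and method names are hypothetical, for illustration only.
final class FullMatrixLevenshtein {

    static int distance(CharSequence s, CharSequence t) {
        int n = s.length();
        int m = t.length();
        // d[i][j] = distance between the first i chars of s and the first j chars of t
        int[][] d = new int[n + 1][m + 1];
        for (int i = 0; i <= n; i++) {
            d[i][0] = i; // delete all i characters of s
        }
        for (int j = 0; j <= m; j++) {
            d[0][j] = j; // insert all j characters of t
        }
        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= m; j++) {
                int cost = s.charAt(i - 1) == t.charAt(j - 1) ? 0 : 1;
                d[i][j] = Math.min(Math.min(
                        d[i - 1][j] + 1,          // deletion
                        d[i][j - 1] + 1),         // insertion
                        d[i - 1][j - 1] + cost);  // substitution or match
            }
        }
        return d[n][m];
    }

    public static void main(String[] args) {
        System.out.println(distance("kitten", "sitting")); // prints 3
    }
}
```

This uses O(n*m) memory, which is what the row-based variants avoid.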
    The previous implementation uses two matrix rows for the construction. According to the
algorithm, computing d[i][j] only requires d[i-1][j-1], d[i-1][j] and d[i][j-1], so a single
matrix row p[] is enough. When p[j] is computed in the ith iteration, it takes the value d[i][j].
At that moment p[j-1] already holds d[i][j-1] and p[j] still holds d[i-1][j]; only d[i-1][j-1]
has been overwritten (it was the old value of p[j-1]). The variable 'upper_left' therefore keeps
d[i-1][j-1], and the variable 'upper' saves the old p[j] (that is, d[i-1][j]) before it is
overwritten, so that it can become the next 'upper_left'.
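    A minimal sketch of the single-row scheme described above, assuming plain CharSequence inputs and no threshold or null handling. It is not the code from the PR; the class name SingleRowLevenshtein is hypothetical, and upperLeft/upper are camelCase stand-ins for 'upper_left' and 'upper'.

```java
// Sketch of the single-row Levenshtein construction.
// Hypothetical class name; not the actual commons-lang implementation.
final class SingleRowLevenshtein {

    static int distance(CharSequence s, CharSequence t) {
        int n = s.length();
        int m = t.length();
        // p[j] holds d[i-1][j] before it is updated and d[i][j] after
        int[] p = new int[m + 1];
        for (int j = 0; j <= m; j++) {
            p[j] = j; // row 0: d[0][j] = j
        }
        for (int i = 1; i <= n; i++) {
            int upperLeft = p[0]; // d[i-1][0]
            p[0] = i;             // d[i][0] = i
            char si = s.charAt(i - 1);
            for (int j = 1; j <= m; j++) {
                int upper = p[j]; // d[i-1][j], saved before it is overwritten
                int cost = si == t.charAt(j - 1) ? 0 : 1;
                // p[j - 1] already holds d[i][j-1]
                p[j] = Math.min(Math.min(p[j - 1] + 1, upper + 1), upperLeft + cost);
                upperLeft = upper; // d[i-1][j] is d[i-1][j-1] for the next j
            }
        }
        return p[m];
    }

    public static void main(String[] args) {
        System.out.println(distance("kitten", "sitting")); // prints 3
    }
}
```

The row has length t.length() + 1, so in this sketch a caller could pass the shorter string as t to keep memory proportional to the shorter input.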



