commons-issues mailing list archives

From "Bruno P. Kinoshita (JIRA)" <>
Subject [jira] [Commented] (LANG-591) A more complex Levenshtein distance would be useful
Date Sat, 25 Oct 2014 01:23:33 GMT


Bruno P. Kinoshita commented on LANG-591:


I need to do some data matching for a project and started by using the Levenshtein distance
from StringUtils. I ended up using a mix of code from other projects (SimMetrics, LingPipe, Talend,
etc.) and realized there are several other string distance and matching algorithms (Jaccard,
Jaro-Winkler, Damerau-Levenshtein, bitap, q-gram, etc.).
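For comparison with the plain Levenshtein distance in StringUtils, here is a minimal sketch of one of the measures listed above: the optimal string alignment (OSA) variant of Damerau-Levenshtein, which also counts swapping two adjacent characters as a single edit. The class name `OsaDistance` is hypothetical; this is not code from any of the projects mentioned, and OSA is a restricted form of Damerau-Levenshtein rather than the full distance.

```java
/**
 * Optimal string alignment (OSA) distance: Levenshtein distance plus
 * transposition of two adjacent characters at cost 1. A restricted form of
 * Damerau-Levenshtein (no substring is edited more than once).
 * Hypothetical helper class, for illustration only.
 */
final class OsaDistance {

    static int distance(String a, String b) {
        int n = a.length();
        int m = b.length();
        // d[i][j] = OSA distance between the first i chars of a and the first j chars of b
        int[][] d = new int[n + 1][m + 1];
        for (int i = 0; i <= n; i++) d[i][0] = i; // delete every char of a's prefix
        for (int j = 0; j <= m; j++) d[0][j] = j; // insert every char of b's prefix
        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= m; j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                d[i][j] = Math.min(Math.min(
                        d[i - 1][j] + 1,            // deletion
                        d[i][j - 1] + 1),           // insertion
                        d[i - 1][j - 1] + cost);    // match or substitution
                // Transposition: the last two chars of each prefix are swapped
                if (i > 1 && j > 1
                        && a.charAt(i - 1) == b.charAt(j - 2)
                        && a.charAt(i - 2) == b.charAt(j - 1)) {
                    d[i][j] = Math.min(d[i][j], d[i - 2][j - 2] + 1);
                }
            }
        }
        return d[n][m];
    }

    public static void main(String[] args) {
        System.out.println(distance("ca", "ac"));          // 1 (plain Levenshtein gives 2)
        System.out.println(distance("kitten", "sitting")); // 3
    }
}
```

Because OSA never edits a substring twice, it returns 3 for "CA" versus "ABC", whereas the unrestricted Damerau-Levenshtein distance for that pair is 2; the unrestricted version needs a somewhat more involved algorithm.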

Are there plans to include these other algorithms in [lang]? IIRC, someone once mentioned a
commons-text component. I don't know whether such a component exists in the sandbox or the
attic, but could these algorithms fit there?

> A more complex Levenshtein distance would be useful
> ---------------------------------------------------
>                 Key: LANG-591
>                 URL:
>             Project: Commons Lang
>          Issue Type: New Feature
>          Components: lang.*
>    Affects Versions: 3.0
>            Reporter: Benson Margulies
>             Fix For: Review Patch
>         Attachments: LANG-591.patch
> For some applications, it is necessary to get insert/delete/substitution counts from
> the distance algorithm. I am attaching a patch that provides this.
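To illustrate what the issue asks for, here is a minimal sketch (not the attached LANG-591.patch, whose contents aren't shown here) that fills the usual Levenshtein table and then walks back from the final cell to count the insertions, deletions and substitutions in one minimum-cost edit script. The names `LevenshteinCounts`, `EditCounts` and `compute` are hypothetical.

```java
/**
 * Levenshtein distance that also reports operation counts for one optimal
 * edit script. Hypothetical names; not the LANG-591 patch or a Commons Lang API.
 */
final class LevenshteinCounts {

    /** Counts of each operation in one minimum-cost script turning a into b. */
    static final class EditCounts {
        final int insertions;
        final int deletions;
        final int substitutions;

        EditCounts(int insertions, int deletions, int substitutions) {
            this.insertions = insertions;
            this.deletions = deletions;
            this.substitutions = substitutions;
        }

        /** The total equals the ordinary Levenshtein distance. */
        int distance() {
            return insertions + deletions + substitutions;
        }

        @Override
        public String toString() {
            return "ins=" + insertions + ", del=" + deletions + ", sub=" + substitutions;
        }
    }

    static EditCounts compute(String a, String b) {
        int n = a.length();
        int m = b.length();
        // d[i][j] = Levenshtein distance between a[0..i) and b[0..j)
        int[][] d = new int[n + 1][m + 1];
        for (int i = 0; i <= n; i++) d[i][0] = i;
        for (int j = 0; j <= m; j++) d[0][j] = j;
        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= m; j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                d[i][j] = Math.min(Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1),
                        d[i - 1][j - 1] + cost);
            }
        }
        // Walk back from d[n][m], taking any step that could have produced the
        // current value (diagonal first), and tally the operations on the way.
        int ins = 0, del = 0, sub = 0;
        int i = n, j = m;
        while (i > 0 || j > 0) {
            if (i > 0 && j > 0) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                if (d[i][j] == d[i - 1][j - 1] + cost) {
                    sub += cost; // cost 0 is a match, cost 1 a substitution
                    i--;
                    j--;
                    continue;
                }
            }
            if (i > 0 && d[i][j] == d[i - 1][j] + 1) {
                del++; // a.charAt(i - 1) is deleted
                i--;
            } else {
                ins++; // b.charAt(j - 1) is inserted
                j--;
            }
        }
        return new EditCounts(ins, del, sub);
    }

    public static void main(String[] args) {
        System.out.println(compute("kitten", "sitting")); // ins=1, del=0, sub=2
        System.out.println(compute("flaw", "lawn"));      // ins=1, del=1, sub=0
    }
}
```

When several optimal edit scripts exist, the individual counts can depend on the tie-breaking order (this sketch prefers diagonal steps, then deletions), although their sum is always the Levenshtein distance.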

This message was sent by Atlassian JIRA
