commons-issues mailing list archives

From "Bruno P. Kinoshita (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LANG-591) A more complex Levenshtein distance would be useful
Date Sat, 25 Oct 2014 01:23:33 GMT

    [ https://issues.apache.org/jira/browse/LANG-591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14183841#comment-14183841 ]

Bruno P. Kinoshita commented on LANG-591:
-----------------------------------------

Hello, 

I need to do some data matching for a project, and started using the Levenshtein distance
from StringUtils. I ended up using a mix of code from other projects (SimMetrics, LingPipe,
Talend, etc.), and realized there are several other string distance and similarity algorithms
(Jaccard, Jaro-Winkler, Damerau-Levenshtein, Bitap, q-gram, etc.).

Are there plans to include these other algorithms in [lang]? IIRC, someone mentioned a
commons-text component somewhere. I'm not aware of such a component in the sandbox or the
attic, but maybe these algorithms could fit there?
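
To illustrate how one of the algorithms named above differs from plain Levenshtein, here is a
minimal sketch of the optimal string alignment variant of Damerau-Levenshtein, which also
counts a swap of two adjacent characters as a single edit. The class name OsaDistance is
hypothetical; it is not taken from [lang] or from any of the projects mentioned.

```java
/**
 * Hypothetical illustration only: optimal string alignment (restricted
 * Damerau-Levenshtein) distance. Not part of Commons Lang.
 */
final class OsaDistance {

    private OsaDistance() {
    }

    static int distance(CharSequence a, CharSequence b) {
        int n = a.length();
        int m = b.length();
        // d[i][j] = distance between the first i chars of a and the first j chars of b
        int[][] d = new int[n + 1][m + 1];
        for (int i = 0; i <= n; i++) {
            d[i][0] = i; // delete all i characters
        }
        for (int j = 0; j <= m; j++) {
            d[0][j] = j; // insert all j characters
        }
        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= m; j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int best = Math.min(
                        Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1),
                        d[i - 1][j - 1] + cost);
                // Extra case compared with plain Levenshtein: adjacent transposition.
                if (i > 1 && j > 1
                        && a.charAt(i - 1) == b.charAt(j - 2)
                        && a.charAt(i - 2) == b.charAt(j - 1)) {
                    best = Math.min(best, d[i - 2][j - 2] + 1);
                }
                d[i][j] = best;
            }
        }
        return d[n][m];
    }
}
```

With this sketch, OsaDistance.distance("ab", "ba") is 1, whereas plain Levenshtein would
report 2 for the same pair.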

> A more complex Levenshtein distance would be useful
> ---------------------------------------------------
>
>                 Key: LANG-591
>                 URL: https://issues.apache.org/jira/browse/LANG-591
>             Project: Commons Lang
>          Issue Type: New Feature
>          Components: lang.*
>    Affects Versions: 3.0
>            Reporter: Benson Margulies
>             Fix For: Review Patch
>
>         Attachments: LANG-591.patch
>
>
> For some applications, it is necessary to get insert/delete/substitution counts from
> the distance algorithm. I am attaching a patch that provides this.
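
For readers without the attachment, the request above can be sketched as follows. This is
not the code from LANG-591.patch; it is a minimal illustration that assumes unit costs and
uses a hypothetical class name, EditCounts. It fills the usual Levenshtein table and then
walks back from the final cell to count each kind of operation. When several optimal edit
scripts exist, the counts depend on the tie-breaking order (diagonal moves are preferred
here).

```java
import java.util.Objects;

/**
 * Hypothetical illustration only: Levenshtein distance broken down into
 * insertion, deletion and substitution counts. Not the LANG-591 patch.
 */
final class EditCounts {
    final int insertions;
    final int deletions;
    final int substitutions;

    private EditCounts(int insertions, int deletions, int substitutions) {
        this.insertions = insertions;
        this.deletions = deletions;
        this.substitutions = substitutions;
    }

    /** Total edit distance is the sum of the three counts. */
    int distance() {
        return insertions + deletions + substitutions;
    }

    static EditCounts compute(CharSequence source, CharSequence target) {
        Objects.requireNonNull(source, "source");
        Objects.requireNonNull(target, "target");
        int n = source.length();
        int m = target.length();
        int[][] d = new int[n + 1][m + 1];
        for (int i = 0; i <= n; i++) {
            d[i][0] = i;
        }
        for (int j = 0; j <= m; j++) {
            d[0][j] = j;
        }
        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= m; j++) {
                int cost = source.charAt(i - 1) == target.charAt(j - 1) ? 0 : 1;
                d[i][j] = Math.min(
                        Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1),
                        d[i - 1][j - 1] + cost);
            }
        }
        // Walk back from d[n][m] to d[0][0], classifying each step.
        int ins = 0;
        int del = 0;
        int sub = 0;
        int i = n;
        int j = m;
        while (i > 0 || j > 0) {
            if (i > 0 && j > 0) {
                int cost = source.charAt(i - 1) == target.charAt(j - 1) ? 0 : 1;
                if (d[i][j] == d[i - 1][j - 1] + cost) {
                    sub += cost; // a match (cost 0) is not counted as an edit
                    i--;
                    j--;
                    continue;
                }
            }
            if (i > 0 && d[i][j] == d[i - 1][j] + 1) {
                del++; // drop a character of source
                i--;
            } else {
                ins++; // add a character of target
                j--;
            }
        }
        return new EditCounts(ins, del, sub);
    }
}
```

For example, EditCounts.compute("kitten", "sitting") yields 1 insertion, 0 deletions and
2 substitutions, for a total distance of 3.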



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
