commons-issues mailing list archives

From "Sebb (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CODEC-199) Bug in HW rule in Soundex
Date Fri, 31 Mar 2017 15:06:41 GMT

    [ https://issues.apache.org/jira/browse/CODEC-199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15951090#comment-15951090 ]

Sebb commented on CODEC-199:
----------------------------

The penwith URL is very useful. Thanks.
It states that H and W were treated as vowels in the original 'Simplified Soundex'.

This difference is not described in the Wikipedia article, and it is regarded as erroneous on the
thoughtco page.
The thoughtco page is unhelpful in other ways, e.g. it uses SUTTON as an example of a name
starting with a double letter! [A name like LLOYD would be OK]

I think there is a solution that will allow for the 'Simplified Soundex' variant as well
as the current American Soundex, without compromising existing behaviour or needing to change
the public constant. I hope to update the code in the next few days, once it has been tested
further.
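To make the two variants concrete, here is a minimal sketch. It is not the Commons Codec implementation: the class `SoundexVariants`, its `encode` method and the `simplified` flag are hypothetical, and it assumes the standard US English letter mapping with non-letters discarded. The only difference between the variants is whether H and W reset the previously seen code, as a vowel does.

```java
import java.util.Locale;

// Hypothetical sketch: one encoder covering American Soundex (H/W transparent)
// and "Simplified Soundex" (H/W treated like vowels). Not the Commons Codec code.
final class SoundexVariants {

    // Standard US English codes for A..Z; vowels, Y, H and W all map to '0'.
    private static final String MAPPING = "01230120022455012623010202";

    private SoundexVariants() {
    }

    private static char code(char upper) {
        return MAPPING.charAt(upper - 'A');
    }

    // simplified == false: American rule, same-coded letters separated only by H/W are coded once.
    // simplified == true: H and W reset the previous code, exactly like a vowel.
    static String encode(String name, boolean simplified) {
        StringBuilder letters = new StringBuilder();
        for (char c : name.toUpperCase(Locale.ROOT).toCharArray()) {
            if (c >= 'A' && c <= 'Z') {
                letters.append(c); // drop anything that is not an ASCII letter
            }
        }
        if (letters.length() == 0) {
            return "";
        }
        StringBuilder out = new StringBuilder(4);
        char first = letters.charAt(0);
        out.append(first);
        char last = code(first); // the first letter's code still blocks an identical next code
        for (int i = 1; i < letters.length() && out.length() < 4; i++) {
            char c = letters.charAt(i);
            if (c == 'H' || c == 'W') {
                if (simplified) {
                    last = '0'; // Simplified: acts as a separator, like a vowel
                }
                continue; // American: 'last' is left untouched so codes merge across H/W
            }
            char d = code(c);
            if (d != '0' && d != last) {
                out.append(d);
            }
            last = d; // a vowel sets 'last' to '0', so a repeat after a vowel is coded again
        }
        while (out.length() < 4) {
            out.append('0'); // pad short codes with zeros
        }
        return out.toString();
    }
}
```

For example, `encode("Ashcraft", false)` gives A261, because S and C (both code 2) are merged across the H, while `encode("Ashcraft", true)` gives A226, because the H separates them. A boolean flag like this is one way to offer both variants without altering an existing mapping constant; it is only an illustration, not necessarily the approach described above.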

==

It's wasteful to implement features that are not going to be used, and each one adds to the maintenance burden.
Rather more importantly, unless there are usage examples, creating valid test cases is
error prone.
That is why I have been stressing the need for use cases.

> Bug in HW rule in Soundex
> -------------------------
>
>                 Key: CODEC-199
>                 URL: https://issues.apache.org/jira/browse/CODEC-199
>             Project: Commons Codec
>          Issue Type: Bug
>    Affects Versions: 1.10
>            Reporter: Yossi Tamari
>             Fix For: 1.11
>
>         Attachments: better.patch, soundex.patch
>
>
> The Soundex algorithm says that if two characters that map to the same code are separated
> by H or W, the second one is not encoded.
> However, in the implementation (in Soundex.getMappingCode() line 191), a character that
> is preceded by two characters that are either H or W is not encoded, regardless of what the
> last consonant was.
> Source: http://en.wikipedia.org/wiki/Soundex#American_Soundex
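The gap between the rule as stated and the reported behaviour can be checked in isolation. In the hypothetical sketch below, `suppressedByRule` follows the stated rule (walk back over any run of H/W and compare codes with the letter before it), while `suppressedAsReported` models the description above; neither is the actual Commons Codec source, and the class and method names are made up. Inputs are assumed to be upper-case ASCII letters.

```java
// Hypothetical sketch contrasting the HW rule as stated in the issue with the
// behaviour the reporter describes. Names are illustrative, not Commons Codec API.
final class HwRuleCheck {

    // Standard US English codes for A..Z; vowels, Y, H and W all map to '0'.
    private static final String MAPPING = "01230120022455012623010202";

    private HwRuleCheck() {
    }

    private static char code(char upper) {
        return MAPPING.charAt(upper - 'A');
    }

    private static boolean isHW(char c) {
        return c == 'H' || c == 'W';
    }

    // Rule as stated: skip a consonant if it is separated only by H/W from an
    // earlier letter with the same code. Any run of H/W is walked back over.
    static boolean suppressedByRule(String upper, int i) {
        char d = code(upper.charAt(i));
        if (d == '0' || i < 1 || !isHW(upper.charAt(i - 1))) {
            return false; // vowels, H and W are never coded; or no H/W directly before
        }
        int j = i - 1;
        while (j >= 0 && isHW(upper.charAt(j))) {
            j--;
        }
        return j >= 0 && code(upper.charAt(j)) == d;
    }

    // Modeled on the report's wording: when the two preceding letters are both
    // H or W, the consonant is skipped regardless of the last coded consonant.
    static boolean suppressedAsReported(String upper, int i) {
        char d = code(upper.charAt(i));
        if (d == '0' || i < 2 || !isHW(upper.charAt(i - 1))) {
            return false;
        }
        char before = upper.charAt(i - 2);
        return isHW(before) || code(before) == d;
    }
}
```

For ASHCRAFT both methods skip the C at index 3, since S and C share code 2 across a single H. For the made-up input BOHWD they disagree at index 4: the stated rule codes the D, because the letter before the H/W run is a vowel, whereas the reported check drops it.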



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
