commons-issues mailing list archives

From "Yossi Tamari (JIRA)" <>
Subject [jira] [Commented] (CODEC-199) Bug in HW rule in Soundex
Date Thu, 30 Mar 2017 23:51:41 GMT


Yossi Tamari commented on CODEC-199:

By "silent" I meant H and W, not the vowels.
You said, "Nor is it clear whether there is a use case for different letters to have the same
behaviour as HW." My point was that the set of silent letters (currently H and W) may itself
need to change, in this instance to the empty set.

The rule you quote is not a variant; it is step 3 of the Wikipedia definition, and it is
already implemented. Again, this case does not require mapping all vowels to #. It requires
the opposite: no letter should be mapped to #.

In other words, this is exactly a use case in which H and W are treated as vowels, and it
is easily handled by mapping them to 0 (with my second patch; without it, this library
cannot handle this variant).
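
To make the distinction concrete, here is a minimal standalone sketch. It is not the Commons Codec implementation and not either attached patch; the class name SoundexSketch, the method encode and the silentLetters parameter are hypothetical names made up for illustration. It assumes the usual American Soundex letter-to-digit table, treats letters in silentLetters as invisible (they do not separate two equal codes), and treats every other letter that maps to 0 as a vowel-like separator.

```java
import java.util.Locale;

// Illustrative sketch only: hypothetical names, not the Commons Codec API.
final class SoundexSketch {

    // Digit for each letter 'A'..'Z' in the usual American Soundex table.
    // '0' marks letters that are never emitted (vowels, and here also H and W).
    private static final String CODES = "01230120022455012623010202";

    // silentLetters: upper-case letters that are skipped without resetting the
    // previous code. "HW" gives the standard rule; "" treats H and W as vowels.
    static String encode(String word, String silentLetters) {
        StringBuilder out = new StringBuilder();
        char last = '0'; // code of the most recent non-silent letter
        for (char c : word.toUpperCase(Locale.ROOT).toCharArray()) {
            if (c < 'A' || c > 'Z') {
                continue; // ignore anything that is not a plain letter
            }
            char code = CODES.charAt(c - 'A');
            if (out.length() == 0) {
                out.append(c); // the first letter is kept as-is
                last = code;
                continue;
            }
            if (silentLetters.indexOf(c) >= 0) {
                continue; // silent: 'last' stays untouched
            }
            if (code != '0' && code != last) {
                out.append(code);
                if (out.length() == 4) {
                    break;
                }
            }
            last = code; // a vowel-like letter resets 'last' to '0'
        }
        while (out.length() > 0 && out.length() < 4) {
            out.append('0'); // pad short codes with zeros
        }
        return out.toString();
    }
}
```

With these assumptions, encode("Ashcraft", "HW") yields A261 (the C is dropped because only an H separates it from the S), while encode("Ashcraft", "") yields A226, which is the variant where H and W behave like vowels.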

> Bug in HW rule in Soundex
> -------------------------
>                 Key: CODEC-199
>                 URL:
>             Project: Commons Codec
>          Issue Type: Bug
>    Affects Versions: 1.10
>            Reporter: Yossi Tamari
>             Fix For: 1.11
>         Attachments: better.patch, soundex.patch
> The Soundex algorithm says that if two characters that map to the same code are separated
> by H or W, the second one is not encoded.
> However, in the implementation (in Soundex.getMappingCode() line 191), a character that
> is preceded by two characters that are either H or W, is not encoded, regardless of what the
> last consonant was.
> Source:

