commons-issues mailing list archives

From "Niall Pemberton (JIRA)" <j...@apache.org>
Subject [jira] Resolved: (CODEC-84) Double Metaphone bugs in alternative encoding
Date Mon, 03 Aug 2009 00:39:14 GMT

     [ https://issues.apache.org/jira/browse/CODEC-84?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Niall Pemberton resolved CODEC-84.
----------------------------------

    Resolution: Fixed

> Double Metaphone bugs in alternative encoding
> ---------------------------------------------
>
>                 Key: CODEC-84
>                 URL: https://issues.apache.org/jira/browse/CODEC-84
>             Project: Commons Codec
>          Issue Type: Bug
>    Affects Versions: 1.3
>            Reporter: Niall Pemberton
>            Priority: Minor
>             Fix For: 1.4
>
>         Attachments: CODEC-84-DoubleMetaphone-Alternate-bugs.patch
>
>
> The new test case (CODEC-83) has highlighted a number of issues with the "alternative" encoding in the Double Metaphone implementation:
> 1) Bug in the handleG method when "G" is followed by "IER" 
>  *  The alternative encoding of "Angier" results in "ANKR" rather than "ANJR"
>  *  The alternative encoding of "rogier" results in "RKR" rather than "RJR"
> The problem is in the handleG() method and is caused by the wrong length (4 instead of 3) being used in the contains() method:
> {code}
>  } else if (contains(value, index + 1, 4, "IER")) {
> {code}
> ...this should be
> {code}
>  } else if (contains(value, index + 1, 3, "IER")) {
> {code}
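A minimal sketch of why the length matters, assuming contains() compares a substring of exactly the given length against each candidate. The helper below is a simplified stand-in written for illustration, not the Commons Codec source, and the class name is hypothetical:

```java
final class ContainsLengthSketch {

    // Simplified stand-in for the contains() helper (an assumption, not the
    // Commons Codec source): true when the substring of exactly `length`
    // characters starting at `start` equals one of the candidates.
    static boolean contains(String value, int start, int length, String... criteria) {
        if (start < 0 || start + length > value.length()) {
            return false;
        }
        String target = value.substring(start, start + length);
        for (String candidate : criteria) {
            if (target.equals(candidate)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String value = "ANGIER";
        int index = value.indexOf('G'); // position of the "G" being encoded

        // Length 4: a 4-character window can never equal the 3-character "IER".
        System.out.println(contains(value, index + 1, 4, "IER")); // false

        // Length 3: the window is exactly "IER", so the branch is taken.
        System.out.println(contains(value, index + 1, 3, "IER")); // true
    }
}
```

Under this assumption the length-4 check fails for any input, not only for words that end in "IER", which is consistent with both "Angier" and "rogier" falling through to the wrong branch.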
> 2)  Bug in the handleL method
>  * The alternative encoding of "cabrillo" results in "KPRL " rather than "KPR"
> The problem is that the first thing this method does is append an "L" to both the primary & alternative encodings. When the conditionL0() method returns true, the "L" should not be appended to the alternative encoding:
> {code}
> result.append('L');
> if (charAt(value, index + 1) == 'L') {
>     if (conditionL0(value, index)) {
>         result.appendAlternate(' ');
>     }
>     index += 2;
> } else {
>     index++;
> }
> return index;
> {code}
> Suggest refactoring this to:
> {code}
> if (charAt(value, index + 1) == 'L') {
>     if (conditionL0(value, index)) {
>         result.appendPrimary('L');
>     } else {
>         result.append('L');
>     }
>     index += 2;
> } else {
>     result.append('L');
>     index++;
> }
> return index;
> {code}
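The effect of the two versions on "cabrillo" can be sketched as follows. The Result class here is a hypothetical stand-in for the encoder's result holder, conditionL0 is passed in as a boolean (taken as true for "cabrillo", as the issue describes), and "KPR" is assumed to be the encoding already built when the "LL" is reached:

```java
final class HandleLSketch {

    // Hypothetical stand-in for the encoder's result holder.
    static final class Result {
        final StringBuilder primary = new StringBuilder();
        final StringBuilder alternate = new StringBuilder();

        Result(String prefix) {
            primary.append(prefix);
            alternate.append(prefix);
        }

        void append(char c) { primary.append(c); alternate.append(c); }
        void appendPrimary(char c) { primary.append(c); }
        void appendAlternate(char c) { alternate.append(c); }
    }

    // Original logic: "L" goes to both encodings before conditionL0 is consulted.
    static int handleLOriginal(String value, Result result, int index, boolean conditionL0) {
        result.append('L');
        if (index + 1 < value.length() && value.charAt(index + 1) == 'L') {
            if (conditionL0) {
                result.appendAlternate(' ');
            }
            return index + 2;
        }
        return index + 1;
    }

    // Suggested logic: when conditionL0 holds, only the primary encoding gets "L".
    static int handleLRefactored(String value, Result result, int index, boolean conditionL0) {
        if (index + 1 < value.length() && value.charAt(index + 1) == 'L') {
            if (conditionL0) {
                result.appendPrimary('L');
            } else {
                result.append('L');
            }
            return index + 2;
        }
        result.append('L');
        return index + 1;
    }

    public static void main(String[] args) {
        String value = "CABRILLO";
        int index = value.indexOf("LL");

        Result before = new Result("KPR");
        handleLOriginal(value, before, index, true);
        System.out.println("[" + before.alternate + "]"); // [KPRL ]

        Result after = new Result("KPR");
        handleLRefactored(value, after, index, true);
        System.out.println("[" + after.alternate + "]"); // [KPR]
    }
}
```

In both versions the primary encoding ends up as "KPRL"; only the alternate differs.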
> 3) Bug in the conditionL0() method for words ending in "AS" and "OS"
>  * The alternative encoding of "gallegos" results in "KLKS" rather than "KKS"
> The problem is caused by the wrong start position being used in the contains() method, which means it's not checking the last two characters of the word but checks the previous & current positions instead:
> {code}
>         } else if ((contains(value, index - 1, 2, "AS", "OS") || 
> {code}
> ...this should be
> {code}
>         } else if ((contains(value, value.length() - 2, 2, "AS", "OS") || 
> {code}
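The difference between the two start positions can be sketched for "gallegos". As before, contains() is a simplified stand-in that assumes an exact substring comparison, not the Commons Codec source, and only the "AS"/"OS" fragment of the condition is shown:

```java
final class SuffixCheckSketch {

    // Simplified stand-in for the contains() helper (an assumption, not the
    // Commons Codec source): true when the substring of exactly `length`
    // characters starting at `start` equals one of the candidates.
    static boolean contains(String value, int start, int length, String... criteria) {
        if (start < 0 || start + length > value.length()) {
            return false;
        }
        String target = value.substring(start, start + length);
        for (String candidate : criteria) {
            if (target.equals(candidate)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String value = "GALLEGOS";
        int index = value.indexOf('L'); // position of the first "L"

        // Start at index - 1: the window is "AL" (previous and current
        // characters), so the suffix test fails.
        System.out.println(contains(value, index - 1, 2, "AS", "OS")); // false

        // Start at length - 2: the window is the last two characters, "OS".
        System.out.println(contains(value, value.length() - 2, 2, "AS", "OS")); // true
    }
}
```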
> I'll attach a patch for review

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

