commons-issues mailing list archives

From "Sebb (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CODEC-250) Wrong value calculated by Cologne Phonetic if a special character is placed between equal letters
Date Thu, 27 Sep 2018 09:36:00 GMT

    [ https://issues.apache.org/jira/browse/CODEC-250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16630058#comment-16630058 ]

Sebb commented on CODEC-250:
----------------------------

I also noticed that the PREPROCESS_MAP contains the German sharp s ('ß', which looks like a beta).
However, String#toUpperCase(Locale.GERMAN) already converts it to 'SS', so the entry serves no
purpose in the map.
This made me wonder whether the sharp s should be converted before upper-casing.
But AFAICT 'SS' is treated the same way as 'S', so it does not matter when it is converted.
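
The upper-casing behaviour described above can be checked with the JDK alone. The following is a minimal sketch, not the Commons Codec code: the class name SharpSUpperCaseSketch and the helper upper are hypothetical, and the sample word is chosen only for illustration.

```java
import java.util.Locale;

// Hypothetical demo class; it only exercises String#toUpperCase from the JDK.
class SharpSUpperCaseSketch {

    // Upper-case with the German locale, as mentioned in the comment above.
    static String upper(String s) {
        return s.toUpperCase(Locale.GERMAN);
    }

    public static void main(String[] args) {
        // \u00DF is the German sharp s.
        String result = upper("stra\u00DFe");
        System.out.println(result);                  // STRASSE
        System.out.println(result.indexOf('\u00DF')); // -1, no sharp s is left
    }
}
```

Because no sharp s survives the upper-casing, a lookup keyed on that character would never match if it runs afterwards.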

> Wrong value calculated by Cologne Phonetic if a special character is placed between equal letters
> -------------------------------------------------------------------------------------------------
>
>                 Key: CODEC-250
>                 URL: https://issues.apache.org/jira/browse/CODEC-250
>             Project: Commons Codec
>          Issue Type: Bug
>    Affects Versions: 1.5, 1.11
>            Reporter: Alex Volodko
>            Priority: Major
>
> The algorithm for Cologne phonetics is (simplified):
>  # Encode letter by letter from left to right according to the conversion table.
>  # Remove all digits occurring more than once next to each other.
>  # Remove all code "0" except at the beginning.
> Characters which are not specified in the conversion table (such as hyphens) are ignored.
> See https://en.wikipedia.org/wiki/Cologne_phonetics
> If the input is "test-test" the step results will be:
>  # 20822082
>  # 2082082
>  # 28282
> The expected result for "test-test" is therefore 28282.
> The actual result for "test-test" is 282{color:#FF0000}2{color}82.
> This bug is caused by the fix from
> [https://github.com/apache/commons-codec/commit/72c8759a22c6552a2dfcdf61b29729f981752879]
> and has been present since 1.5
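
Steps 2 and 3 of the simplified algorithm quoted above can be traced with a short sketch. It assumes step 1 has already produced the code 20822082 for "test-test" (the conversion table is not reproduced here), and the class and method names are hypothetical; this is not the Commons Codec implementation.

```java
// Hypothetical helper class tracing steps 2 and 3 from the issue description.
class CologneStepsSketch {

    // Step 2: drop a digit when it equals the digit kept immediately before it.
    static String collapseRuns(String digits) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < digits.length(); i++) {
            char c = digits.charAt(i);
            if (out.length() == 0 || out.charAt(out.length() - 1) != c) {
                out.append(c);
            }
        }
        return out.toString();
    }

    // Step 3: drop every '0' except one in the first position.
    static String dropInnerZeros(String digits) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < digits.length(); i++) {
            char c = digits.charAt(i);
            if (c != '0' || i == 0) {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String step1 = "20822082";             // step 1 result given in the report
        String step2 = collapseRuns(step1);
        String step3 = dropInnerZeros(step2);
        System.out.println(step2);             // 2082082
        System.out.println(step3);             // 28282
    }
}
```

Since the hyphen produces no digit in step 1, the two adjacent 2s from "t-t" meet and collapse in step 2, which is why the report expects 28282.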



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
