commons-issues mailing list archives

From "Ben Kazez (JIRA)" <j...@apache.org>
Subject [jira] [Created] (CODEC-248) language.DaitchMokotoffSoundex gives overly broad results for tokens containing RS
Date Mon, 06 Aug 2018 23:26:00 GMT
Ben Kazez created CODEC-248:
-------------------------------

             Summary: language.DaitchMokotoffSoundex gives overly broad results for tokens containing RS
                 Key: CODEC-248
                 URL: https://issues.apache.org/jira/browse/CODEC-248
             Project: Commons Codec
          Issue Type: Bug
            Reporter: Ben Kazez


# GIERSZLIK codes to 548500 or 594850
# GOTSALK codes to 548500
# These names don't sound alike, but because they share the code 548500, a search for one returns the other.
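
The overlap described above can be sketched in a few lines of Java. This is only an illustration, not code from Commons Codec: the class `SoundexOverlap` and its methods are hypothetical, and the code strings are the values quoted in this report, written in the pipe-separated form that `DaitchMokotoffSoundex.soundex(String)` uses for branched results. It assumes a search treats two names as a match when their code sets share at least one code.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper, not part of Commons Codec.
final class SoundexOverlap {

    // Splits a branched result such as "548500|594850" into its individual codes.
    static Set<String> codes(String branched) {
        return new HashSet<>(Arrays.asList(branched.split("\\|")));
    }

    // Assumption: two names match when their code sets share at least one code.
    static boolean sharesCode(String a, String b) {
        Set<String> common = codes(a);
        common.retainAll(codes(b));
        return !common.isEmpty();
    }

    public static void main(String[] args) {
        String gierszlik = "548500|594850"; // codes quoted in the report for GIERSZLIK
        String gotsalk = "548500";          // code quoted in the report for GOTSALK
        System.out.println(sharesCode(gierszlik, gotsalk)); // prints "true"
    }
}
```

With the RS rule in place, the shared branch 548500 is enough to make the two names match. If GIERSZLIK produced only 594850, the two code sets would no longer intersect.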

Solution: I exchanged emails with Gary Mokotoff, co-creator of the algorithm, who said:
{quote}I would drop RS from the table. ... I cannot think of any language where RS is pronounced "S" (4).{quote}
 
 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
