commons-issues mailing list archives

From "Ben Kazez (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (CODEC-248) language.DaitchMokotoffSoundex gives overly broad results for tokens containing RS
Date Mon, 06 Aug 2018 23:28:00 GMT

     [ https://issues.apache.org/jira/browse/CODEC-248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ben Kazez updated CODEC-248:
----------------------------
    Description: 
I am using Apache Commons Codec in Elasticsearch (via Lucene).

# GIERSZLIK codes to 548500 or 594850
# GOTSALK codes to 548500

These names don't sound alike, but because both produce the code 548500, a search for one returns the other.
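
To illustrate why a single shared code is enough to cause the false match, here is a minimal sketch. It is not Commons Codec or Lucene code: the class `DmCodeOverlap` and its helpers are hypothetical, the codes are hard-coded from the report above, and the `|`-separated form for multiple codes is an assumption modeled on how `DaitchMokotoffSoundex.soundex` is documented to return branches.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class DmCodeOverlap {

    // Split a '|'-separated list of codes into a set (assumed input format).
    static Set<String> branches(String encoded) {
        return new HashSet<>(Arrays.asList(encoded.split("\\|")));
    }

    // Two names collide in a code-based search if they share at least one code.
    static boolean sharesCode(String encodedA, String encodedB) {
        Set<String> common = branches(encodedA);
        common.retainAll(branches(encodedB));
        return !common.isEmpty();
    }

    public static void main(String[] args) {
        String gierszlik = "548500|594850"; // codes reported for GIERSZLIK
        String gotsalk = "548500";          // code reported for GOTSALK

        // The shared code 548500 makes the two names match.
        System.out.println(sharesCode(gierszlik, gotsalk)); // true

        // Hypothetical: GIERSZLIK limited to its second reported code.
        System.out.println(sharesCode("594850", gotsalk));  // false
    }
}
```

The second call only shows that the names would stop matching if GIERSZLIK no longer produced 548500; whether a given rule change has exactly that effect would need to be checked against the real encoder.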

Solution: I exchanged emails with Gary Mokotoff, co-creator of the algorithm, who said:

{quote}I would drop RS from the table. ... I cannot think of any language where RS is pronounced "S" (4).{quote}
 
  


> language.DaitchMokotoffSoundex gives overly broad results for tokens containing RS
> ----------------------------------------------------------------------------------
>
>                 Key: CODEC-248
>                 URL: https://issues.apache.org/jira/browse/CODEC-248
>             Project: Commons Codec
>          Issue Type: Bug
>            Reporter: Ben Kazez
>            Priority: Minor



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
