hive-issues mailing list archives

From "Alexander Pivovarov (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HIVE-9738) create SOUNDEX udf
Date Thu, 26 Feb 2015 21:53:04 GMT

     [ https://issues.apache.org/jira/browse/HIVE-9738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alexander Pivovarov updated HIVE-9738:
--------------------------------------
    Attachment: HIVE-9738.2.patch

patch #2

> create SOUNDEX udf
> ------------------
>
>                 Key: HIVE-9738
>                 URL: https://issues.apache.org/jira/browse/HIVE-9738
>             Project: Hive
>          Issue Type: Improvement
>          Components: UDF
>            Reporter: Alexander Pivovarov
>            Assignee: Alexander Pivovarov
>         Attachments: HIVE-9738.1.patch, HIVE-9738.2.patch
>
>
> Soundex is an encoding used to relate similar names, but it can also be used as a general-purpose scheme to find words with similar phonemes.
> The American Soundex System
> The Soundex code consists of the first letter of the name followed by three digits. These three digits are determined by dropping the letters a, e, i, o, u, h, w and y and adding three digits from the remaining letters of the name according to the table below. There are only two additional rules. (1) If two or more consecutive letters have the same code, they are coded as one letter. (2) If there is an insufficient number of letters to make the three digits, the remaining digits are set to zero.
> Soundex Table
>  1 b,f,p,v
>  2 c,g,j,k,q,s,x,z
>  3 d,t
>  4 l
>  5 m,n
>  6 r
> Examples:
> Miller M460
> Peterson P362
> Peters P362
> Auerbach A612
> Uhrbach U612
> Moskowitz M232
> Moskovitz M213
> Implementation:
> http://commons.apache.org/proper/commons-codec/apidocs/org/apache/commons/codec/language/Soundex.html
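
The rules quoted above can be sketched in a few lines of Python. This is only an illustration of the description in the issue, not the code in the attached patch or in Commons Codec. The names `SOUNDEX_TABLE`, `CODES` and `soundex` are hypothetical. It assumes "consecutive" means adjacent in the original name, so any dropped letter between two same-coded consonants separates them; other implementations may treat letters separated by h or w differently.

```python
# Letter-to-digit table copied from the issue description.
SOUNDEX_TABLE = {
    "1": "bfpv",
    "2": "cgjkqsxz",
    "3": "dt",
    "4": "l",
    "5": "mn",
    "6": "r",
}
# Invert the table: letter -> digit.
CODES = {
    letter: digit
    for digit, letters in SOUNDEX_TABLE.items()
    for letter in letters
}


def soundex(name):
    """Return the first letter of the name followed by three digits."""
    # Keep ASCII letters only (assumption: other characters are ignored).
    letters = [ch for ch in name.lower() if "a" <= ch <= "z"]
    if not letters:
        return ""
    result = letters[0].upper()
    # Seed with the first letter's code so that an adjacent letter
    # with the same code is not coded a second time.
    previous = CODES.get(letters[0], "")
    for ch in letters[1:]:
        code = CODES.get(ch, "")  # "" for a, e, i, o, u, h, w, y
        if code and code != previous:
            result += code
        previous = code
        if len(result) == 4:
            break
    # Rule (2): pad with zeros when fewer than three digits were found.
    return result.ljust(4, "0")
```

With this sketch, `soundex("Miller")` returns `"M460"` and `soundex("Moskovitz")` returns `"M213"`, matching the examples in the issue.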



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
