hive-dev mailing list archives

From "Krishna (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-4053) Add support for phonetic algorithms in Hive
Date Sat, 23 Feb 2013 05:52:16 GMT

    [ https://issues.apache.org/jira/browse/HIVE-4053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13585025#comment-13585025 ]

Krishna commented on HIVE-4053:
-------------------------------

I've implemented the 'Refined Soundex' algorithm as a GenericUDF and would like to share it
for review by experts, as I'm a newbie.

Change Details:
A new Java class is created: GenericUDFRefinedSoundex.java
An entry is added to FunctionRegistry.java: registerGenericUDF("soundex_ref", GenericUDFRefinedSoundex.class);

Both files are attached to the email.

I'm planning to implement the other phonetic algorithms and submit them all as a single patch.
I understand there are many other steps I need to finish before a patch is ready, but for now
it would be great if you could review the attached code and provide feedback.

Here are the details of the Refined Soundex algorithm:
The first letter is stored.
Each letter, including the first, is then replaced by a number as defined below:
 * B, P => 1
 * F, V => 2
 * C, K, S => 3
 * G, J => 4
 * Q, X, Z => 5
 * D, T => 6
 * L => 7
 * M, N => 8
 * R => 9
 * Other letters => 0
Consecutive letters belonging to the same group are replaced by a single digit
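
The rules above can be sketched in plain Java, independent of Hive. This is only
an illustration, not the attached GenericUDFRefinedSoundex.java: the class name
RefinedSoundexSketch and the method encode are hypothetical, and skipping
non-letter characters is an assumption. The example output C30908 contains a 3
for the leading C, so the sketch both stores the first letter and emits its code.

```java
import java.util.Locale;

/** Illustrative sketch only; not the attached GenericUDFRefinedSoundex.java. */
final class RefinedSoundexSketch {

    // Digit for each letter A..Z, transcribed from the table above.
    // Vowels, H, W and Y fall under "other letters" and map to 0.
    private static final String CODES = "01360240043788015936020505";

    static String encode(String input) {
        if (input == null) {
            return null;
        }
        String upper = input.toUpperCase(Locale.ROOT);
        StringBuilder out = new StringBuilder();
        char last = '\0'; // code of the previous letter; '\0' before the first one
        for (int i = 0; i < upper.length(); i++) {
            char c = upper.charAt(i);
            if (c < 'A' || c > 'Z') {
                continue; // assumption: anything outside A..Z is skipped
            }
            if (out.length() == 0) {
                out.append(c); // the first letter is stored as-is
            }
            char code = CODES.charAt(c - 'A');
            if (code != last) {
                out.append(code); // consecutive letters of one group yield one digit
            }
            last = code;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(encode("Carren"));
    }
}
```

Running main prints C30908: C is stored, then C, A, R, R, E, N map to 3, 0, 9, 9, 0, 8,
and the doubled R collapses into a single 9.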

Example: 
> SELECT soundex_ref('Carren') FROM src LIMIT 1;
> C30908
                
> Add support for phonetic algorithms in Hive
> -------------------------------------------
>
>                 Key: HIVE-4053
>                 URL: https://issues.apache.org/jira/browse/HIVE-4053
>             Project: Hive
>          Issue Type: New Feature
>          Components: UDF
>            Reporter: Krishna
>         Attachments: FunctionRegistry.java, GenericUDFRefinedSoundex.java
>
>
> The following phonetic algorithms, which are very useful in search, should be considered:
> Soundex
> Refined Soundex
> Daitch–Mokotoff Soundex
> Metaphone and Double Metaphone
> New York State Identification and Intelligence System (NYSIIS)
> Caverphone

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
