hive-dev mailing list archives

From "Edward Capriolo (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-1545) Add a bunch of UDFs and UDAFs
Date Fri, 09 Aug 2013 01:16:51 GMT

    [ https://issues.apache.org/jira/browse/HIVE-1545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13734306#comment-13734306 ]

Edward Capriolo commented on HIVE-1545:
---------------------------------------

The annotations and other things you are seeing are part of an internal testing framework
at FB that was never open sourced. The Hive plugin developer kit had similar annotations, but
they were removed. So the UDFs will likely compile fine, but the test cases will not.
                
> Add a bunch of UDFs and UDAFs
> -----------------------------
>
>                 Key: HIVE-1545
>                 URL: https://issues.apache.org/jira/browse/HIVE-1545
>             Project: Hive
>          Issue Type: New Feature
>          Components: UDF
>            Reporter: Jonathan Chang
>            Assignee: Jonathan Chang
>            Priority: Minor
>         Attachments: core.tar.gz, ext.tar.gz, UDFEndsWith.java, UDFFindInString.java, UDFLtrim.java, UDFRtrim.java, udfs.tar.gz, udfs.tar.gz, UDFStartsWith.java, UDFTrim.java
>
>
> Here are some UD(A)Fs which can be incorporated into the Hive distribution:
> UDFArgMax - Find the 0-indexed index of the largest argument. e.g., ARGMAX(4, 5, 3) returns 1.
> UDFBucket - Find the bucket in which the first argument belongs. e.g., BUCKET(x, b_1, b_2, b_3, ...) will return the smallest i such that x > b_{i} but x <= b_{i+1}. Returns 0 if x is smaller than all the buckets.
> UDFFindInArray - Finds the 1-based index of the first occurrence of the first argument in the array given as the second argument. Returns 0 if not found. Returns NULL if either argument is NULL. E.g., FIND_IN_ARRAY(5, array(1,2,5)) will return 3. FIND_IN_ARRAY(5, array(1,2,3)) will return 0.
> UDFGreatCircleDist - Finds the great circle distance (in km) between two lat/long coordinates (in degrees).
> UDFLDA - Performs LDA inference on a vector given fixed topics.
> UDFNumberRows - Numbers successive rows starting from 1. Counter resets to 1 whenever any of its parameters changes.
> UDFPmax - Finds the maximum of a set of columns. e.g., PMAX(4, 5, 3) returns 5.
> UDFRegexpExtractAll - Like REGEXP_EXTRACT except that it returns all matches in an array.
> UDFUnescape - Returns the string unescaped (using C/Java style unescaping).
> UDFWhich - Given a boolean array, return the indices which are TRUE.
> UDFJaccard
> UDAFCollect - Takes all the values associated with a row and converts them into a list. Make sure to have: set hive.map.aggr = false;
> UDAFCollectMap - Like collect except that it takes tuples and generates a map.
> UDAFEntropy - Compute the entropy of a column.
> UDAFPearson (BROKEN!!!) - Computes the Pearson correlation between two columns.
> UDAFTop - TOP(KEY, VAL) - returns the KEY associated with the largest value of VAL.
> UDAFTopN (BROKEN!!!) - Like TOP except returns a list of the keys associated with the N (passed as the third parameter) largest values of VAL.
> UDAFHistogram
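
For readers without the attachments, the semantics described above for ARGMAX, PMAX, BUCKET and
FIND_IN_ARRAY can be sketched in plain Java. This is only an illustration of the descriptions in
the issue, not the attached code: the class name UdfSketches and the method names are hypothetical,
the methods deliberately avoid the Hive UDF API, and several details are assumptions (ties in
ARGMAX go to the first argument, BUCKET boundaries are given in ascending order, and a value above
the last boundary falls in the last bucket).

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration class; not part of Hive or of the HIVE-1545 attachments.
public class UdfSketches {

    // ARGMAX: 0-based index of the largest argument.
    // Assumption: on ties, the first occurrence wins.
    public static int argMax(double... values) {
        if (values.length == 0) {
            throw new IllegalArgumentException("ARGMAX needs at least one argument");
        }
        int best = 0;
        for (int i = 1; i < values.length; i++) {
            if (values[i] > values[best]) {
                best = i;
            }
        }
        return best;
    }

    // PMAX: the largest of the arguments.
    public static double pMax(double... values) {
        return values[argMax(values)];
    }

    // BUCKET: the i such that b_i < x <= b_{i+1}, with boundaries numbered from 1.
    // Returns 0 when x is not greater than the first boundary.
    // Assumptions: boundaries are ascending; x above the last boundary returns
    // the index of that last boundary.
    public static int bucket(double x, double... boundaries) {
        int i = 0;
        while (i < boundaries.length && x > boundaries[i]) {
            i++;
        }
        return i;
    }

    // FIND_IN_ARRAY: 1-based index of the first occurrence of needle in haystack,
    // 0 if absent, null if either argument is null.
    public static Integer findInArray(Object needle, List<?> haystack) {
        if (needle == null || haystack == null) {
            return null;
        }
        // List.indexOf returns -1 when the element is absent, so adding 1 gives 0.
        return haystack.indexOf(needle) + 1;
    }

    public static void main(String[] args) {
        System.out.println(argMax(4, 5, 3));
        System.out.println(pMax(4, 5, 3));
        System.out.println(bucket(5, 1, 4, 10));
        System.out.println(findInArray(5, Arrays.asList(1, 2, 5)));
        System.out.println(findInArray(5, Arrays.asList(1, 2, 3)));
    }
}
```

A real Hive UDF would wrap logic like this in a class following Hive's UDF conventions; that
wiring is left out here to keep the sketch free of Hive dependencies.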

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
