hadoop-hive-dev mailing list archives

From "Jonathan Chang (JIRA)" <j...@apache.org>
Subject [jira] Created: (HIVE-1545) Add a bunch of UDFs and UDAFs
Date Mon, 16 Aug 2010 22:39:18 GMT
Add a bunch of UDFs and UDAFs

                 Key: HIVE-1545
                 URL: https://issues.apache.org/jira/browse/HIVE-1545
             Project: Hadoop Hive
          Issue Type: New Feature
            Reporter: Jonathan Chang
            Assignee: Jonathan Chang
            Priority: Minor

Here are some UD(A)Fs that could be incorporated into the Hive distribution:

UDFArgMax - Finds the 0-based index of the largest argument. E.g., ARGMAX(4, 5, 3) returns 1.
UDFBucket - Finds the bucket to which the first argument belongs. E.g., BUCKET(x, b_1, b_2,
b_3, ...) returns the smallest i such that b_{i} < x <= b_{i+1}. Returns 0
if x is smaller than all the bucket boundaries.
UDFFindInArray - Finds the 1-based index of the first occurrence of the first argument in the
array given as the second argument. Returns 0 if not found, and NULL if either argument is NULL.
E.g., FIND_IN_ARRAY(5, array(1,2,5)) returns 3; FIND_IN_ARRAY(5, array(1,2,3)) returns 0.
UDFGreatCircleDist - Finds the great circle distance (in km) between two lat/long coordinates
(in degrees).
UDFLDA - Performs LDA inference on a vector given fixed topics.
UDFNumberRows - Numbers successive rows starting from 1. The counter resets to 1 whenever any
of its parameters changes.
UDFPmax - Finds the maximum of a set of columns. e.g., PMAX(4, 5, 3) returns 5.
UDFRegexpExtractAll - Like REGEXP_EXTRACT except that it returns all matches in an array.
UDFUnescape - Returns the string unescaped (using C/Java style unescaping).
UDFWhich - Given a boolean array, returns the indices of the elements that are TRUE.
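To pin down the semantics described above, here is a minimal plain-Java sketch of ARGMAX, PMAX, BUCKET, FIND_IN_ARRAY and WHICH. It is not the code attached to HIVE-1545 and does not use Hive's UDF classes; the class and method names are hypothetical. Assumptions not stated in the issue: ties in ARGMAX go to the first argument, BUCKET boundaries are sorted ascending (and x above every boundary yields the number of boundaries), and WHICH returns 0-based indices.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java sketch of the semantics described for a few scalar UDFs.
// Not the HIVE-1545 code; class and method names are hypothetical.
final class ScalarUdfSketch {
    private ScalarUdfSketch() {}

    // ARGMAX: 0-based index of the largest argument (first one wins on ties).
    // Expects at least one argument.
    static int argMax(double... xs) {
        int best = 0;
        for (int i = 1; i < xs.length; i++) {
            if (xs[i] > xs[best]) best = i;
        }
        return best;
    }

    // PMAX: the largest of the arguments.
    static double pmax(double... xs) {
        return xs[argMax(xs)];
    }

    // BUCKET(x, b_1, b_2, ...): smallest i with b_i < x <= b_{i+1}. For sorted
    // boundaries this is the count of boundaries strictly below x, so it is 0
    // when x <= b_1. If x exceeds every boundary (case not covered by the
    // description) this returns the number of boundaries.
    static int bucket(double x, double... bounds) {
        int i = 0;
        while (i < bounds.length && x > bounds[i]) i++;
        return i;
    }

    // FIND_IN_ARRAY(x, arr): 1-based position of the first element equal to x,
    // 0 if absent, null (SQL NULL) if either argument is null.
    static Integer findInArray(Object x, List<?> arr) {
        if (x == null || arr == null) return null;
        return arr.indexOf(x) + 1; // indexOf gives -1 when absent, hence 0
    }

    // WHICH: indices of the TRUE elements. The issue does not say whether they
    // are 0- or 1-based; this sketch assumes 0-based, like ARGMAX.
    static List<Integer> which(List<Boolean> flags) {
        List<Integer> out = new ArrayList<>();
        for (int i = 0; i < flags.size(); i++) {
            if (Boolean.TRUE.equals(flags.get(i))) out.add(i);
        }
        return out;
    }
}
```

With these definitions argMax(4, 5, 3) is 1 and pmax(4, 5, 3) is 5.0, matching the ARGMAX and PMAX examples in the list.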

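Three more of the scalar functions above can be sketched the same way. Again this is plain Java, not the HIVE-1545 code, and all names are hypothetical. The great-circle distance uses the haversine formula with an assumed mean Earth radius of 6371 km (the issue does not say which radius or formula is used). REGEXP_EXTRACT_ALL is assumed to take a group index like REGEXP_EXTRACT. The row numberer keeps state between calls, so its output only makes sense when rows arrive in a deliberate order, e.g. sorted by its arguments.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Plain-Java sketch of three more proposed scalar UDFs.
// Not the HIVE-1545 code; class and method names are hypothetical.
final class MoreUdfSketch {
    private MoreUdfSketch() {}

    // Assumed mean Earth radius; the issue does not state the value used.
    static final double EARTH_RADIUS_KM = 6371.0;

    // Great-circle distance in km between two (lat, long) points in degrees,
    // computed with the haversine formula.
    static double greatCircleDistKm(double lat1, double lon1, double lat2, double lon2) {
        double phi1 = Math.toRadians(lat1);
        double phi2 = Math.toRadians(lat2);
        double dPhi = phi2 - phi1;
        double dLambda = Math.toRadians(lon2 - lon1);
        double a = Math.pow(Math.sin(dPhi / 2), 2)
                + Math.cos(phi1) * Math.cos(phi2) * Math.pow(Math.sin(dLambda / 2), 2);
        // min() guards against sqrt(a) creeping just above 1 through rounding.
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.min(1.0, Math.sqrt(a)));
    }

    // REGEXP_EXTRACT_ALL: capture group `group` of every match, in order
    // (group 0 is the whole match).
    static List<String> regexpExtractAll(String subject, String regex, int group) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(subject);
        while (m.find()) out.add(m.group(group));
        return out;
    }

    // NUMBER_ROWS: counts rows in arrival order and restarts at 1 whenever the
    // arguments differ from those of the previous row.
    static final class RowNumberer {
        private Object[] previous;
        private long counter;

        long next(Object... args) {
            if (previous != null && Arrays.equals(previous, args)) {
                counter++;
            } else {
                counter = 1;
                previous = args.clone();
            }
            return counter;
        }
    }
}
```

For rows whose argument is "a", "a", "b", "a", the numberer yields 1, 2, 1, 1: the counter restarts on every change, not only on the first appearance of a value.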
UDAFCollect - Takes all the values associated with a key and converts them into a list. Make
sure to set hive.map.aggr = false; before using it.
UDAFCollectMap - Like COLLECT, except that it takes (key, value) tuples and generates a map.
UDAFEntropy - Compute the entropy of a column.
UDAFPearson (BROKEN!!!) - Computes the Pearson correlation between two columns.
UDAFTop - TOP(KEY, VAL) - returns the KEY associated with the largest value of VAL.
UDAFTopN (BROKEN!!!) - Like TOP, except that it returns a list of the keys associated with the
N largest values of VAL (N is passed as the third parameter).
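What the aggregate functions above compute for a single group can be sketched in plain Java as follows. This is not the HIVE-1545 code and not Hive's UDAF API; every name is hypothetical. Assumptions not stated in the issue: COLLECT_MAP keeps the last value for a duplicate key, ENTROPY is the Shannon entropy of the empirical value distribution in nats (natural log), and TOP/TOPN break ties arbitrarily (TOP keeps the first row seen). For TOPN, which the issue marks as broken, a size-N min-heap keeps memory bounded regardless of group size.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Plain-Java sketch of what some proposed UDAFs compute over ONE group's rows.
// Not the HIVE-1545 code and not Hive's UDAF API; names are hypothetical.
final class UdafSketch {
    private UdafSketch() {}

    // Convenience for building (key, value) tuples.
    static <K, V> Map.Entry<K, V> pair(K key, V value) {
        return new AbstractMap.SimpleEntry<>(key, value);
    }

    // COLLECT: all of the group's values as a list, in arrival order.
    static <V> List<V> collect(Iterable<? extends V> values) {
        List<V> out = new ArrayList<>();
        for (V v : values) out.add(v);
        return out;
    }

    // COLLECT_MAP: the group's (key, value) tuples as a map; last value wins.
    static <K, V> Map<K, V> collectMap(Iterable<? extends Map.Entry<K, V>> tuples) {
        Map<K, V> out = new LinkedHashMap<>();
        for (Map.Entry<K, V> t : tuples) out.put(t.getKey(), t.getValue());
        return out;
    }

    // ENTROPY: -sum(p * ln p) over the distinct values' relative frequencies.
    static double entropy(Iterable<?> values) {
        Map<Object, Integer> counts = new HashMap<>();
        int n = 0;
        for (Object v : values) {
            counts.merge(v, 1, Integer::sum);
            n++;
        }
        double h = 0.0;
        for (int c : counts.values()) {
            double p = (double) c / n;
            h -= p * Math.log(p);
        }
        return h;
    }

    // TOP(KEY, VAL): the key paired with the largest VAL; null for no rows.
    static <K> K top(Iterable<? extends Map.Entry<K, Double>> rows) {
        K bestKey = null;
        double bestVal = Double.NEGATIVE_INFINITY;
        boolean seen = false;
        for (Map.Entry<K, Double> r : rows) {
            if (!seen || r.getValue() > bestVal) {
                bestKey = r.getKey();
                bestVal = r.getValue();
                seen = true;
            }
        }
        return bestKey;
    }

    // TOPN(KEY, VAL, N): keys of the N largest VALs, largest first.
    static <K> List<K> topN(Iterable<? extends Map.Entry<K, Double>> rows, int n) {
        Comparator<Map.Entry<K, Double>> byVal = Comparator.comparingDouble(e -> e.getValue());
        PriorityQueue<Map.Entry<K, Double>> heap = new PriorityQueue<>(byVal);
        for (Map.Entry<K, Double> r : rows) {
            heap.offer(r);
            if (heap.size() > n) heap.poll(); // evict the current smallest
        }
        List<Map.Entry<K, Double>> kept = new ArrayList<>(heap);
        kept.sort(byVal.reversed());
        List<K> out = new ArrayList<>();
        for (Map.Entry<K, Double> e : kept) out.add(e.getKey());
        return out;
    }
}
```

For rows (a, 1), (b, 5), (c, 3), (d, 4), TOP gives b and TOPN with N = 2 gives [b, d].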

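Since UDAFPearson is marked as broken, a sketch of what it is meant to compute may help whoever fixes it. The version below, an illustration rather than the issue's code (class name hypothetical), keeps a small mergeable state: running counts and sums that can be built independently for separate chunks of rows and then combined with merge(), which is the property an aggregate needs when partial results are produced in parallel. It returns r = (n*Sxy - Sx*Sy) / sqrt((n*Sxx - Sx^2) * (n*Syy - Sy^2)). Plain sums are used for clarity; they can lose precision on large or badly scaled inputs, where a Welford-style update is more robust.

```java
// Mergeable partial state for Pearson's correlation coefficient r.
// Hypothetical sketch, not the HIVE-1545 UDAFPearson implementation.
final class PearsonState {
    private long n;
    private double sx, sy, sxx, syy, sxy;

    // Fold one (x, y) row into the running sums.
    void add(double x, double y) {
        n++;
        sx += x;
        sy += y;
        sxx += x * x;
        syy += y * y;
        sxy += x * y;
    }

    // Combine with a partial state computed over a different set of rows.
    void merge(PearsonState other) {
        n += other.n;
        sx += other.sx;
        sy += other.sy;
        sxx += other.sxx;
        syy += other.syy;
        sxy += other.sxy;
    }

    // Pearson's r, or NaN when it is undefined (no rows or a constant column).
    double result() {
        double num = n * sxy - sx * sy;
        double den = Math.sqrt((n * sxx - sx * sx) * (n * syy - sy * sy));
        return den == 0.0 ? Double.NaN : num / den;
    }
}
```

Splitting the rows (1, 2), (2, 4), (3, 6) across two states and merging them gives r = 1, the same as feeding all three rows into one state.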
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
