hive-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Kostiantyn Kudriavtsev (JIRA)" <>
Subject [jira] [Commented] (HIVE-2361) Add some UDFs which help to migrate Oracle to Hive
Date Thu, 19 Dec 2013 18:12:07 GMT


Kostiantyn Kudriavtsev commented on HIVE-2361:

It seems useful, specially Scalar functions... does this code available somewhere?

>  Add some UDFs which help to migrate Oracle to Hive
> ---------------------------------------------------
>                 Key: HIVE-2361
>                 URL:
>             Project: Hive
>          Issue Type: New Feature
>          Components: UDF
>    Affects Versions: 0.8.0
>            Reporter: JunHo Cho
>            Assignee: JunHo Cho
>            Priority: Minor
>              Labels: features
>         Attachments: nexr-udf.tar
> Here some UDFs which can be matched to oracle functions:
> There are two kinds of oracle functions. one is scalar function and another is analytic
> Most scalar functions in Oracle can be converted to hive's udf directly.  
> Oracle Scalar Function
> GenericUDFDecode : Compares first argument to each other value one by one. e.g., DECODE(x,0,'zero',1,'one')
will return 'zero' if x is 0
> GenericUDFGreatest : Return the greatest of the list of one or more expressions. e.g.,
GREATEST(2,5,12,3) will return 12
> GenericUDFInstr : Return the location of a substring in a string. e.g., INSTR('next',
'e') will return 2
> GenericUDFLnnvl : Evaluate a condition when one or both operands of the condition may
be null. e.g., LNNVL(2 > 4) will return true
> GenericUDFNVL : Replace null with a string in the results of a query. e.g., NVL(null,'hive')
will return hive
> GenericUDFNVL2 : Determine the value returned by a query based on whether a specified
expression is null or not null. e.g., NVL2(null,'not null','null value') will return 'null
> GenericUDFToNumber : Convert a string to a number. e.g., TO_NUMBER('112','999') will
return 112
> GenericUDFTrunc : Returns a date truncated to a specific unit of measure. e.g., TRUNC('2002-11-02
01:01:01','YYYY') will return '2002-01-01 00:00:00'
> Oracle Analytic Function
> Most analytic functions in Oracle can't be converted to hive's query and udf directly.
> Following udfs should be used with DISTRIBUTED, SORT BY and HASH of hive to support analytic
> e.q., SELECT _FUNC_(hash(col1), col2, ...) FROM SELECT ~ FROM table DISTRIBUTED BY hash(col1)
SORT BY col1, col2 ...
> GenericUDFSum : Calculate a cumulative sum.
> GenericUDFRank : Assign a sequential order, or rank within some group based on key.
> GenericUDFDenseRank : Act like RANK function except that it assigns consecutive ranks.
> GenericUDFRowNumber : Return sequence integer value within some group based on key.
> GenericUDFMax : Determine the highest value within some group based on key.
> GenericUDFMin : Determine the lowest value within some group based on key.
> GenericUDFLag : Access data from a previous row.
> This udfs was developed with hive-pdk

This message was sent by Atlassian JIRA

View raw message