hive-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Igor Tatarinov <i...@decide.com>
Subject Re: HiveQL for 'rank() over (partition by ... order by ...)'?
Date Wed, 25 May 2011 14:43:01 GMT
Yes, we have UDF functions that compute cumulative (such a rank) and moving
aggregates.
In each case, the first parameter is the partitioning key so that the
function knows when to 'reset' at the start of a new partition.
To enforce partitioning and ordering, use DISTRIBUTE BY key  / SORT BY key,
sort_attr

One caveat: you have to do DISTRIBUTE/SORT BY in a subquery. The following
doesn't work correctly:

SELECT product_id, date, mavg(product_id, price, 10)
FROM Prices
DISTRIBUTE BY product_id
SORT BY product_id, date

You have to use a subquery:
SELECT ...
FROM (SELECT *
           FROM Prices
           DISTRIBUTE BY ..
           SORT BY ...)


If you have multiple key attributes you have to combine them into a single
key using concat_ws() for example.

As far as the UDF implementation goes, you have to define your function for
every combination of key and base attribute types that you are going to use.
 I don't know any way around that.


Hope this helps


On Wed, May 25, 2011 at 4:45 AM, 김영우 <warwithin@gmail.com> wrote:

> Hi All,
>
> I'm trying to convert Oracle SQL to HiveQL, esp Oracle Analytic functions.
> Is it possible to implement using Hive UDFs or workarounds?
> I would like to hear your experiences and advice.
>
> Thanks in advance.
>
> - Youngwoo
>

Mime
View raw message