hive-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Ajo Fod <ajo....@gmail.com>
Subject Re: Percent Rank Calculation
Date Tue, 01 Mar 2011 21:17:06 GMT
I know of this type of a call would give you a subset of the table ... also
I think you can use a group by clause to get it for groups of data.

> SELECT PERCENTILE(val, 0.5) FROM pct_test WHERE val > 100;

Couldn't you use this call a few times to get the value for each percentile
value?

I think this is also a valid syntax though I've not used it.

> SELECT PERCENTILE(val, array(0.25, 0.5, 0.75)) FROM pct_test WHERE val >
100;

Note that neither of these are available with older versions of hive.

Cheers,
-Ajo.

On Tue, Mar 1, 2011 at 12:51 PM, Sameer Kalburgi
<sameerkalburgi@gmail.com>wrote:

> Hello,
>
> Does anyone have any experience calculating the percentile / percentrank
> for each row in a table?
>
> I see that there are built in UDAFs to calculate the percentile, but that
> would only return a single value for the entire table.
>
> Essentially, I'm trying to recreate the Excel PercentRank function
> described here:
> http://msdn.microsoft.com/en-us/library/bb239448(v=office.12).aspx
>
> For example, the input:
>
> Score
> --------
> 10
> 20
> 30
> 40
> 50
>
> would yield
>
> PercentRank
> ------------------
> 0.0
> 0.25
> 0.5
> 0.75
> 1.0
>
> Any ideas?
>
> Thanks,
> Sameer
>

Mime
View raw message