I know of this type of a call would give you a subset of the table ... also
I think you can use a group by clause to get it for groups of data.
> SELECT PERCENTILE(val, 0.5) FROM pct_test WHERE val > 100;
Couldn't you use this call a few times to get the value for each percentile
value?
I think this is also a valid syntax though I've not used it.
> SELECT PERCENTILE(val, array(0.25, 0.5, 0.75)) FROM pct_test WHERE val >
100;
Note that neither of these are available with older versions of hive.
Cheers,
Ajo.
On Tue, Mar 1, 2011 at 12:51 PM, Sameer Kalburgi
<sameerkalburgi@gmail.com>wrote:
> Hello,
>
> Does anyone have any experience calculating the percentile / percentrank
> for each row in a table?
>
> I see that there are built in UDAFs to calculate the percentile, but that
> would only return a single value for the entire table.
>
> Essentially, I'm trying to recreate the Excel PercentRank function
> described here:
> http://msdn.microsoft.com/enus/library/bb239448(v=office.12).aspx
>
> For example, the input:
>
> Score
> 
> 10
> 20
> 30
> 40
> 50
>
> would yield
>
> PercentRank
> 
> 0.0
> 0.25
> 0.5
> 0.75
> 1.0
>
> Any ideas?
>
> Thanks,
> Sameer
>
