hive-user mailing list archives

From Seema Datar <sda...@yahoo-inc.com>
Subject Re: computing median and percentiles
Date Thu, 20 Mar 2014 04:37:42 GMT
Not really. If it were a single column with no count column, Hive's percentile function could
be applied directly. That is, if the data looked like this:

100
100
200
200
200
200
300

But if we have two columns, one that holds the value and the other that holds the count,
how can Hive be used to derive the percentile?

Value     Count
100          2
200          4
300          1

Thanks,
Seema

From: Stephen Sprague <spragues@gmail.com>
Reply-To: "user@hive.apache.org" <user@hive.apache.org>
Date: Thursday, March 20, 2014 5:28 AM
To: "user@hive.apache.org" <user@hive.apache.org>
Subject: Re: computing median and percentiles

Not a Hive question, is it? It's more like a math question.



On Wed, Mar 19, 2014 at 1:30 PM, Seema Datar <sdatar@yahoo-inc.com> wrote:


I understand the percentile function is supported in recent versions of Hive. However,
how does one calculate percentiles when the data is spread across two columns? For example:

Value     Count
100          2   (i.e., 100 occurred twice)
200          4
300          1
400          6
500          3


I want to find the 25th percentile (p = 0.25) of the value distribution. How can I do it
using the Hive percentile function?
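
For reference, the arithmetic behind the question can be sketched outside Hive. The Python below is only an illustration, not Hive's implementation: the function name weighted_percentile is hypothetical, and it assumes the common linear-interpolation convention in which the target position is p * (N - 1) over the sorted, expanded values, with N the sum of the counts. It walks the cumulative counts instead of physically repeating each value.

```python
import math


def weighted_percentile(pairs, p):
    """Percentile of a distribution given as (value, count) pairs.

    Assumption: position = p * (N - 1) with linear interpolation between
    the two neighbouring ranks, where N is the total count.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")

    # Sort by value and ignore rows that contribute no observations.
    rows = sorted((value, count) for value, count in pairs if count > 0)
    total = sum(count for _, count in rows)
    if total == 0:
        raise ValueError("no observations")

    position = p * (total - 1)
    lower, upper = math.floor(position), math.ceil(position)

    def value_at(index):
        # Find the value holding the 0-based rank `index` in the
        # expanded data by walking the running (cumulative) count.
        running = 0
        for value, count in rows:
            running += count
            if index < running:
                return value
        raise IndexError(index)

    lo_val, hi_val = value_at(lower), value_at(upper)
    if lower == upper:
        return float(lo_val)
    # Interpolate between the two neighbouring ranks.
    return (upper - position) * lo_val + (position - lower) * hi_val


data = [(100, 2), (200, 4), (300, 1), (400, 6), (500, 3)]
print(weighted_percentile(data, 0.25))
```

With the five rows above there are 16 observations, so the position is 0.25 * 15 = 3.75; ranks 3 and 4 both fall on the value 200, and the sketch returns 200.0. The same cumulative-count idea could presumably be expressed in a Hive query, or each row could be expanded into Count copies of Value before calling percentile, but neither variant is shown or tested here.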



