hadoop-hive-dev mailing list archives

From "Todd Lipcon (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HIVE-259) Add PERCENTILE aggregate function
Date Tue, 24 Nov 2009 19:43:39 GMT

    [ https://issues.apache.org/jira/browse/HIVE-259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12782120#action_12782120 ]

Todd Lipcon commented on HIVE-259:
----------------------------------

An easy way to do this that would work for a ton of data sets would be to essentially do a
counting sort. If you have only a few thousand distinct values in the column to be analyzed,
just make a hashtable, count up how many times you see each value, and then in the single
reducer use the resulting histogram to figure out the percentile. This should work great for
datasets like age, and even for sets like "number of days since user signed up". For sets that
are truly continuous, this would still be useful when combined with a binning UDF to
discretize the values.
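
To make the idea concrete, here is a minimal sketch in Python rather than actual Hive UDAF code. The function names (build_histogram, merge_histograms, percentiles_from_histogram, bin_value) are hypothetical, and the nearest-rank definition of a percentile is an assumption on my part, since the issue doesn't say which definition is wanted.

```python
import math
from collections import Counter


def build_histogram(values):
    """Map-side step: count how often each distinct value occurs."""
    return Counter(values)


def merge_histograms(histograms):
    """Reduce-side step: sum the partial counts coming from each mapper."""
    total = Counter()
    for hist in histograms:
        total.update(hist)
    return total


def percentiles_from_histogram(hist, ps):
    """Walk the distinct values in sorted order, accumulating counts.

    Uses the nearest-rank definition (an assumption): the p-th percentile
    is the smallest value whose cumulative count reaches ceil(p/100 * n).
    """
    n = sum(hist.values())
    if n == 0:
        raise ValueError("empty histogram")
    targets = sorted(ps)
    result = {}
    cumulative = 0
    i = 0
    for value in sorted(hist):
        cumulative += hist[value]
        # One pass can answer several percentiles at once.
        while i < len(targets) and cumulative >= max(1, math.ceil(targets[i] / 100 * n)):
            result[targets[i]] = value
            i += 1
    return result


def bin_value(x, width):
    """Hypothetical binning helper: map a continuous value to its bin's lower edge."""
    return math.floor(x / width) * width


# Two "mappers" each see part of the column; one "reducer" merges the counts.
partials = [build_histogram([1, 2, 3, 4]), build_histogram([5, 6, 7, 8])]
merged = merge_histograms(partials)
quartiles = percentiles_from_histogram(merged, [25, 50, 75])
```

Memory on the reducer scales with the number of distinct values rather than the number of rows, which is why this only holds up for low-cardinality columns (or continuous ones after something like bin_value has been applied).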

Sadly it doesn't handle the general case, but it would be an easy first step.

> Add PERCENTILE aggregate function
> ---------------------------------
>
>                 Key: HIVE-259
>                 URL: https://issues.apache.org/jira/browse/HIVE-259
>             Project: Hadoop Hive
>          Issue Type: New Feature
>          Components: Query Processor
>            Reporter: Venky Iyer
>
> Compute at least the 25th, 50th, and 75th percentiles

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

