hadoop-hive-dev mailing list archives

From "Mayank Lahiri (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HIVE-1387) Make PERCENTILE work with double data type
Date Tue, 08 Jun 2010 21:14:12 GMT

    [ https://issues.apache.org/jira/browse/HIVE-1387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12876809#action_12876809 ]

Mayank Lahiri commented on HIVE-1387:

The current implementation of percentile() seems to buffer the entire input stream as an exact
histogram. This is likely to choke on massive datasets, even more so with doubles than with
longs, because aggregating repeated values into counts is likely to save even less space for
doubles than it does for longs.
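
To illustrate the concern, here is a minimal Python sketch of an exact value-to-count histogram. It is not the Hive UDAF code: the function name exact_percentile and the nearest-rank rule are assumptions made only for this example. The point is that the number of buckets equals the number of distinct input values, so memory grows with the data whenever values rarely repeat.

```python
import math
from collections import Counter

def exact_percentile(values, p):
    """Nearest-rank percentile (0 < p <= 1) from an exact value -> count map.

    Returns (percentile_value, number_of_buckets) so the memory cost is visible.
    Hypothetical helper for illustration, not Hive's implementation.
    """
    counts = Counter(values)              # one bucket per distinct value
    total = sum(counts.values())
    rank = max(1, math.ceil(p * total))   # 1-based rank of the wanted element
    seen = 0
    for value in sorted(counts):
        seen += counts[value]
        if seen >= rank:
            return value, len(counts)

# Longs with many repeats collapse into a handful of buckets ...
longs = [i % 10 for i in range(1000)]
# ... while doubles that never repeat need one bucket per row.
doubles = [i / 7.0 for i in range(1000)]

print(exact_percentile(longs, 0.5))
print(exact_percentile(doubles, 0.5))
```

In this toy input the longs need 10 buckets while the doubles need 1000, one per row.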

I'm currently working on a constant-space histogram UDAF. Estimating percentiles from such
a histogram might be a better option on larger datasets. Alternatively, we might consider
splitting this functionality into percentile_exact() and percentile_approx() UDAFs, using
this version and the constant-space histogram approximation respectively.
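
As a rough sketch of what a constant-space approximation could look like, the Python below keeps at most max_bins (centroid, count) pairs and merges the two closest neighbouring bins whenever the limit is exceeded. This is one simple bounded-bin scheme chosen for illustration, not necessarily the algorithm of the histogram UDAF mentioned above; the class name BoundedHistogram and its methods are hypothetical, and the percentile estimate is deliberately crude (it returns a bin centroid without interpolation).

```python
import bisect

class BoundedHistogram:
    """Streaming histogram that never holds more than max_bins bins.

    Hypothetical illustration of a constant-space summary; max_bins must be >= 1.
    """

    def __init__(self, max_bins):
        self.max_bins = max_bins
        self.bins = []  # sorted list of [centroid, count]

    def add(self, x):
        bisect.insort(self.bins, [float(x), 1])
        if len(self.bins) > self.max_bins:
            # Find the adjacent pair of bins with the smallest gap ...
            i = min(range(len(self.bins) - 1),
                    key=lambda k: self.bins[k + 1][0] - self.bins[k][0])
            (c1, n1), (c2, n2) = self.bins[i], self.bins[i + 1]
            # ... and replace it with one bin at the count-weighted mean.
            self.bins[i:i + 2] = [[(c1 * n1 + c2 * n2) / (n1 + n2), n1 + n2]]

    def percentile(self, p):
        """Approximate percentile: centroid of the bin where the rank falls."""
        total = sum(n for _, n in self.bins)
        target = p * total
        seen = 0
        for centroid, n in self.bins:
            seen += n
            if seen >= target:
                return centroid
        return self.bins[-1][0]

hist = BoundedHistogram(max_bins=20)
for i in range(1000):
    hist.add(i / 7.0)          # 1000 distinct doubles, still only 20 bins

print(len(hist.bins), hist.percentile(0.5))
```

Memory stays fixed at max_bins entries regardless of input size, at the cost of an estimate whose error depends on how wide the bin around the requested rank has become.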

> Make PERCENTILE work with double data type
> ------------------------------------------
>                 Key: HIVE-1387
>                 URL: https://issues.apache.org/jira/browse/HIVE-1387
>             Project: Hadoop Hive
>          Issue Type: Improvement
>            Reporter: Vaibhav Aggarwal
>         Attachments: patch-1387-1.patch
> The PERCENTILE UDAF does not work with double datatype.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
