hive-dev mailing list archives

From "Mayank Lahiri (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HIVE-1387) Make PERCENTILE work with double data type
Date Thu, 17 Jun 2010 19:03:43 GMT

    [ https://issues.apache.org/jira/browse/HIVE-1387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12879894#action_12879894 ]

Mayank Lahiri commented on HIVE-1387:
-------------------------------------

This is what I suggest we do to resolve this issue:

1. Create a new percentile_approx() function that extends GenericUDAFHistogramNumeric to
approximate a fine-grained histogram with many bins (say 10,000, though I'll run some
experiments to settle on a number), and then use that histogram to estimate the percentile value.

2. Convert the existing simple percentile() UDAF to a generic UDAF. When the input is byte,
short, int, or long, use the existing code (with some modifications, such as replacing
the linear scan with a binary search). When the input is float or double, automatically
use the percentile_approx() function.
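
To make the two steps concrete, here is a minimal standalone sketch in plain Java. It is not taken from the patch and does not use any Hive classes. The class name PercentileSketch and both method names are hypothetical, the histogram uses simple equal-width bins (an assumption, and not necessarily how GenericUDAFHistogramNumeric builds its bins), and the exact path assumes the integral values have already been aggregated into sorted distinct values with counts.

```java
import java.util.Arrays;

// Hypothetical illustration only; none of these names come from Hive.
class PercentileSketch {

    // Step 2 (integral input): exact percentile over sorted distinct values
    // and their counts, using binary search on cumulative counts instead of
    // a linear scan. p is in [0, 1].
    public static double exactPercentile(long[] sortedValues, long[] counts, double p) {
        int n = sortedValues.length;
        long[] cumulative = new long[n];
        long total = 0;
        for (int i = 0; i < n; i++) {
            total += counts[i];
            cumulative[i] = total;
        }
        // Fractional 0-based rank of the requested percentile.
        double pos = p * (total - 1);
        long lowRank = (long) Math.floor(pos);
        long highRank = (long) Math.ceil(pos);
        long lowValue = sortedValues[firstIndexAbove(cumulative, lowRank)];
        long highValue = sortedValues[firstIndexAbove(cumulative, highRank)];
        // Linear interpolation between the two neighbouring ranks.
        return lowValue + (pos - lowRank) * (highValue - lowValue);
    }

    // Binary search: smallest index i such that cumulative[i] > rank.
    static int firstIndexAbove(long[] cumulative, long rank) {
        int lo = 0;
        int hi = cumulative.length - 1;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (cumulative[mid] > rank) {
                hi = mid;
            } else {
                lo = mid + 1;
            }
        }
        return lo;
    }

    // Step 1 (float/double input): estimate a percentile from a histogram.
    // ASSUMPTION: equal-width bins over [min, max]; values are treated as
    // uniformly spread inside each bin.
    public static double approxPercentile(double[] data, int numBins, double p) {
        double min = Arrays.stream(data).min().getAsDouble();
        double max = Arrays.stream(data).max().getAsDouble();
        if (min == max) {
            return min;
        }
        double width = (max - min) / numBins;
        long[] bins = new long[numBins];
        for (double x : data) {
            int b = Math.min(numBins - 1, (int) ((x - min) / width));
            bins[b]++;
        }
        double target = p * data.length;
        long seen = 0;
        for (int b = 0; b < numBins; b++) {
            if (bins[b] > 0 && seen + bins[b] >= target) {
                // Interpolate inside the bin where the cumulative count crosses the target.
                double fraction = (target - seen) / bins[b];
                return min + (b + fraction) * width;
            }
            seen += bins[b];
        }
        return max;
    }

    public static void main(String[] args) {
        long[] values = {1, 2, 3, 4};
        long[] counts = {1, 1, 1, 1};
        System.out.println(exactPercentile(values, counts, 0.5));

        double[] data = new double[101];
        for (int i = 0; i <= 100; i++) {
            data[i] = i;
        }
        System.out.println(approxPercentile(data, 10, 0.5));
    }
}
```

With more bins the estimate in approxPercentile() gets closer to the true value at the cost of more memory per aggregation, which is the trade-off the bin-count experiments would need to settle.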

> Make PERCENTILE work with double data type
> ------------------------------------------
>
>                 Key: HIVE-1387
>                 URL: https://issues.apache.org/jira/browse/HIVE-1387
>             Project: Hadoop Hive
>          Issue Type: Improvement
>            Reporter: Vaibhav Aggarwal
>            Assignee: Mayank Lahiri
>         Attachments: patch-1387-1.patch
>
>
> The PERCENTILE UDAF does not work with double datatype.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

