hive-issues mailing list archives

From "Zoltan Haindrich (JIRA)" <>
Subject [jira] [Commented] (HIVE-16255) Support percentile_cont / percentile_disc
Date Tue, 26 Mar 2019 14:06:00 GMT


Zoltan Haindrich commented on HIVE-16255:


> Support percentile_cont / percentile_disc
> -----------------------------------------
>                 Key: HIVE-16255
>                 URL:
>             Project: Hive
>          Issue Type: Sub-task
>          Components: SQL
>            Reporter: Carter Shanklin
>            Assignee: Laszlo Bodor
>            Priority: Major
>         Attachments: HIVE-16255.01.patch, HIVE-16255.02.patch, HIVE-16255.03.patch, HIVE-16255.04.patch,
HIVE-16255.05.patch, HIVE-16255.06.patch
> Way back in HIVE-259, a percentile function was added that provides a subset of the standard
percentile_cont aggregate function.
> The SQL standard provides some additional options and also a percentile_disc aggregate
function with different rules. In the standard you specify an ordering with an arbitrary value
expression, and the results are drawn from this value expression. These aggregate functions
should be usable as analytic functions as well (i.e. support the over clause). The current
percentile function can already be used with an over clause.
> The rough outline of how this works is:
> percentile_cont(number) within group (order by expression) [ over(window spec) ]
> percentile_disc(number) within group (order by expression) [ over(window spec) ]
> The value of number should be between 0 and 1. The value expression is evaluated for
each row of the group, nulls are discarded, and the remaining rows are ordered. The result
is then determined as follows:
> - If PERCENTILE_CONT is specified, the result is found by considering the pair of
consecutive rows indicated by the argument (treated as a fraction of the total number of rows
in the group) and interpolating between the values of the value expression for those rows.
> - If PERCENTILE_DISC is specified, the result is found by treating the group as a window
partition of the CUME_DIST window function, using the specified ordering of the value
expression as the window ordering, and returning the first value of the value expression whose
cumulative distribution value is greater than or equal to the argument.
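
The two rules above can be sketched in plain Python to make the difference concrete. This is only an illustration of the semantics as described in the issue, not Hive's implementation; the function names and the list-based input are assumptions made for the example, and the interpolation position used for percentile_cont (fraction times row count minus one, zero-based) is the common reading of the standard rather than something stated in this message.

```python
import bisect
import math


def _ordered_non_null(values, fraction):
    """Validate the fraction, then discard nulls and order the remaining rows."""
    if not 0 <= fraction <= 1:
        raise ValueError("fraction must be between 0 and 1")
    return sorted(v for v in values if v is not None)


def percentile_cont(values, fraction):
    """Interpolate between the two consecutive rows the fraction points at."""
    rows = _ordered_non_null(values, fraction)
    if not rows:
        return None
    # Zero-based fractional row position (assumed interpolation convention).
    pos = fraction * (len(rows) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    # Linear interpolation; when pos is a whole number, lo == hi.
    return rows[lo] + (rows[hi] - rows[lo]) * (pos - lo)


def percentile_disc(values, fraction):
    """Return the first ordered value whose CUME_DIST is >= fraction."""
    rows = _ordered_non_null(values, fraction)
    if not rows:
        return None
    n = len(rows)
    for v in rows:
        # CUME_DIST: share of rows that sort at or before v (ties included).
        cume_dist = bisect.bisect_right(rows, v) / n
        if cume_dist >= fraction:
            return v
    return rows[-1]


if __name__ == "__main__":
    sample = [40, None, 10, 30, 20]
    print(percentile_cont(sample, 0.5))
    print(percentile_disc(sample, 0.5))
```

For the sample group, percentile_cont interpolates halfway between 20 and 30, while percentile_disc returns an actual value from the group (20), which is the practical distinction between the two functions.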

This message was sent by Atlassian JIRA
