hive-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Sahil Takiar (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-16513) width_bucket issues
Date Mon, 01 May 2017 23:31:04 GMT

    [ https://issues.apache.org/jira/browse/HIVE-16513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15991784#comment-15991784
] 

Sahil Takiar commented on HIVE-16513:
-------------------------------------

[~ashutoshc], addressed comments. Test failures are flaky.

> width_bucket issues
> -------------------
>
>                 Key: HIVE-16513
>                 URL: https://issues.apache.org/jira/browse/HIVE-16513
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Carter Shanklin
>            Assignee: Sahil Takiar
>         Attachments: HIVE-16513.1.patch, HIVE-16513.2.patch, HIVE-16513.3.patch
>
>
> width_bucket was recently added with HIVE-15982. This ticket notes a few issues.
> Usability issue:
> Currently only accepts integral numeric types. Decimals, floats and doubles are not supported.
> Runtime failures: This query will cause a runtime divide-by-zero in the reduce stage.
> select width_bucket(c1, 0, c1*2, 10) from e011_01 group by c1;
> The divide-by-zero seems to trigger any time I use a group-by. Here's another example
(that actually requires the group-by):
> select width_bucket(c1, 0, max(c1), 10) from e011_01 group by c1;
> Advanced Usage Issues:
> Suppose you have a table e011_01 as follows:
> create table e011_01 (c1 integer, c2 smallint);
> insert into e011_01 values (1, 1), (2, 2);
> Compile-time problems:
> You cannot use simple case expressions, searched case expressions or grouping sets. These
queries fail:
> select width_bucket(5, c2, case c1 when 1 then c1 * 2 else c1 * 3 end, 10) from e011_01;
> select width_bucket(5, c2, case when c1 < 2 then c1 * 2 else c1 * 3 end, 10) from
e011_01;
> select width_bucket(5, c2, max(c1)*10, cast(grouping(c1, c2)*20+1 as integer)) from e011_02
group by cube(c1, c2);
> I'll admit the grouping one is pretty contrived but the case ones seem straightforward,
valid, and it's strange that they don't work. Similar queries work with other UDFs like sum.
Why wouldn't they "just work"? Maybe [~ashutoshc] can lend some perspective on that?
> Interestingly, you can use window functions in width_bucket, example:
> select width_bucket(rank() over (order by c2), 0, 10, 10) from e011_01;
> works just fine. Hopefully we can get to a place where people implementing functions
like this don't need to think about value expression support but we don't seem to be there
yet.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

Mime
View raw message