hive-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Hive QA (JIRA)" <>
Subject [jira] [Commented] (HIVE-16513) width_bucket issues
Date Thu, 04 May 2017 04:07:04 GMT


Hive QA commented on HIVE-16513:

Here are the results of testing the latest attachment:

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 10649 tests executed
*Failed tests:*
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr] (batchId=143)

Test results:
Console output:
Test logs:

Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed

This message is automatically generated.

ATTACHMENT ID: 12866311 - PreCommit-HIVE-Build

> width_bucket issues
> -------------------
>                 Key: HIVE-16513
>                 URL:
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Carter Shanklin
>            Assignee: Sahil Takiar
>         Attachments: HIVE-16513.1.patch, HIVE-16513.2.patch, HIVE-16513.3.patch, HIVE-16513.4.patch
> width_bucket was recently added with HIVE-15982. This ticket notes a few issues.
> Usability issue:
> Currently only accepts integral numeric types. Decimals, floats and doubles are not supported.
> Runtime failures: This query will cause a runtime divide-by-zero in the reduce stage.
> select width_bucket(c1, 0, c1*2, 10) from e011_01 group by c1;
> The divide-by-zero seems to trigger any time I use a group-by. Here's another example
(that actually requires the group-by):
> select width_bucket(c1, 0, max(c1), 10) from e011_01 group by c1;
> Advanced Usage Issues:
> Suppose you have a table e011_01 as follows:
> create table e011_01 (c1 integer, c2 smallint);
> insert into e011_01 values (1, 1), (2, 2);
> Compile-time problems:
> You cannot use simple case expressions, searched case expressions or grouping sets. These
queries fail:
> select width_bucket(5, c2, case c1 when 1 then c1 * 2 else c1 * 3 end, 10) from e011_01;
> select width_bucket(5, c2, case when c1 < 2 then c1 * 2 else c1 * 3 end, 10) from
> select width_bucket(5, c2, max(c1)*10, cast(grouping(c1, c2)*20+1 as integer)) from e011_02
group by cube(c1, c2);
> I'll admit the grouping one is pretty contrived but the case ones seem straightforward,
valid, and it's strange that they don't work. Similar queries work with other UDFs like sum.
Why wouldn't they "just work"? Maybe [~ashutoshc] can lend some perspective on that?
> Interestingly, you can use window functions in width_bucket, example:
> select width_bucket(rank() over (order by c2), 0, 10, 10) from e011_01;
> works just fine. Hopefully we can get to a place where people implementing functions
like this don't need to think about value expression support but we don't seem to be there

This message was sent by Atlassian JIRA

View raw message