Mailing-List: contact issues-help@hive.apache.org; run by ezmlm
Precedence: bulk
Reply-To: dev@hive.apache.org
Date: Mon, 1 May 2017 18:28:04 +0000 (UTC)
From: "Ashutosh Chauhan (JIRA)" <jira@apache.org>
To: issues@hive.apache.org
Message-ID: <JIRA.13066211.1492989038000.93431.1493663284428@Atlassian.JIRA>
In-Reply-To: <JIRA.13066211.1492989038000@Atlassian.JIRA>
References: <JIRA.13066211.1492989038000@Atlassian.JIRA> <JIRA.13066211.1492989038413@jira-lw-us.apache.org>
Subject: [jira] [Commented] (HIVE-16513) width_bucket issues
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
archived-at: Mon, 01 May 2017 18:28:08 -0000


    [ https://issues.apache.org/jira/browse/HIVE-16513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15991253#comment-15991253 ] 

Ashutosh Chauhan commented on HIVE-16513:
-----------------------------------------

Determination of types should be done in initialize(), not in evaluate(). Doing it in evaluate() repeats the same work for every row, which hurts performance and is wasteful. UDFs are allowed to assume that the types determined in initialize() don't change on a per-row basis.
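
To illustrate the point about initialize() versus evaluate(), here is a minimal, self-contained sketch. It does not use Hive's actual classes: WidthBucketSketch, ArgType, BucketFn and the typeResolutions counter are hypothetical stand-ins, and the bucket formula assumes min < max. In a real GenericUDF, initialize() receives ObjectInspector[] and evaluate() receives DeferredObject[]; the simplified signatures below only model the "resolve types once, reuse per row" lifecycle.

```java
final class WidthBucketSketch {
    /** Hypothetical stand-in for the argument type reported at initialize() time. */
    enum ArgType { LONG, DOUBLE }

    /** Per-type bucket computation, selected once per query. */
    interface BucketFn {
        long apply(Number expr, Number min, Number max, int numBuckets);
    }

    private BucketFn fn;      // resolved in initialize(), reused for every row
    int typeResolutions = 0;  // counter kept only to make the lifecycle visible

    /** Runs once: inspect the argument type and pick the implementation. */
    void initialize(ArgType exprType) {
        typeResolutions++;
        switch (exprType) {
            case LONG:
                fn = (e, lo, hi, n) -> {
                    long x = e.longValue(), a = lo.longValue(), b = hi.longValue();
                    if (x < a) return 0;         // below the range
                    if (x >= b) return n + 1L;   // at or above the range
                    return (x - a) * n / (b - a) + 1;
                };
                break;
            case DOUBLE:
                fn = (e, lo, hi, n) -> {
                    double x = e.doubleValue(), a = lo.doubleValue(), b = hi.doubleValue();
                    if (x < a) return 0;
                    if (x >= b) return n + 1L;
                    return (long) Math.floor((x - a) * n / (b - a)) + 1;
                };
                break;
            default:
                throw new IllegalArgumentException("unsupported type: " + exprType);
        }
    }

    /** Runs per row: no type inspection here, only the pre-selected computation. */
    long evaluate(Number expr, Number min, Number max, int numBuckets) {
        if (fn == null) {
            throw new IllegalStateException("initialize() must be called first");
        }
        return fn.apply(expr, min, max, numBuckets);
    }

    public static void main(String[] args) {
        WidthBucketSketch udf = new WidthBucketSketch();
        udf.initialize(ArgType.LONG);            // once per query
        long[] rows = {-1, 0, 5, 10};
        for (long row : rows) {                  // once per row
            System.out.println(row + " -> " + udf.evaluate(row, 0L, 10L, 10));
        }
    }
}
```

Because the range checks return before the division, this sketch never divides by zero when min equals max, though it makes no attempt to handle reversed ranges (min > max).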

> width_bucket issues
> -------------------
>
>                 Key: HIVE-16513
>                 URL: https://issues.apache.org/jira/browse/HIVE-16513
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Carter Shanklin
>            Assignee: Sahil Takiar
>         Attachments: HIVE-16513.1.patch, HIVE-16513.2.patch
>
>
> width_bucket was recently added with HIVE-15982. This ticket notes a few issues.
> Usability issue:
> width_bucket currently accepts only integral numeric types. Decimals, floats and doubles are not supported.
> Runtime failures:
> This query causes a runtime divide-by-zero in the reduce stage:
> select width_bucket(c1, 0, c1*2, 10) from e011_01 group by c1;
> The divide-by-zero seems to trigger any time I use a group-by. Here's another example (that actually requires the group-by):
> select width_bucket(c1, 0, max(c1), 10) from e011_01 group by c1;
> Advanced Usage Issues:
> Suppose you have a table e011_01 as follows:
> create table e011_01 (c1 integer, c2 smallint);
> insert into e011_01 values (1, 1), (2, 2);
> Compile-time problems:
> You cannot use simple case expressions, searched case expressions or grouping sets. These queries fail:
> select width_bucket(5, c2, case c1 when 1 then c1 * 2 else c1 * 3 end, 10) from e011_01;
> select width_bucket(5, c2, case when c1 < 2 then c1 * 2 else c1 * 3 end, 10) from e011_01;
> select width_bucket(5, c2, max(c1)*10, cast(grouping(c1, c2)*20+1 as integer)) from e011_02 group by cube(c1, c2);
> I'll admit the grouping one is pretty contrived, but the case ones seem straightforward and valid, and it's strange that they don't work. Similar queries work with other UDFs like sum. Why wouldn't they "just work"? Maybe [~ashutoshc] can lend some perspective on that?
> Interestingly, you can use window functions in width_bucket. For example,
> select width_bucket(rank() over (order by c2), 0, 10, 10) from e011_01;
> works just fine. Hopefully we can get to a place where people implementing functions like this don't need to think about value expression support, but we don't seem to be there yet.

