Return-Path: X-Original-To: archive-asf-public-internal@cust-asf2.ponee.io Delivered-To: archive-asf-public-internal@cust-asf2.ponee.io Received: from cust-asf.ponee.io (cust-asf.ponee.io [163.172.22.183]) by cust-asf2.ponee.io (Postfix) with ESMTP id 9EA5F200C77 for ; Mon, 1 May 2017 20:28:08 +0200 (CEST) Received: by cust-asf.ponee.io (Postfix) id 9D4FE160BBD; Mon, 1 May 2017 18:28:08 +0000 (UTC) Delivered-To: archive-asf-public@cust-asf.ponee.io Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by cust-asf.ponee.io (Postfix) with SMTP id ED6CF160BAE for ; Mon, 1 May 2017 20:28:07 +0200 (CEST) Received: (qmail 33793 invoked by uid 500); 1 May 2017 18:28:07 -0000 Mailing-List: contact issues-help@hive.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@hive.apache.org Delivered-To: mailing list issues@hive.apache.org Received: (qmail 33783 invoked by uid 99); 1 May 2017 18:28:07 -0000 Received: from pnap-us-west-generic-nat.apache.org (HELO spamd4-us-west.apache.org) (209.188.14.142) by apache.org (qpsmtpd/0.29) with ESMTP; Mon, 01 May 2017 18:28:07 +0000 Received: from localhost (localhost [127.0.0.1]) by spamd4-us-west.apache.org (ASF Mail Server at spamd4-us-west.apache.org) with ESMTP id AF08BC047C for ; Mon, 1 May 2017 18:28:06 +0000 (UTC) X-Virus-Scanned: Debian amavisd-new at spamd4-us-west.apache.org X-Spam-Flag: NO X-Spam-Score: -100.002 X-Spam-Level: X-Spam-Status: No, score=-100.002 tagged_above=-999 required=6.31 tests=[RP_MATCHES_RCVD=-0.001, SPF_PASS=-0.001, USER_IN_WHITELIST=-100] autolearn=disabled Received: from mx1-lw-us.apache.org ([10.40.0.8]) by localhost (spamd4-us-west.apache.org [10.40.0.11]) (amavisd-new, port 10024) with ESMTP id L7g2lFD5eIhP for ; Mon, 1 May 2017 18:28:05 +0000 (UTC) Received: from mailrelay1-us-west.apache.org (mailrelay1-us-west.apache.org [209.188.14.139]) by mx1-lw-us.apache.org (ASF Mail Server at mx1-lw-us.apache.org) with ESMTP id 896005FDAA for ; Mon, 1 May 2017 18:28:05 +0000 (UTC) Received: from jira-lw-us.apache.org (unknown [207.244.88.139]) by mailrelay1-us-west.apache.org (ASF Mail Server at mailrelay1-us-west.apache.org) with ESMTP id 0395CE0D50 for ; Mon, 1 May 2017 18:28:05 +0000 (UTC) Received: from jira-lw-us.apache.org (localhost [127.0.0.1]) by jira-lw-us.apache.org (ASF Mail Server at jira-lw-us.apache.org) with ESMTP id 690BF21DEC for ; Mon, 1 May 2017 18:28:04 +0000 (UTC) Date: Mon, 1 May 2017 18:28:04 +0000 (UTC) From: "Ashutosh Chauhan (JIRA)" To: issues@hive.apache.org Message-ID: In-Reply-To: References: Subject: [jira] [Commented] (HIVE-16513) width_bucket issues MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 archived-at: Mon, 01 May 2017 18:28:08 -0000 [ https://issues.apache.org/jira/browse/HIVE-16513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15991253#comment-15991253 ] Ashutosh Chauhan commented on HIVE-16513: ----------------------------------------- Determination of Types should be done in initialize() not in evaluate(). This will hurt performance and is wasteful. UDFs are allowed to assume that types determined in initialize don't change on per row basis. > width_bucket issues > ------------------- > > Key: HIVE-16513 > URL: https://issues.apache.org/jira/browse/HIVE-16513 > Project: Hive > Issue Type: Bug > Reporter: Carter Shanklin > Assignee: Sahil Takiar > Attachments: HIVE-16513.1.patch, HIVE-16513.2.patch > > > width_bucket was recently added with HIVE-15982. This ticket notes a few issues. > Usability issue: > Currently only accepts integral numeric types. Decimals, floats and doubles are not supported. > Runtime failures: This query will cause a runtime divide-by-zero in the reduce stage. > select width_bucket(c1, 0, c1*2, 10) from e011_01 group by c1; > The divide-by-zero seems to trigger any time I use a group-by. Here's another example (that actually requires the group-by): > select width_bucket(c1, 0, max(c1), 10) from e011_01 group by c1; > Advanced Usage Issues: > Suppose you have a table e011_01 as follows: > create table e011_01 (c1 integer, c2 smallint); > insert into e011_01 values (1, 1), (2, 2); > Compile-time problems: > You cannot use simple case expressions, searched case expressions or grouping sets. These queries fail: > select width_bucket(5, c2, case c1 when 1 then c1 * 2 else c1 * 3 end, 10) from e011_01; > select width_bucket(5, c2, case when c1 < 2 then c1 * 2 else c1 * 3 end, 10) from e011_01; > select width_bucket(5, c2, max(c1)*10, cast(grouping(c1, c2)*20+1 as integer)) from e011_02 group by cube(c1, c2); > I'll admit the grouping one is pretty contrived but the case ones seem straightforward, valid, and it's strange that they don't work. Similar queries work with other UDFs like sum. Why wouldn't they "just work"? Maybe [~ashutoshc] can lend some perspective on that? > Interestingly, you can use window functions in width_bucket, example: > select width_bucket(rank() over (order by c2), 0, 10, 10) from e011_01; > works just fine. Hopefully we can get to a place where people implementing functions like this don't need to think about value expression support but we don't seem to be there yet. -- This message was sent by Atlassian JIRA (v6.3.15#6346)