Message-Id: <201702180127.v1I1RGuY015905@ip-10-146-233-104.ec2.internal>
Date: Sat, 18 Feb 2017 01:27:16 +0000
From: "Zach Amsden (Code Review)"
To: impala-cr@cloudera.com, reviews@impala.incubator.apache.org
CC: Michael Ho, Dan Hecht
Reply-To: zamsden@cloudera.com
Subject: [Impala-ASF-CR] IMPALA-2020: Add rounding for decimal casts
X-Gerrit-Commit: e10d798ce2c20d555a7a5ef0076c203a46517056

Zach Amsden has uploaded a new patch set (#15).

Change subject: IMPALA-2020: Add rounding for decimal casts
......................................................................

IMPALA-2020: Add rounding for decimal casts

This change adds support for DECIMAL_V2 rounding behavior for both
DECIMAL to INT and DOUBLE to DECIMAL casts. Exact halves are rounded
away from zero (e.g., 0.5 -> 1 and -0.5 -> -1).

Testing: Added expr-test and decimal-test test coverage as well as
manual testing.

I tried to update the expr benchmark to get some kind of results, but
the benchmark is pretty bit-rotted. It was throwing JNI exceptions. I
fixed up the JNI init call, but there is still a lot of work to do to
get the benchmark back into a runnable state. Even with the hack to get
at the RuntimeContext, we end up getting null derefs due to the slot
descriptor table not being initialized.
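As an illustration of the rounding rule stated in the change description (exact halves round away from zero), here is a minimal sketch in C++. It is not the code from this patch: `DecimalToInt` is a hypothetical helper, and it assumes the decimal is held as an unscaled 64-bit integer together with a scale, so that the value equals unscaled / 10^scale.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper for illustration only; not Impala's implementation.
// Assumes the decimal value equals unscaled / 10^scale.
// round == false truncates toward zero; round == true rounds to the nearest
// integer, with exact halves going away from zero.
int64_t DecimalToInt(int64_t unscaled, int scale, bool round) {
  assert(scale >= 0 && scale <= 18);  // 10^18 still fits in int64_t
  int64_t divisor = 1;
  for (int i = 0; i < scale; ++i) divisor *= 10;

  int64_t whole = unscaled / divisor;  // C++ division truncates toward zero
  if (!round) return whole;

  int64_t remainder = unscaled % divisor;  // carries the sign of unscaled
  int64_t magnitude = remainder < 0 ? -remainder : remainder;
  // Round when the discarded fraction is at least one half, written as
  // magnitude >= divisor - magnitude to avoid overflowing 2 * magnitude.
  if (magnitude >= divisor - magnitude) {
    whole += (unscaled < 0) ? -1 : 1;
  }
  return whole;
}
```

Under these assumptions, 0.59999 is represented as unscaled 59999 with scale 5; the sketch returns 0 when truncating and 1 when rounding, and -0.5 (unscaled -5, scale 1) rounds to -1.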
I have decided to wait on expanding the python test until the bugs with
overflow are fixed, which will make it easier to test sane behavior.

[localhost:21000] > select cast(0.59999 AS int);
+----------------------+
| cast(0.59999 as int) |
+----------------------+
| 0                    |
+----------------------+
Fetched 1 row(s) in 0.01s
[localhost:21000] > select cast(cast(0.5999 as float) as decimal(5,1));
+---------------------------------------------+
| cast(cast(0.5999 as float) as decimal(5,1)) |
+---------------------------------------------+
| 0.5                                         |
+---------------------------------------------+
Fetched 1 row(s) in 0.01s
[localhost:21000] > set decimal_v2=1;
DECIMAL_V2 set to 1
[localhost:21000] > select cast(0.59999 AS int);
+----------------------+
| cast(0.59999 as int) |
+----------------------+
| 1                    |
+----------------------+
Fetched 1 row(s) in 0.01s
[localhost:21000] > select cast(cast(0.5999 as float) as decimal(5,1));
+---------------------------------------------+
| cast(cast(0.5999 as float) as decimal(5,1)) |
+---------------------------------------------+
| 0.6                                         |
+---------------------------------------------+
Fetched 1 row(s) in 0.01s

Note there is no free lunch:

[localhost:21000] > set decimal_v2=0;
DECIMAL_V2 set to 0
[localhost:21000] > select sum(cast(l_extendedprice as bigint)) from tpch10_parquet.lineitem;
Query: select sum(cast(l_extendedprice as bigint)) from tpch10_parquet.lineitem
Query submitted at: 2017-02-18 01:16:49 (Coordinator: http://impala-dev:25000)
Query progress can be monitored at: http://impala-dev:25000/query_plan?query_id=c546f270316b2176:820bb66700000000
+--------------------------------------+
| sum(cast(l_extendedprice as bigint)) |
+--------------------------------------+
| 2293784575265                        |
+--------------------------------------+
Fetched 1 row(s) in 0.76s
[localhost:21000] > select sum(cast(l_extendedprice as bigint)) from tpch10_parquet.lineitem;
Query: select sum(cast(l_extendedprice as bigint)) from tpch10_parquet.lineitem
Query submitted at: 2017-02-18 01:16:52 (Coordinator: http://impala-dev:25000)
Query progress can be monitored at: http://impala-dev:25000/query_plan?query_id=524bf2693849ce99:be999b0300000000
+--------------------------------------+
| sum(cast(l_extendedprice as bigint)) |
+--------------------------------------+
| 2293784575265                        |
+--------------------------------------+
Fetched 1 row(s) in 0.73s
[localhost:21000] > set decimal_v2=1;
DECIMAL_V2 set to 1
[localhost:21000] > select sum(cast(l_extendedprice as bigint)) from tpch10_parquet.lineitem;
Query: select sum(cast(l_extendedprice as bigint)) from tpch10_parquet.lineitem
Query submitted at: 2017-02-18 01:16:59 (Coordinator: http://impala-dev:25000)
Query progress can be monitored at: http://impala-dev:25000/query_plan?query_id=ca4f8413061576d9:2389ccde00000000
+--------------------------------------+
| sum(cast(l_extendedprice as bigint)) |
+--------------------------------------+
| 2293814088985                        |
+--------------------------------------+
Fetched 1 row(s) in 0.85s
[localhost:21000] > select sum(cast(l_extendedprice as bigint)) from tpch10_parquet.lineitem;
Query: select sum(cast(l_extendedprice as bigint)) from tpch10_parquet.lineitem
Query submitted at: 2017-02-18 01:17:02 (Coordinator: http://impala-dev:25000)
Query progress can be monitored at: http://impala-dev:25000/query_plan?query_id=4d4b5f9f181306c7:f6726c600000000
+--------------------------------------+
| sum(cast(l_extendedprice as bigint)) |
+--------------------------------------+
| 2293814088985                        |
+--------------------------------------+
Fetched 1 row(s) in 0.96s

So with decimal_v2=1 the DECIMAL to BIGINT cast is about 20% slower. The
variance is quite high, so this is not a scientific number, but the
trend holds. We have some work to do to get this back.
Casting from double seems to be roughly at parity:

[localhost:21000] > set decimal_v2=0;
DECIMAL_V2 set to 0
[localhost:21000] > select sum(cast(cast(l_extendedprice as double) as decimal(14,2))) from tpch10_parquet.lineitem;
Query: select sum(cast(cast(l_extendedprice as double) as decimal(14,2))) from tpch10_parquet.lineitem
Query submitted at: 2017-02-18 01:23:52 (Coordinator: http://impala-dev:25000)
Query progress can be monitored at: http://impala-dev:25000/query_plan?query_id=d84e0f69718bb16d:f069dae600000000
+-------------------------------------------------------------+
| sum(cast(cast(l_extendedprice as double) as decimal(14,2))) |
+-------------------------------------------------------------+
| 2293813121802.09                                            |
+-------------------------------------------------------------+
Fetched 1 row(s) in 0.83s
[localhost:21000] > select sum(cast(cast(l_extendedprice as double) as decimal(14,2))) from tpch10_parquet.lineitem;
Query: select sum(cast(cast(l_extendedprice as double) as decimal(14,2))) from tpch10_parquet.lineitem
Query submitted at: 2017-02-18 01:23:54 (Coordinator: http://impala-dev:25000)
Query progress can be monitored at: http://impala-dev:25000/query_plan?query_id=f0410a35e981a86b:e6d9207000000000
+-------------------------------------------------------------+
| sum(cast(cast(l_extendedprice as double) as decimal(14,2))) |
+-------------------------------------------------------------+
| 2293813121802.09                                            |
+-------------------------------------------------------------+
Fetched 1 row(s) in 0.83s
[localhost:21000] > set decimal_v2=1;
DECIMAL_V2 set to 1
[localhost:21000] > select sum(cast(cast(l_extendedprice as double) as decimal(14,2))) from tpch10_parquet.lineitem;
Query: select sum(cast(cast(l_extendedprice as double) as decimal(14,2))) from tpch10_parquet.lineitem
Query submitted at: 2017-02-18 01:24:02 (Coordinator: http://impala-dev:25000)
Query progress can be monitored at: http://impala-dev:25000/query_plan?query_id=5849852a17314252:73e2433f00000000
+-------------------------------------------------------------+
| sum(cast(cast(l_extendedprice as double) as decimal(14,2))) |
+-------------------------------------------------------------+
| 2293813156773.36                                            |
+-------------------------------------------------------------+
Fetched 1 row(s) in 0.86s
[localhost:21000] > select sum(cast(cast(l_extendedprice as double) as decimal(14,2))) from tpch10_parquet.lineitem;
Query: select sum(cast(cast(l_extendedprice as double) as decimal(14,2))) from tpch10_parquet.lineitem
Query submitted at: 2017-02-18 01:24:04 (Coordinator: http://impala-dev:25000)
Query progress can be monitored at: http://impala-dev:25000/query_plan?query_id=b842bc8eaaf9e85d:70616a6b00000000
+-------------------------------------------------------------+
| sum(cast(cast(l_extendedprice as double) as decimal(14,2))) |
+-------------------------------------------------------------+
| 2293813156773.36                                            |
+-------------------------------------------------------------+
Fetched 1 row(s) in 0.86s

Change-Id: I2daf186b4770a022f9cb349d512067a1dd624810
---
M be/src/benchmarks/expr-benchmark.cc
M be/src/exprs/decimal-operators-ir.cc
M be/src/exprs/expr-test.cc
M be/src/exprs/expr.h
M be/src/exprs/literal.cc
M be/src/runtime/decimal-test.cc
M be/src/runtime/decimal-value.h
M be/src/runtime/decimal-value.inline.h
M be/src/udf/udf.h
9 files changed, 467 insertions(+), 111 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/51/5951/15
--
To view, visit http://gerrit.cloudera.org:8080/5951
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2daf186b4770a022f9cb349d512067a1dd624810
Gerrit-PatchSet: 15
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden
Gerrit-Reviewer: Dan Hecht
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Zach Amsden