Return-Path: X-Original-To: archive-asf-public-internal@cust-asf2.ponee.io Delivered-To: archive-asf-public-internal@cust-asf2.ponee.io Received: from cust-asf.ponee.io (cust-asf.ponee.io [163.172.22.183]) by cust-asf2.ponee.io (Postfix) with ESMTP id 99A44200C31 for ; Wed, 22 Feb 2017 04:03:15 +0100 (CET) Received: by cust-asf.ponee.io (Postfix) id 982A5160B74; Wed, 22 Feb 2017 03:03:15 +0000 (UTC) Delivered-To: archive-asf-public@cust-asf.ponee.io Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by cust-asf.ponee.io (Postfix) with SMTP id BAD6C160B68 for ; Wed, 22 Feb 2017 04:03:14 +0100 (CET) Received: (qmail 2505 invoked by uid 500); 22 Feb 2017 03:03:14 -0000 Mailing-List: contact reviews-help@impala.incubator.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Delivered-To: mailing list reviews@impala.incubator.apache.org Received: (qmail 2494 invoked by uid 99); 22 Feb 2017 03:03:13 -0000 Received: from pnap-us-west-generic-nat.apache.org (HELO spamd4-us-west.apache.org) (209.188.14.142) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 22 Feb 2017 03:03:13 +0000 Received: from localhost (localhost [127.0.0.1]) by spamd4-us-west.apache.org (ASF Mail Server at spamd4-us-west.apache.org) with ESMTP id 38E44C0258 for ; Wed, 22 Feb 2017 03:03:13 +0000 (UTC) X-Virus-Scanned: Debian amavisd-new at spamd4-us-west.apache.org X-Spam-Flag: NO X-Spam-Score: 1.163 X-Spam-Level: * X-Spam-Status: No, score=1.163 tagged_above=-999 required=6.31 tests=[KAM_ASCII_DIVIDERS=0.8, RDNS_DYNAMIC=0.363, SPF_PASS=-0.001, URIBL_BLOCKED=0.001] autolearn=disabled Received: from mx1-lw-us.apache.org ([10.40.0.8]) by localhost (spamd4-us-west.apache.org [10.40.0.11]) (amavisd-new, port 10024) with ESMTP id 6kKMQUOhoh4r for ; Wed, 22 Feb 2017 03:03:11 +0000 (UTC) Received: from ip-10-146-233-104.ec2.internal (ec2-75-101-130-251.compute-1.amazonaws.com [75.101.130.251]) by mx1-lw-us.apache.org (ASF Mail Server at mx1-lw-us.apache.org) with ESMTPS id 659175F56B for ; Wed, 22 Feb 2017 03:03:11 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by ip-10-146-233-104.ec2.internal (8.14.4/8.14.4) with ESMTP id v1M33ANu010147; Wed, 22 Feb 2017 03:03:10 GMT Message-Id: <201702220303.v1M33ANu010147@ip-10-146-233-104.ec2.internal> Date: Wed, 22 Feb 2017 03:03:09 +0000 From: "Zach Amsden (Code Review)" To: impala-cr@cloudera.com, reviews@impala.incubator.apache.org CC: Marcel Kornacker , Michael Ho , Dan Hecht Reply-To: zamsden@cloudera.com X-Gerrit-MessageType: newpatchset Subject: =?UTF-8?Q?=5BImpala-ASF-CR=5D_IMPALA-2020=2C_4915=2C_4936=3A_Add_rounding_for_decimal_casts=0A?= X-Gerrit-Change-Id: I2daf186b4770a022f9cb349d512067a1dd624810 X-Gerrit-ChangeURL: X-Gerrit-Commit: f460e1040e3a4565aa25c6c9d62a738bf2b0bcca In-Reply-To: References: MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Content-Disposition: inline User-Agent: Gerrit/2.12.7 archived-at: Wed, 22 Feb 2017 03:03:15 -0000 Zach Amsden has uploaded a new patch set (#17). Change subject: IMPALA-2020, 4915, 4936: Add rounding for decimal casts ...................................................................... IMPALA-2020, 4915, 4936: Add rounding for decimal casts This change adds support for DECIMAL_V2 rounding behavior for both DECIMAL to INT and DOUBLE to DECIMAL casts. The round behavior implemented for exact halves is round halves away from zero (e.g (0.5 -> 1) and (-0.5 -> -1)). This change also fixes two open bugs regarding overflow checking. The root cause of both behaviors turned out to be the same - a missing std:: caused the wrong abs() function to be used. Due to details of IEEE floating point representation, this actually masked another bug, as NaN is often represented as all 1-bits, which fails the overflow test. Since the implicit conversion to int lost precision, we ended up storing large numbers that don't actually represent valid decimal numbers in the range when the value happened to be +/- Infinity. This caused the rendering back to ASCII characters to go awry, but is otherwise harmless. Testing: Added expr-test and decimal-test test coverage as well as manual testing. I tried to update the expr benchmark to get some kind of results but the benchmark is pretty bit-rotted. It was throwing JNI exceptions. Fixed up the JNI init call, but there is still a lot of work to do to get this back in a runnable state. Even with the hack to get at the RuntimeContext, we end up getting null derefs due to the slot descriptor table not being initialized. Output comparisons, before | after +----------------------+ | cast(0.59999 as int) | +----------------------+ | 0 | 1 | +----------------------+ +---------------------------------------------+ | cast(cast(0.5999 as float) as decimal(5,1)) | +---------------------------------------------+ | 0.5 | 0.6 | +---------------------------------------------+ Performance summary. In all cases I have tried multiple times and taken the fastest query results. Old version, head at 815c76f9cbbe6585ebed961da506fc54ce2ef4e3: select sum(cast(l_extendedprice as bigint)) from tpch10_parquet.lineitem; +--------------------------------------+ | sum(cast(l_extendedprice as bigint)) | +--------------------------------------+ | 2293784575265 | +--------------------------------------+ Fetched 1 row(s) in 0.53s With this change, and decimal_v2 off: +--------------------------------------+ | sum(cast(l_extendedprice as bigint)) | +--------------------------------------+ | 2293784575265 | +--------------------------------------+ Fetched 1 row(s) in 0.52s Note that there is some noise / instability in these results and across invocations there is quite a bit of variance. Still we appear not to have regressed. With decimal V2 enabled, we loose some performance due to rounding. DECIMAL_V2 set to 1 +--------------------------------------+ | sum(cast(l_extendedprice as bigint)) | +--------------------------------------+ | 2293814088985 | +--------------------------------------+ Fetched 1 row(s) in 0.63s So we're about 20% slower. The variance is quite a lot so this is not a scientific number, but the trend is maintained. So we have some work to do to get this back. Casting from double seems to be roughly at parity: Old version, head at 815c76f9cbbe6585ebed961da506fc54ce2ef4e3: +-------------------------------------------------------------+ | sum(cast(cast(l_extendedprice as double) as decimal(14,2))) | +-------------------------------------------------------------+ | 2293813121802.09 | +-------------------------------------------------------------+ Fetched 1 row(s) in 0.63s +--------------------------------------------------------------+ | sum(cast(cast(l_extendedprice as double) as decimal(38,10))) | +--------------------------------------------------------------+ | 2293813156773.3596978911 | +--------------------------------------------------------------+ Fetched 1 row(s) in 0.72s New version, decimal_v2=0 +-------------------------------------------------------------+ | sum(cast(cast(l_extendedprice as double) as decimal(14,2))) | +-------------------------------------------------------------+ | 2293813121802.09 | +-------------------------------------------------------------+ Fetched 1 row(s) in 0.64s +--------------------------------------------------------------+ | sum(cast(cast(l_extendedprice as double) as decimal(38,10))) | +--------------------------------------------------------------+ | 2293813156773.3596978911 | +--------------------------------------------------------------+ Fetched 1 row(s) in 0.73s New version, decimal_v2=1; +-------------------------------------------------------------+ | sum(cast(cast(l_extendedprice as double) as decimal(14,2))) | +-------------------------------------------------------------+ | 2293813156773.36 | +-------------------------------------------------------------+ Fetched 1 row(s) in 0.63s +--------------------------------------------------------------+ | sum(cast(cast(l_extendedprice as double) as decimal(38,10))) | +--------------------------------------------------------------+ | 2293813156773.3600000000 | +--------------------------------------------------------------+ Fetched 1 row(s) in 0.73s Interestingly, you can see the effect of the rounding as well - the decimal 38,10 result is now precise, where as the truncation before left artifacts from the division. Change-Id: I2daf186b4770a022f9cb349d512067a1dd624810 --- M be/src/benchmarks/expr-benchmark.cc M be/src/exprs/decimal-operators-ir.cc M be/src/exprs/expr-test.cc M be/src/exprs/expr.h M be/src/exprs/literal.cc M be/src/runtime/decimal-test.cc M be/src/runtime/decimal-value.h M be/src/runtime/decimal-value.inline.h M be/src/udf/udf.h 9 files changed, 467 insertions(+), 110 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/51/5951/17 -- To view, visit http://gerrit.cloudera.org:8080/5951 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I2daf186b4770a022f9cb349d512067a1dd624810 Gerrit-PatchSet: 17 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zach Amsden Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Zach Amsden