From: "Impala Public Jenkins (Code Review)"
To: Taras Bobrovytsky, impala-cr@cloudera.com, reviews@impala.incubator.apache.org
Reply-To: impala-cr@cloudera.com, tbobrovytsky@cloudera.com, reviews@impala.incubator.apache.org
Date: Wed, 4 Oct 2017 09:37:36 +0000
Subject: [Impala-ASF-CR] IMPALA-4939, IMPALA-4940: Decimal V2 multiplication
Message-Id: <201710040937.v949baxV022870@ip-10-146-233-104.ec2.internal>
X-Gerrit-Commit: 6259641077250dcd3360e2e6c2bf9023b201d858
User-Agent: Gerrit/2.14.2

Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/7438 )

Change subject: IMPALA-4939, IMPALA-4940: Decimal V2 multiplication
......................................................................

IMPALA-4939, IMPALA-4940: Decimal V2 multiplication

Implement the new DECIMAL return type rules for multiply expressions,
active when query option DECIMAL_V2=1. The algorithm for determining
the type of the result of multiplication is described in the JIRA.

DECIMAL V1:
+-----------------------------------------------------------------------+
| typeof(cast('0.1' as decimal(38,38)) * cast('0.1' as decimal(38,38))) |
+-----------------------------------------------------------------------+
| DECIMAL(38,38)                                                        |
+-----------------------------------------------------------------------+
+-----------------------------------------------------------------------+
| typeof(cast('0.1' as decimal(38,15)) * cast('0.1' as decimal(38,15))) |
+-----------------------------------------------------------------------+
| DECIMAL(38,30)                                                        |
+-----------------------------------------------------------------------+

DECIMAL V2:
+-----------------------------------------------------------------------+
| typeof(cast('0.1' as decimal(38,38)) * cast('0.1' as decimal(38,38))) |
+-----------------------------------------------------------------------+
| DECIMAL(38,37)                                                        |
+-----------------------------------------------------------------------+
+-----------------------------------------------------------------------+
| typeof(cast('0.1' as decimal(38,15)) * cast('0.1' as decimal(38,15))) |
+-----------------------------------------------------------------------+
| DECIMAL(38,6)                                                         |
+-----------------------------------------------------------------------+

In this patch, we also fix the early multiplication overflow. We compute
a 256-bit integer intermediate value, which we then attempt to scale
down and round.

Performance:
I ran TPCH 300 and TPCDS 1000 workloads and the performance is almost
identical. For TPCH Q1, there was an improvement from 21 seconds to 16
seconds. I did not see any regressions in these workloads.

The performance improvement is due to the way we check for overflows
after this patch (by counting the leading zeros instead of dividing). It
can be clearly seen in this query:
  select cast(2.2 as decimal(38, 1)) * cast(2.2 as decimal(38, 1))
  before: 7.85s
  after:  2.03s

I noticed performance regressions in the following cases:
- When we need to convert to a 256-bit integer before multiplying, which
  was introduced in this patch. Whether this happens depends on the
  resulting precision and the value of the inputs. In the following
  extreme case, the intermediate value is converted to a 256-bit integer
  every time.
    select cast(1.1 as decimal(38, 37)) * cast(1.1 as decimal(38, 37))
    before: 14.56s (returns null)
    after: 126.17s
- When we need to scale down the intermediate value. In the following
  query the result is decimal(38,6) after the patch, so the intermediate
  needs to be scaled down.
    select cast(2.2 as decimal(38,1)) * cast(2.2 as decimal(38,19))
    before: 7.25s
    after: 13.06s

These regressions are possible only when the resulting precision is 38,
which is not common in typical workloads.

Note: The actual queries that I ran for the benchmark are not exactly as
  above. I constructed tables with millions of rows with those values. I
  ran the queries with the DECIMAL_V2=1 option before and after the patch.

Change-Id: I37ad6232d7953bd75c18dc86e665b2b501a1ebe1
Reviewed-on: http://gerrit.cloudera.org:8080/7438
Reviewed-by: Taras Bobrovytsky
Tested-by: Impala Public Jenkins
---
M be/src/exprs/expr-test.cc
M be/src/runtime/decimal-value.inline.h
M be/src/util/bit-util.h
M fe/src/main/java/org/apache/impala/analysis/TypesUtil.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeExprsTest.java
5 files changed, 333 insertions(+), 62 deletions(-)

Approvals:
  Taras Bobrovytsky: Looks good to me, approved
  Impala Public Jenkins: Verified

--
To view, visit http://gerrit.cloudera.org:8080/7438
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I37ad6232d7953bd75c18dc86e665b2b501a1ebe1
Gerrit-Change-Number: 7438
Gerrit-PatchSet: 13
Gerrit-Owner: Taras Bobrovytsky
Gerrit-Reviewer: Dan Hecht
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Jim Apple
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Taras Bobrovytsky
Gerrit-Reviewer: Tim Armstrong
Gerrit-Reviewer: Zach Amsden
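
Illustration of the return type rules (not part of the patch): the commit
message defers the exact algorithm to the JIRA, so the sketch below is only a
guess that reproduces the typeof() results quoted in this message. The
function names, the cap-at-38 rule for V1, and the minimum adjusted scale of 6
for V2 are all assumptions, not taken from TypesUtil.java.

```python
MAX_PRECISION = 38
# Assumption: when precision is capped, the scale is not reduced below 6
# (or below its original value, if that was already smaller than 6).
MIN_ADJUSTED_SCALE = 6


def multiply_type_v1(p1, s1, p2, s2):
    """Assumed V1 rule: add precisions and scales, then cap each at 38."""
    return min(p1 + p2, MAX_PRECISION), min(s1 + s2, MAX_PRECISION)


def multiply_type_v2(p1, s1, p2, s2):
    """Assumed V2 rule: give up scale digits to keep integer digits."""
    precision = p1 + p2 + 1
    scale = s1 + s2
    if precision > MAX_PRECISION:
        excess = precision - MAX_PRECISION
        # Remove the excess digits from the scale, but keep a minimum scale.
        scale = max(scale - excess, min(scale, MIN_ADJUSTED_SCALE))
        precision = MAX_PRECISION
    return precision, scale
```

Under these assumptions, multiply_type_v2(38, 1, 38, 19) gives (38, 6), which
agrees with the decimal(38,6) result mentioned for the second regression case.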
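
Illustration of the overflow check and the scale-down step (not part of the
patch): a minimal sketch of how counting leading zeros can decide, without a
division, whether a product of two 128-bit values needs a wider intermediate,
and how that intermediate can then be scaled down and rounded. All names are
hypothetical, Python's unbounded integers stand in for the 256-bit integer,
and rounding half away from zero is an assumption; the code in
decimal-value.inline.h and bit-util.h may differ.

```python
INT128_BITS = 128
MAX_UNSCALED = 10**38 - 1  # largest magnitude with 38 decimal digits


def leading_zeros_128(magnitude):
    """Leading zero bits of a non-negative value viewed as a 128-bit word."""
    return INT128_BITS - magnitude.bit_length()


def product_may_exceed_128_bits(x, y):
    """Conservative check: True means a wider intermediate is needed."""
    # A product of an a-bit and a b-bit magnitude has at most a + b bits, so
    # it fits in a signed 128-bit integer whenever a + b <= 127, that is,
    # whenever the two leading-zero counts sum to at least 129.
    zeros = leading_zeros_128(abs(x)) + leading_zeros_128(abs(y))
    return zeros < INT128_BITS + 1


def scale_down_and_round(value, digits):
    """Drop `digits` decimal digits, rounding half away from zero (assumed)."""
    if digits <= 0:
        return value
    divisor = 10**digits
    quotient, remainder = divmod(abs(value), divisor)
    if 2 * remainder >= divisor:
        quotient += 1
    return quotient if value >= 0 else -quotient


def multiply_unscaled(x, y, scale_delta):
    """Multiply unscaled decimals; return None if the result overflows."""
    wide = x * y  # unbounded int, standing in for the 256-bit intermediate
    result = scale_down_and_round(wide, scale_delta)
    return result if abs(result) <= MAX_UNSCALED else None
```

For the 1.1 * 1.1 query at decimal(38, 37), the unscaled inputs are
11 * 10**36 each, so the check above asks for the wide path. If the result
scale is 35 (an assumption that follows from the result type in the first
DECIMAL V2 table), scale_delta is 74 - 35 = 39 and the sketch yields
121 * 10**33, that is, 1.21.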