From: "Taras Bobrovytsky (Code Review)"
To: impala-cr@cloudera.com, reviews@impala.incubator.apache.org
Date: Tue, 3 Oct 2017 02:54:28 +0000
Subject: [Impala-ASF-CR] IMPALA-4939, IMPALA-4940: Decimal V2 multiplication

Taras Bobrovytsky has uploaded a new patch set (#8). ( http://gerrit.cloudera.org:8080/7438 )

Change subject: IMPALA-4939, IMPALA-4940: Decimal V2 multiplication
......................................................................

IMPALA-4939, IMPALA-4940: Decimal V2 multiplication

Implement the new DECIMAL return type rules for multiply expressions,
active when query option DECIMAL_V2=1. The algorithm for determining
the type of the result of multiplication is described in the JIRA.

DECIMAL V1:
+-----------------------------------------------------------------------+
| typeof(cast('0.1' as decimal(38,38)) * cast('0.1' as decimal(38,38))) |
+-----------------------------------------------------------------------+
| DECIMAL(38,38)                                                        |
+-----------------------------------------------------------------------+
+-----------------------------------------------------------------------+
| typeof(cast('0.1' as decimal(38,15)) * cast('0.1' as decimal(38,15))) |
+-----------------------------------------------------------------------+
| DECIMAL(38,30)                                                        |
+-----------------------------------------------------------------------+

DECIMAL V2:
+-----------------------------------------------------------------------+
| typeof(cast('0.1' as decimal(38,38)) * cast('0.1' as decimal(38,38))) |
+-----------------------------------------------------------------------+
| DECIMAL(38,37)                                                        |
+-----------------------------------------------------------------------+
+-----------------------------------------------------------------------+
| typeof(cast('0.1' as decimal(38,15)) * cast('0.1' as decimal(38,15))) |
+-----------------------------------------------------------------------+
| DECIMAL(38,6)                                                         |
+-----------------------------------------------------------------------+

In this patch, we also fix the early multiplication overflow. We compute
a 256 bit integer intermediate value, which we then attempt to scale
down and round.

Performance:
I ran TPCH 300 and TPCDS 1000 workloads and the performance is almost
identical. For TPCH Q1, there was an improvement from 21 seconds to
16 seconds. I did not see any regressions in those workloads.

The performance improvement is due to the way we check for overflows
after this patch (by counting the leading zeros instead of dividing).
It can be clearly seen in this query:
  select cast(2.2 as decimal(38, 1)) * cast(2.2 as decimal(38, 1))
  before: 7.85s
  after:  2.03s

I noticed performance regressions in the following cases:
- When we need to convert to a 256 bit integer before multiplying,
  which was introduced in this patch. Whether this happens depends on
  the resulting precision and the value of the inputs. In the following
  extreme case, the intermediate value is converted to a 256 bit
  integer every time.
    select cast(1.1 as decimal(38, 37)) * cast(1.1 as decimal(38, 37))
    before: 14.56s (returns null)
    after:  126.17s
- When we need to scale down the intermediate value. In the following
  query the result is decimal(38,6) after the patch, so the
  intermediate needs to be scaled down.
    select cast(2.2 as decimal(38,1)) * cast(2.2 as decimal(38,19))
    before: 7.25s
    after:  13.06s

These regressions are possible only when the resulting precision is 38,
which is not common in typical workloads.

Note: The actual queries that I ran for the benchmark are not exactly
as above. I constructed tables with millions of rows with those values.
I ran the queries with the DECIMAL_V2=1 option before and after the
patch.

Change-Id: I37ad6232d7953bd75c18dc86e665b2b501a1ebe1
---
M be/src/exprs/expr-test.cc
M be/src/runtime/decimal-value.inline.h
M be/src/util/bit-util.h
M fe/src/main/java/org/apache/impala/analysis/TypesUtil.java
4 files changed, 332 insertions(+), 59 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/38/7438/8
--
To view, visit http://gerrit.cloudera.org:8080/7438
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I37ad6232d7953bd75c18dc86e665b2b501a1ebe1
Gerrit-Change-Number: 7438
Gerrit-PatchSet: 8
Gerrit-Owner: Taras Bobrovytsky
Gerrit-Reviewer: Dan Hecht
Gerrit-Reviewer: Jim Apple
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Taras Bobrovytsky
Gerrit-Reviewer: Tim Armstrong
Gerrit-Reviewer: Zach Amsden
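
For readers without the JIRA at hand, here is a minimal Python sketch of how the result types in the message above could be derived, plus the idea behind a division-free overflow pre-check. It is an illustration only, not the code in TypesUtil.java or decimal-value.inline.h. The rule "precision = p1 + p2 + 1, scale = s1 + s2, with the scale reduced but kept at 6 or more when the precision is capped at 38" is an assumption chosen because it reproduces the examples in the message. The function names, the constant MIN_ADJUSTED_SCALE, and the use of Python's int.bit_length() in place of a leading-zero count on a 128 bit value are likewise hypothetical.

```python
MAX_PRECISION = 38
MIN_ADJUSTED_SCALE = 6  # assumed floor for the scale once precision is capped


def multiply_type_v1(p1, s1, p2, s2):
    """Assumed V1 rule: add precisions and scales, cap each at 38."""
    return min(p1 + p2, MAX_PRECISION), min(s1 + s2, MAX_PRECISION)


def multiply_type_v2(p1, s1, p2, s2):
    """Assumed V2 rule: when the precision exceeds 38, give up scale
    digits first, but never drop the scale below min(scale, 6)."""
    precision = p1 + p2 + 1
    scale = s1 + s2
    if precision > MAX_PRECISION:
        min_scale = min(scale, MIN_ADJUSTED_SCALE)
        scale = max(scale - (precision - MAX_PRECISION), min_scale)
        precision = MAX_PRECISION
    return precision, scale


def may_overflow_int128(x, y):
    """Conservative, division-free pre-check for a signed 128 bit product.

    bit_length() equals 128 minus the number of leading zeros of the
    magnitude. A product of an a-bit and a b-bit magnitude has at most
    a + b bits, so it is certain to fit in 127 magnitude bits when
    a + b <= 127. Otherwise the caller would fall back to a wider type.
    """
    return abs(x).bit_length() + abs(y).bit_length() > 127


if __name__ == "__main__":
    print(multiply_type_v1(38, 15, 38, 15))  # (38, 30)
    print(multiply_type_v2(38, 15, 38, 15))  # (38, 6)
    # 2.2 as decimal(38,1) has unscaled value 22: small, no wide math needed.
    print(may_overflow_int128(22, 22))  # False
    # 1.1 as decimal(38,37) has unscaled value 11 * 10**36: needs wide math.
    print(may_overflow_int128(11 * 10**36, 11 * 10**36))  # True
```

The pre-check is deliberately conservative: a True result means the product might not fit in 128 bits, which matches the message's observation that the 256 bit path is taken depending on the values of the inputs.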