impala-reviews mailing list archives

From "Zach Amsden (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] IMPALA-2020: Add rounding for decimal casts
Date Sat, 18 Feb 2017 01:27:16 GMT
Zach Amsden has uploaded a new patch set (#15).

Change subject: IMPALA-2020: Add rounding for decimal casts
......................................................................

IMPALA-2020: Add rounding for decimal casts

This change adds support for DECIMAL_V2 rounding behavior for both
DECIMAL to INT and DOUBLE to DECIMAL casts. Exact halves are rounded
away from zero (e.g. 0.5 -> 1 and -0.5 -> -1).
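The following is a minimal C++ sketch of the DECIMAL to INT rounding described above. It assumes a decimal is held as an unscaled 64-bit integer plus a scale, so 0.59999 is unscaled 59999 at scale 5. `DecimalToInt` is a hypothetical helper for illustration, not the code in decimal-value.inline.h. It ignores precision beyond 18 digits and does not handle overflow.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not Impala's DecimalValue API). A decimal is modeled as
// an unscaled integer plus a scale: value = unscaled / 10^scale.
//   round == false: truncate toward zero (DECIMAL_V2=0 behavior).
//   round == true:  round exact halves away from zero (DECIMAL_V2=1 behavior).
int64_t DecimalToInt(int64_t unscaled, int scale, bool round) {
  assert(scale >= 0 && scale <= 18);  // 10^18 still fits in int64_t
  int64_t divisor = 1;
  for (int i = 0; i < scale; ++i) divisor *= 10;

  // C++ integer division truncates toward zero, and the remainder takes the
  // sign of the dividend.
  int64_t quotient = unscaled / divisor;
  if (!round) return quotient;
  int64_t remainder = unscaled % divisor;
  int64_t abs_remainder = remainder < 0 ? -remainder : remainder;

  // The dropped fraction is >= 0.5 exactly when 2 * |remainder| >= divisor.
  // abs_remainder < divisor <= 10^18, so doubling it cannot overflow.
  if (abs_remainder * 2 >= divisor) quotient += (unscaled < 0) ? -1 : 1;
  return quotient;
}
```

With rounding enabled, `DecimalToInt(59999, 5, true)` returns 1 and `DecimalToInt(-5, 1, true)` returns -1. With truncation, `DecimalToInt(59999, 5, false)` returns 0. These match the two results of `cast(0.59999 AS int)` in the session transcript.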

Testing: Added expr-test and decimal-test coverage, plus manual
testing. I tried to update the expr benchmark to get some results,
but the benchmark is badly bit-rotted: it was throwing JNI exceptions.
I fixed the JNI init call, but a lot of work remains to get it back
into a runnable state. Even with the hack to get at the
RuntimeContext, we end up with null derefs because the slot
descriptor table is not initialized.

I have decided to hold off on expanding the Python tests until the
overflow bugs are fixed, which will make it easier to test for sane
behavior.

[localhost:21000] > select cast(0.59999 AS int);
+----------------------+
| cast(0.59999 as int) |
+----------------------+
| 0                    |
+----------------------+
Fetched 1 row(s) in 0.01s
[localhost:21000] > select cast(cast(0.5999 as float) as decimal(5,1));
+---------------------------------------------+
| cast(cast(0.5999 as float) as decimal(5,1)) |
+---------------------------------------------+
| 0.5                                         |
+---------------------------------------------+
Fetched 1 row(s) in 0.01s
[localhost:21000] > set decimal_v2=1;
DECIMAL_V2 set to 1
[localhost:21000] > select cast(0.59999 AS int);
+----------------------+
| cast(0.59999 as int) |
+----------------------+
| 1                    |
+----------------------+
Fetched 1 row(s) in 0.01s
[localhost:21000] > select cast(cast(0.5999 as float) as decimal(5,1));
+---------------------------------------------+
| cast(cast(0.5999 as float) as decimal(5,1)) |
+---------------------------------------------+
| 0.6                                         |
+---------------------------------------------+
Fetched 1 row(s) in 0.01s
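The DOUBLE to DECIMAL direction can be sketched the same way. This sketch assumes the double is scaled by 10^scale and then either truncated or rounded. `DoubleToUnscaled` is a hypothetical helper and does not necessarily reflect how decimal-operators-ir.cc implements the cast. In particular, it skips overflow handling entirely.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical helper (not Impala's API): converts a double to the unscaled
// integer of a DECIMAL with the given scale. Overflow and precision checks
// are omitted.
//   round == false: truncate toward zero (std::trunc).
//   round == true:  std::round, which rounds exact halves away from zero.
int64_t DoubleToUnscaled(double value, int scale, bool round) {
  assert(scale >= 0 && scale <= 18);
  // Caveat: this multiplication can itself introduce binary floating-point
  // error, e.g. 1.005 * 100 evaluates to slightly less than 100.5.
  double scaled = value * std::pow(10.0, scale);
  return static_cast<int64_t>(round ? std::round(scaled) : std::trunc(scaled));
}
```

Consider the float 0.5999 cast to DECIMAL(5,1). The scaled value is about 5.999, so truncation yields unscaled 5 (0.5) and rounding yields 6 (0.6). This matches the two sessions above.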

Note that there is no free lunch:

[localhost:21000] > set decimal_v2=0;
DECIMAL_V2 set to 0
[localhost:21000] > select sum(cast(l_extendedprice as bigint)) from tpch10_parquet.lineitem;
Query: select sum(cast(l_extendedprice as bigint)) from tpch10_parquet.lineitem
Query submitted at: 2017-02-18 01:16:49 (Coordinator: http://impala-dev:25000)
Query progress can be monitored at: http://impala-dev:25000/query_plan?query_id=c546f270316b2176:820bb66700000000
+--------------------------------------+
| sum(cast(l_extendedprice as bigint)) |
+--------------------------------------+
| 2293784575265                        |
+--------------------------------------+
Fetched 1 row(s) in 0.76s
[localhost:21000] > select sum(cast(l_extendedprice as bigint)) from tpch10_parquet.lineitem;
Query: select sum(cast(l_extendedprice as bigint)) from tpch10_parquet.lineitem
Query submitted at: 2017-02-18 01:16:52 (Coordinator: http://impala-dev:25000)
Query progress can be monitored at: http://impala-dev:25000/query_plan?query_id=524bf2693849ce99:be999b0300000000
+--------------------------------------+
| sum(cast(l_extendedprice as bigint)) |
+--------------------------------------+
| 2293784575265                        |
+--------------------------------------+
Fetched 1 row(s) in 0.73s
[localhost:21000] > set decimal_v2=1;
DECIMAL_V2 set to 1
[localhost:21000] > select sum(cast(l_extendedprice as bigint)) from tpch10_parquet.lineitem;
Query: select sum(cast(l_extendedprice as bigint)) from tpch10_parquet.lineitem
Query submitted at: 2017-02-18 01:16:59 (Coordinator: http://impala-dev:25000)
Query progress can be monitored at: http://impala-dev:25000/query_plan?query_id=ca4f8413061576d9:2389ccde00000000
+--------------------------------------+
| sum(cast(l_extendedprice as bigint)) |
+--------------------------------------+
| 2293814088985                        |
+--------------------------------------+
Fetched 1 row(s) in 0.85s
[localhost:21000] > select sum(cast(l_extendedprice as bigint)) from tpch10_parquet.lineitem;
Query: select sum(cast(l_extendedprice as bigint)) from tpch10_parquet.lineitem
Query submitted at: 2017-02-18 01:17:02 (Coordinator: http://impala-dev:25000)
Query progress can be monitored at: http://impala-dev:25000/query_plan?query_id=4d4b5f9f181306c7:f6726c600000000
+--------------------------------------+
| sum(cast(l_extendedprice as bigint)) |
+--------------------------------------+
| 2293814088985                        |
+--------------------------------------+
Fetched 1 row(s) in 0.96s

So DECIMAL_V2 is about 20% slower here (0.73-0.76s vs. 0.85-0.96s).
The variance is high, so this is not a rigorous number, but the trend
held across runs. We have some work to do to win this back.
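The slowdown estimate and the change in the query result can be checked with some arithmetic on the numbers in the transcript. The helper names are illustrative only. Every l_extendedprice in TPC-H is positive, so for each row, rounding instead of truncating adds exactly 1 to the BIGINT value when the fractional part is at least .50, and 0 otherwise. The difference between the two sums therefore counts those rows.

```cpp
#include <cstdint>

// Average of the two wall-clock times (seconds) reported for each setting.
double MeanSeconds(double a, double b) { return (a + b) / 2.0; }

// Percentage slowdown of decimal_v2=1 relative to decimal_v2=0 for the
// sum(cast(l_extendedprice as bigint)) query.
double SlowdownPercent() {
  const double v1 = MeanSeconds(0.76, 0.73);  // decimal_v2=0 runs
  const double v2 = MeanSeconds(0.85, 0.96);  // decimal_v2=1 runs
  return (v2 / v1 - 1.0) * 100.0;             // about 21.5
}

// Difference between the two query results. With non-negative inputs, this is
// the number of rows whose fractional part is >= .50.
int64_t RowsRoundedUp() {
  return 2293814088985LL - 2293784575265LL;  // 29513720
}
```

That works out to about 21.5% slower. It also gives 29,513,720 rows rounded up, which is roughly half of the roughly 60 million lineitem rows at TPC-H scale factor 10.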

Casting from DOUBLE to DECIMAL seems to be roughly at parity (0.83s vs. 0.86s):

[localhost:21000] > set decimal_v2=0;
DECIMAL_V2 set to 0
[localhost:21000] > select sum(cast(cast(l_extendedprice as double) as decimal(14,2))) from tpch10_parquet.lineitem;
Query: select sum(cast(cast(l_extendedprice as double) as decimal(14,2))) from tpch10_parquet.lineitem
Query submitted at: 2017-02-18 01:23:52 (Coordinator: http://impala-dev:25000)
Query progress can be monitored at: http://impala-dev:25000/query_plan?query_id=d84e0f69718bb16d:f069dae600000000
+-------------------------------------------------------------+
| sum(cast(cast(l_extendedprice as double) as decimal(14,2))) |
+-------------------------------------------------------------+
| 2293813121802.09                                            |
+-------------------------------------------------------------+
Fetched 1 row(s) in 0.83s
[localhost:21000] > select sum(cast(cast(l_extendedprice as double) as decimal(14,2))) from tpch10_parquet.lineitem;
Query: select sum(cast(cast(l_extendedprice as double) as decimal(14,2))) from tpch10_parquet.lineitem
Query submitted at: 2017-02-18 01:23:54 (Coordinator: http://impala-dev:25000)
Query progress can be monitored at: http://impala-dev:25000/query_plan?query_id=f0410a35e981a86b:e6d9207000000000
+-------------------------------------------------------------+
| sum(cast(cast(l_extendedprice as double) as decimal(14,2))) |
+-------------------------------------------------------------+
| 2293813121802.09                                            |
+-------------------------------------------------------------+
Fetched 1 row(s) in 0.83s
[localhost:21000] > set decimal_v2=1;
DECIMAL_V2 set to 1
[localhost:21000] > select sum(cast(cast(l_extendedprice as double) as decimal(14,2))) from tpch10_parquet.lineitem;
Query: select sum(cast(cast(l_extendedprice as double) as decimal(14,2))) from tpch10_parquet.lineitem
Query submitted at: 2017-02-18 01:24:02 (Coordinator: http://impala-dev:25000)
Query progress can be monitored at: http://impala-dev:25000/query_plan?query_id=5849852a17314252:73e2433f00000000
+-------------------------------------------------------------+
| sum(cast(cast(l_extendedprice as double) as decimal(14,2))) |
+-------------------------------------------------------------+
| 2293813156773.36                                            |
+-------------------------------------------------------------+
Fetched 1 row(s) in 0.86s
[localhost:21000] > select sum(cast(cast(l_extendedprice as double) as decimal(14,2))) from tpch10_parquet.lineitem;
Query: select sum(cast(cast(l_extendedprice as double) as decimal(14,2))) from tpch10_parquet.lineitem
Query submitted at: 2017-02-18 01:24:04 (Coordinator: http://impala-dev:25000)
Query progress can be monitored at: http://impala-dev:25000/query_plan?query_id=b842bc8eaaf9e85d:70616a6b00000000
+-------------------------------------------------------------+
| sum(cast(cast(l_extendedprice as double) as decimal(14,2))) |
+-------------------------------------------------------------+
| 2293813156773.36                                            |
+-------------------------------------------------------------+
Fetched 1 row(s) in 0.86s

Change-Id: I2daf186b4770a022f9cb349d512067a1dd624810
---
M be/src/benchmarks/expr-benchmark.cc
M be/src/exprs/decimal-operators-ir.cc
M be/src/exprs/expr-test.cc
M be/src/exprs/expr.h
M be/src/exprs/literal.cc
M be/src/runtime/decimal-test.cc
M be/src/runtime/decimal-value.h
M be/src/runtime/decimal-value.inline.h
M be/src/udf/udf.h
9 files changed, 467 insertions(+), 111 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/51/5951/15
-- 
To view, visit http://gerrit.cloudera.org:8080/5951

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2daf186b4770a022f9cb349d512067a1dd624810
Gerrit-PatchSet: 15
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <zamsden@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dhecht@cloudera.com>
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho <kwho@cloudera.com>
Gerrit-Reviewer: Zach Amsden <zamsden@cloudera.com>
