impala-dev mailing list archives

From "Michael Ho (Code Review)" <ger...@cloudera.org>
Subject [Impala-CR](cdh5-trunk) IMPALA-3286: Software prefetching for hash table build.
Date Mon, 02 May 2016 01:33:45 GMT
Hello Tim Armstrong,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/2896

to look at the new patch set (#3).

Change subject: IMPALA-3286: Software prefetching for hash table build.
......................................................................

IMPALA-3286: Software prefetching for hash table build.

This change pipelines the code that builds the hash table,
based on the idea Mostafa presented earlier. The pipelined
code first evaluates all the rows to be inserted, computes
their hash values and prefetches the corresponding hash table
buckets. It then goes through all the rows again to insert
them into the hash table. This change also makes evaluation
of the build side expressions in Equals() lazy, so that they
are not evaluated a second time when the hash table bucket is
empty or when the bucket is occupied by a colliding entry
whose hash doesn't match.
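
As a rough illustration of the two-pass build and the lazy comparison
described above, here is a minimal C++ sketch. It is not the code in this
patch: SketchHashTable, Row, Bucket and every member name are hypothetical,
the table uses plain linear probing, and __builtin_prefetch is the GCC/Clang
builtin, which may differ from the prefetch primitive Impala actually uses.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified stand-ins; none of these names come from Impala.
struct Row {
  int64_t key;
};

struct Bucket {
  const Row* row = nullptr;  // nullptr means the bucket is empty
  uint32_t hash = 0;         // cached hash of the stored row
  int num_duplicates = 0;    // extra rows seen with the same key
};

class SketchHashTable {
 public:
  // num_buckets must be a power of two and exceed the number of distinct keys,
  // otherwise the probe loop in Insert() would never find an empty bucket.
  explicit SketchHashTable(size_t num_buckets) : buckets_(num_buckets) {
    assert(num_buckets > 0 && (num_buckets & (num_buckets - 1)) == 0);
  }

  // The table stores pointers into 'rows', so 'rows' must outlive the table.
  void InsertBatch(const std::vector<Row>& rows) {
    std::vector<int64_t> keys(rows.size());
    std::vector<uint32_t> hashes(rows.size());
    // Pass 1: evaluate each row once, hash it, and prefetch its home bucket.
    for (size_t i = 0; i < rows.size(); ++i) {
      keys[i] = rows[i].key;
      hashes[i] = Hash(keys[i]);
      __builtin_prefetch(&buckets_[hashes[i] & mask()], 1 /* for write */);
    }
    // Pass 2: insert using the cached keys and hashes. The prefetches issued
    // in pass 1 have had time to bring the buckets closer to the core.
    for (size_t i = 0; i < rows.size(); ++i) {
      Insert(&rows[i], keys[i], hashes[i]);
    }
  }

  size_t num_filled() const { return num_filled_; }
  size_t stored_row_evals() const { return stored_row_evals_; }

 private:
  size_t mask() const { return buckets_.size() - 1; }

  static uint32_t Hash(int64_t key) {
    // Multiplicative hash, chosen only to keep the sketch short.
    uint64_t h = static_cast<uint64_t>(key) * 0x9E3779B97F4A7C15ULL;
    return static_cast<uint32_t>(h >> 32);
  }

  // Stands in for re-evaluating the build expressions on a stored row.
  // The counter shows how often the lazy path actually pays this cost.
  int64_t EvalStoredKey(const Row* row) {
    ++stored_row_evals_;
    return row->key;
  }

  void Insert(const Row* row, int64_t key, uint32_t hash) {
    for (size_t idx = hash & mask();; idx = (idx + 1) & mask()) {
      Bucket& b = buckets_[idx];
      if (b.row == nullptr) {  // empty bucket: nothing to compare against
        b.row = row;
        b.hash = hash;
        ++num_filled_;
        return;
      }
      // Lazy comparison: check the cached hash first and evaluate the stored
      // row only when the hashes match.
      if (b.hash == hash && EvalStoredKey(b.row) == key) {
        ++b.num_duplicates;
        return;
      }
    }
  }

  std::vector<Bucket> buckets_;
  size_t num_filled_ = 0;
  size_t stored_row_evals_ = 0;
};
```

In this sketch, inserting keys 1, 2, 3, 1 into an 8-bucket table evaluates a
stored row only once, for the duplicate key. The three first-time inserts hit
empty buckets and skip the comparison entirely.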

With this change, the hash table build time of a self-join
on lineitem drops by more than half (from 10.5s to 4.5s).
The overall query time drops from 37.28s to 31.15s (~16% reduction).

select count(*) from lineitem o1, lineitem o2
where o1.l_orderkey = o2.l_orderkey and
o1.l_linenumber = o2.l_linenumber

TPCH(15) also improves by 2.5% overall, with certain queries
improving by up to 8%:

+----------+-----------------------+---------+------------+------------+----------------+
| Workload | File Format           | Avg (s) | Delta(Avg) | GeoMean(s) | Delta(GeoMean) |
+----------+-----------------------+---------+------------+------------+----------------+
| TPCH(15) | parquet / none / none | 14.34   | -2.49%     | 9.36       | -1.65%         |
+----------+-----------------------+---------+------------+------------+----------------+

+----------+----------+-----------------------+--------+-------------+------------+-----------+----------------+-------------+-------+
| Workload | Query    | File Format           | Avg(s) | Base Avg(s) | Delta(Avg) | StdDev(%) | Base StdDev(%) | Num Clients | Iters |
+----------+----------+-----------------------+--------+-------------+------------+-----------+----------------+-------------+-------+
| TPCH(15) | TPCH-Q1  | parquet / none / none | 8.44   | 8.05        |   +4.92%   |   2.89%   |   1.50%        | 1           | 10    |
| TPCH(15) | TPCH-Q11 | parquet / none / none | 1.85   | 1.76        |   +4.86%   |   3.88%   |   3.93%        | 1           | 10    |
| TPCH(15) | TPCH-Q2  | parquet / none / none | 2.90   | 2.78        |   +4.41%   |   8.68%   | * 15.78% *     | 1           | 10    |
| TPCH(15) | TPCH-Q19 | parquet / none / none | 39.46  | 38.53       |   +2.40%   |   2.21%   |   2.23%        | 1           | 10    |
| TPCH(15) | TPCH-Q16 | parquet / none / none | 1.90   | 1.86        |   +1.81%   |   2.54%   |   2.74%        | 1           | 10    |
| TPCH(15) | TPCH-Q15 | parquet / none / none | 5.50   | 5.43        |   +1.32%   |   2.62%   |   3.34%        | 1           | 10    |
| TPCH(15) | TPCH-Q6  | parquet / none / none | 3.03   | 3.01        |   +0.61%   |   3.54%   |   2.14%        | 1           | 10    |
| TPCH(15) | TPCH-Q17 | parquet / none / none | 31.22  | 31.13       |   +0.29%   |   0.32%   |   0.49%        | 1           | 10    |
| TPCH(15) | TPCH-Q14 | parquet / none / none | 3.63   | 3.64        |   -0.21%   |   2.22%   |   2.70%        | 1           | 10    |
| TPCH(15) | TPCH-Q12 | parquet / none / none | 3.88   | 3.89        |   -0.31%   |   1.90%   |   1.82%        | 1           | 10    |
| TPCH(15) | TPCH-Q7  | parquet / none / none | 26.25  | 26.64       |   -1.50%   |   2.30%   |   2.40%        | 1           | 10    |
| TPCH(15) | TPCH-Q20 | parquet / none / none | 6.26   | 6.42        |   -2.45%   |   1.44%   |   1.81%        | 1           | 10    |
| TPCH(15) | TPCH-Q9  | parquet / none / none | 30.56  | 31.43       |   -2.77%   |   0.41%   |   0.64%        | 1           | 10    |
| TPCH(15) | TPCH-Q13 | parquet / none / none | 13.53  | 13.94       |   -3.00%   |   1.02%   |   0.50%        | 1           | 10    |
| TPCH(15) | TPCH-Q8  | parquet / none / none | 24.93  | 25.76       |   -3.22%   |   0.95%   |   1.00%        | 1           | 10    |
| TPCH(15) | TPCH-Q10 | parquet / none / none | 6.58   | 6.89        |   -4.50%   |   1.37%   |   1.24%        | 1           | 10    |
| TPCH(15) | TPCH-Q18 | parquet / none / none | 31.44  | 33.12       |   -5.05%   |   0.50%   |   0.66%        | 1           | 10    |
| TPCH(15) | TPCH-Q21 | parquet / none / none | 31.56  | 33.55       |   -5.92%   |   4.31%   |   5.01%        | 1           | 10    |
| TPCH(15) | TPCH-Q22 | parquet / none / none | 4.17   | 4.44        |   -5.98%   |   0.59%   |   0.75%        | 1           | 10    |
| TPCH(15) | TPCH-Q5  | parquet / none / none | 14.67  | 15.66       |   -6.34%   |   8.08%   |   1.13%        | 1           | 10    |
| TPCH(15) | TPCH-Q3  | parquet / none / none | 11.25  | 12.01       |   -6.38%   |   1.17%   |   0.85%        | 1           | 10    |
| TPCH(15) | TPCH-Q4  | parquet / none / none | 12.38  | 13.49       |   -8.19%   |   1.44%   |   0.70%        | 1           | 10    |
+----------+----------+-----------------------+--------+-------------+------------+-----------+----------------+-------------+-------+

Change-Id: Ib85e7fc162ad25c849b9e716b629e226697cd940
---
M be/src/exec/hash-table-test.cc
M be/src/exec/hash-table.cc
M be/src/exec/hash-table.h
M be/src/exec/hash-table.inline.h
M be/src/exec/partitioned-aggregation-node-ir.cc
M be/src/exec/partitioned-aggregation-node.cc
M be/src/exec/partitioned-aggregation-node.h
M be/src/exec/partitioned-hash-join-node-ir.cc
M be/src/exec/partitioned-hash-join-node.cc
M be/src/exec/partitioned-hash-join-node.h
M be/src/runtime/row-batch.h
11 files changed, 170 insertions(+), 86 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala refs/changes/96/2896/3
-- 
To view, visit http://gerrit.cloudera.org:8080/2896
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ib85e7fc162ad25c849b9e716b629e226697cd940
Gerrit-PatchSet: 3
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Michael Ho <kwho@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dhecht@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <marcel@cloudera.com>
Gerrit-Reviewer: Michael Ho <kwho@cloudera.com>
Gerrit-Reviewer: Mostafa Mokhtar <mmokhtar@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstrong@cloudera.com>
