impala-dev mailing list archives

From "Internal Jenkins (Code Review)" <ger...@cloudera.org>
Subject [Impala-CR](cdh5-trunk) IMPALA-3286: Software prefetching for hash table build.
Date Wed, 04 May 2016 10:40:32 GMT
Internal Jenkins has submitted this change and it was merged.

Change subject: IMPALA-3286: Software prefetching for hash table build.
......................................................................


IMPALA-3286: Software prefetching for hash table build.

This change pipelines the code that builds the hash table,
based on an idea Mostafa presented earlier. The pipelined
code first evaluates all the rows to be inserted, computes
their hash values, and prefetches the corresponding hash
table buckets. It then goes through the rows a second time
to insert them into the hash table. This change also makes
evaluation of the build-side expression in Equals() lazy,
so the expression is not evaluated a second time when the
hash table bucket is empty or when the bucket holds a
different hash value (a bucket collision).
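
The two-pass build described above can be sketched roughly as
follows. This is a minimal illustration and not the code in this
change: Row, EvalKey(), HashKey(), ToyHashTable and BuildPipelined()
are hypothetical names, the table uses simple linear probing and
drops rows with duplicate keys, rows are assumed to arrive in
batches, and __builtin_prefetch assumes GCC or Clang.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical build row; the fields mirror the join keys in the example query.
struct Row {
  int64_t l_orderkey;
  int64_t l_linenumber;
};

// Stand-in for build-side expression evaluation (assumed to be the costly
// step). Assumes l_linenumber < 8 so that the combined key is unique.
inline int64_t EvalKey(const Row& r) { return r.l_orderkey * 8 + r.l_linenumber; }

// Simple multiplicative hash; any reasonable hash function would do here.
inline uint32_t HashKey(int64_t key) {
  return static_cast<uint32_t>(
      (static_cast<uint64_t>(key) * 0x9E3779B97F4A7C15ULL) >> 32);
}

class ToyHashTable {
 public:
  // num_buckets must be a power of two and exceed the number of rows inserted.
  explicit ToyHashTable(size_t num_buckets)
      : buckets_(num_buckets), mask_(num_buckets - 1) {}

  // Hint that the bucket for 'hash' will be written soon.
  void PrefetchBucket(uint32_t hash) const {
    __builtin_prefetch(&buckets_[hash & mask_], 1 /* write */, 3 /* keep */);
  }

  // Linear-probing insert. Returns false if an equal key is already present
  // or the table is full.
  bool Insert(uint32_t hash, const Row* row) {
    for (size_t i = 0; i < buckets_.size(); ++i) {
      Bucket& b = buckets_[(hash + i) & mask_];
      if (b.row == nullptr) {  // Empty bucket: no key evaluation needed.
        b.hash = hash;
        b.row = row;
        ++size_;
        return true;
      }
      if (b.hash != hash) continue;  // Different hash: skip key evaluation.
      // Only now are the key expressions evaluated and compared.
      if (EvalKey(*b.row) == EvalKey(*row)) return false;
    }
    return false;
  }

  size_t size() const { return size_; }

 private:
  struct Bucket {
    uint32_t hash = 0;
    const Row* row = nullptr;
  };
  std::vector<Bucket> buckets_;
  size_t mask_;
  size_t size_ = 0;
};

// Builds the table from one batch of rows in two passes.
size_t BuildPipelined(ToyHashTable* ht, const std::vector<Row>& batch) {
  std::vector<uint32_t> hashes(batch.size());
  // Pass 1: evaluate each row, cache its hash, and prefetch its bucket.
  for (size_t i = 0; i < batch.size(); ++i) {
    hashes[i] = HashKey(EvalKey(batch[i]));
    ht->PrefetchBucket(hashes[i]);
  }
  // Pass 2: insert with the cached hashes; the buckets are likely in cache.
  size_t inserted = 0;
  for (size_t i = 0; i < batch.size(); ++i) {
    if (ht->Insert(hashes[i], &batch[i])) ++inserted;
  }
  return inserted;
}
```

In this sketch the table stores pointers into the batch, so the
batch must outlive the table. The benefit of the first pass comes
from issuing many independent prefetches before any bucket is
touched, which lets cache misses overlap instead of stalling each
insert in turn.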

With this change, the hash table build time of a self-join
on lineitem drops by more than half (from 10.5s to 4.5s).
The overall query time drops from 37.28s to 31.15s (~16% reduction).

select count(*) from lineitem o1, lineitem o2
where o1.l_orderkey = o2.l_orderkey and
o1.l_linenumber = o2.l_linenumber

The TPCH(15) average also improves by about 2.5% overall, with
certain queries improving by up to 8%:

+----------+-----------------------+---------+------------+------------+----------------+
| Workload | File Format           | Avg (s) | Delta(Avg) | GeoMean(s) | Delta(GeoMean) |
+----------+-----------------------+---------+------------+------------+----------------+
| TPCH(15) | parquet / none / none | 14.34   | -2.49%     | 9.36       | -1.65%         |
+----------+-----------------------+---------+------------+------------+----------------+

+----------+----------+-----------------------+--------+-------------+------------+-----------+----------------+-------------+-------+
| Workload | Query    | File Format           | Avg(s) | Base Avg(s) | Delta(Avg) | StdDev(%) | Base StdDev(%) | Num Clients | Iters |
+----------+----------+-----------------------+--------+-------------+------------+-----------+----------------+-------------+-------+
| TPCH(15) | TPCH-Q1  | parquet / none / none | 8.44   | 8.05        |   +4.92%   |   2.89%   |   1.50%        | 1           | 10    |
| TPCH(15) | TPCH-Q11 | parquet / none / none | 1.85   | 1.76        |   +4.86%   |   3.88%   |   3.93%        | 1           | 10    |
| TPCH(15) | TPCH-Q2  | parquet / none / none | 2.90   | 2.78        |   +4.41%   |   8.68%   | * 15.78% *     | 1           | 10    |
| TPCH(15) | TPCH-Q19 | parquet / none / none | 39.46  | 38.53       |   +2.40%   |   2.21%   |   2.23%        | 1           | 10    |
| TPCH(15) | TPCH-Q16 | parquet / none / none | 1.90   | 1.86        |   +1.81%   |   2.54%   |   2.74%        | 1           | 10    |
| TPCH(15) | TPCH-Q15 | parquet / none / none | 5.50   | 5.43        |   +1.32%   |   2.62%   |   3.34%        | 1           | 10    |
| TPCH(15) | TPCH-Q6  | parquet / none / none | 3.03   | 3.01        |   +0.61%   |   3.54%   |   2.14%        | 1           | 10    |
| TPCH(15) | TPCH-Q17 | parquet / none / none | 31.22  | 31.13       |   +0.29%   |   0.32%   |   0.49%        | 1           | 10    |
| TPCH(15) | TPCH-Q14 | parquet / none / none | 3.63   | 3.64        |   -0.21%   |   2.22%   |   2.70%        | 1           | 10    |
| TPCH(15) | TPCH-Q12 | parquet / none / none | 3.88   | 3.89        |   -0.31%   |   1.90%   |   1.82%        | 1           | 10    |
| TPCH(15) | TPCH-Q7  | parquet / none / none | 26.25  | 26.64       |   -1.50%   |   2.30%   |   2.40%        | 1           | 10    |
| TPCH(15) | TPCH-Q20 | parquet / none / none | 6.26   | 6.42        |   -2.45%   |   1.44%   |   1.81%        | 1           | 10    |
| TPCH(15) | TPCH-Q9  | parquet / none / none | 30.56  | 31.43       |   -2.77%   |   0.41%   |   0.64%        | 1           | 10    |
| TPCH(15) | TPCH-Q13 | parquet / none / none | 13.53  | 13.94       |   -3.00%   |   1.02%   |   0.50%        | 1           | 10    |
| TPCH(15) | TPCH-Q8  | parquet / none / none | 24.93  | 25.76       |   -3.22%   |   0.95%   |   1.00%        | 1           | 10    |
| TPCH(15) | TPCH-Q10 | parquet / none / none | 6.58   | 6.89        |   -4.50%   |   1.37%   |   1.24%        | 1           | 10    |
| TPCH(15) | TPCH-Q18 | parquet / none / none | 31.44  | 33.12       |   -5.05%   |   0.50%   |   0.66%        | 1           | 10    |
| TPCH(15) | TPCH-Q21 | parquet / none / none | 31.56  | 33.55       |   -5.92%   |   4.31%   |   5.01%        | 1           | 10    |
| TPCH(15) | TPCH-Q22 | parquet / none / none | 4.17   | 4.44        |   -5.98%   |   0.59%   |   0.75%        | 1           | 10    |
| TPCH(15) | TPCH-Q5  | parquet / none / none | 14.67  | 15.66       |   -6.34%   |   8.08%   |   1.13%        | 1           | 10    |
| TPCH(15) | TPCH-Q3  | parquet / none / none | 11.25  | 12.01       |   -6.38%   |   1.17%   |   0.85%        | 1           | 10    |
| TPCH(15) | TPCH-Q4  | parquet / none / none | 12.38  | 13.49       |   -8.19%   |   1.44%   |   0.70%        | 1           | 10    |
+----------+----------+-----------------------+--------+-------------+------------+-----------+----------------+-------------+-------+

Change-Id: Ib85e7fc162ad25c849b9e716b629e226697cd940
Reviewed-on: http://gerrit.cloudera.org:8080/2896
Reviewed-by: Michael Ho <kwho@cloudera.com>
Tested-by: Internal Jenkins
---
M be/src/exec/hash-table-test.cc
M be/src/exec/hash-table.cc
M be/src/exec/hash-table.h
M be/src/exec/hash-table.inline.h
M be/src/exec/partitioned-aggregation-node-ir.cc
M be/src/exec/partitioned-aggregation-node.h
M be/src/exec/partitioned-hash-join-node-ir.cc
M be/src/exec/partitioned-hash-join-node.cc
M be/src/exec/partitioned-hash-join-node.h
M be/src/runtime/row-batch.h
10 files changed, 190 insertions(+), 121 deletions(-)

Approvals:
  Michael Ho: Looks good to me, approved
  Internal Jenkins: Verified



-- 
To view, visit http://gerrit.cloudera.org:8080/2896
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: Ib85e7fc162ad25c849b9e716b629e226697cd940
Gerrit-PatchSet: 12
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Michael Ho <kwho@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dhecht@cloudera.com>
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Marcel Kornacker <marcel@cloudera.com>
Gerrit-Reviewer: Michael Ho <kwho@cloudera.com>
Gerrit-Reviewer: Mostafa Mokhtar <mmokhtar@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstrong@cloudera.com>
