impala-dev mailing list archives

From "Michael Ho (Code Review)" <ger...@cloudera.org>
Subject [Impala-CR](cdh5-trunk) IMPALA-3286: Prefetching for PHJ probing.
Date Mon, 16 May 2016 23:52:14 GMT
Hello Tim Armstrong, Dan Hecht,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/2959

to look at the new patch set (#13).

Change subject: IMPALA-3286: Prefetching for PHJ probing.
......................................................................

IMPALA-3286: Prefetching for PHJ probing.

This change pipelines the code that probes the hash tables.
It is based on an idea Mostafa presented earlier.
Essentially, all rows in a row batch are evaluated and hashed
first, before any of them is probed against the hash tables.
Hash table buckets are prefetched as the hash values of the rows
are computed.
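
The two-phase probe described above can be illustrated with a minimal,
self-contained sketch. It is not the code in this patch: ToyHashTable,
PrefetchBucket and ProbeBatch are hypothetical names, the table is a
simple linear-probing table over int64_t keys, and __builtin_prefetch
assumes GCC or Clang.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical toy table (NOT Impala's HashTable): open addressing with
// linear probing. The bucket count must be a power of two and the table
// must never be filled completely, otherwise Insert/Find would not terminate.
class ToyHashTable {
 public:
  explicit ToyHashTable(size_t num_buckets) : buckets_(num_buckets) {}

  // Simple 64-bit mixing function, used only for illustration.
  static uint64_t Hash(int64_t key) {
    uint64_t h = static_cast<uint64_t>(key);
    h ^= h >> 33;
    h *= 0xff51afd7ed558ccdULL;
    h ^= h >> 33;
    return h;
  }

  void Insert(int64_t key) {
    size_t idx = Hash(key) & Mask();
    while (buckets_[idx].filled) idx = (idx + 1) & Mask();
    buckets_[idx].filled = true;
    buckets_[idx].key = key;
  }

  // Hints to the CPU that the bucket for 'hash' will be read soon.
  void PrefetchBucket(uint64_t hash) const {
    __builtin_prefetch(&buckets_[hash & Mask()], /*rw=*/0, /*locality=*/1);
  }

  // Looks up 'key' using a hash value computed earlier.
  bool Find(int64_t key, uint64_t hash) const {
    size_t idx = hash & Mask();
    while (buckets_[idx].filled) {
      if (buckets_[idx].key == key) return true;
      idx = (idx + 1) & Mask();
    }
    return false;
  }

 private:
  struct Bucket {
    bool filled = false;
    int64_t key = 0;
  };
  size_t Mask() const { return buckets_.size() - 1; }
  std::vector<Bucket> buckets_;
};

// Phase 1 hashes every row of the batch and issues a prefetch for its bucket.
// Phase 2 probes with the saved hash values, by which time the buckets are
// more likely to be in cache. Returns the number of rows that found a match.
size_t ProbeBatch(const ToyHashTable& ht, const std::vector<int64_t>& batch) {
  std::vector<uint64_t> hashes(batch.size());
  for (size_t i = 0; i < batch.size(); ++i) {
    hashes[i] = ToyHashTable::Hash(batch[i]);
    ht.PrefetchBucket(hashes[i]);
  }
  size_t matches = 0;
  for (size_t i = 0; i < batch.size(); ++i) {
    if (ht.Find(batch[i], hashes[i])) ++matches;
  }
  return matches;
}
```

Separating the hashing loop from the probing loop is what gives the
prefetch time to complete before the bucket is actually read.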

To avoid re-evaluating the rows during probing (they have already
been evaluated once to compute the hash values), the hash table
context now caches the evaluated expression values, null bits and
hash values of some number of rows. The hash table context provides
a new iterator-like interface for iterating through the cached
values.
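
A rough sketch of what such a cache with an iterator-like interface
could look like follows. The class name CachedRowValues and all of its
methods are hypothetical and chosen for illustration only; the real
interface lives in be/src/exec/hash-table.h and may differ. A single
int64_t stands in for the evaluated expression values of a row.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical cache of per-row results (NOT the class added by this patch).
// It is filled once while hashing a batch and read back during probing.
class CachedRowValues {
 public:
  // Drops all cached rows so the cache can be reused for the next batch.
  void Reset() {
    values_.clear();
    null_bits_.clear();
    hashes_.clear();
    cur_ = 0;
  }

  // Records the evaluated value, null bit and hash value of one row.
  void Append(int64_t value, bool is_null, uint32_t hash) {
    values_.push_back(value);
    null_bits_.push_back(is_null ? 1 : 0);
    hashes_.push_back(hash);
  }

  // Iterator-like read interface over the cached rows.
  void ResetForRead() { cur_ = 0; }
  bool AtEnd() const { return cur_ >= hashes_.size(); }
  void NextRow() { ++cur_; }
  int64_t CurValue() const { return values_[cur_]; }
  bool CurIsNull() const { return null_bits_[cur_] != 0; }
  uint32_t CurHash() const { return hashes_[cur_]; }

 private:
  std::vector<int64_t> values_;
  std::vector<uint8_t> null_bits_;
  std::vector<uint32_t> hashes_;
  size_t cur_ = 0;
};
```

In this sketch the probing loop would call ResetForRead() and then
advance with NextRow() until AtEnd(), reading the cached values
instead of evaluating the expressions a second time.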

A PREFETCH_MODE query option has also been added so that prefetching
can be disabled if necessary. The default mode is 1, which means hash
table buckets will be prefetched. In the future, this mode may be
extended to also support prefetching of the hash table buckets' data.
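
As an illustration of how a mode value like this might gate
prefetching, here is a small sketch. The enum and function names are
hypothetical, and treating 0 as "prefetching disabled" is an
assumption; the message above only states that the default mode 1
enables hash table bucket prefetching.

```cpp
#include <cassert>
#include <string>

// Hypothetical mode values. Mode 1 is described above; mapping "disabled"
// to 0 is an assumption made for this sketch.
enum class PrefetchMode { kNone = 0, kHtBucket = 1 };

// Parses the textual option value. Returns false (leaving *mode untouched)
// if the value is not recognised.
bool ParsePrefetchMode(const std::string& text, PrefetchMode* mode) {
  if (text == "0") {
    *mode = PrefetchMode::kNone;
    return true;
  }
  if (text == "1") {
    *mode = PrefetchMode::kHtBucket;
    return true;
  }
  return false;
}

// The probe code would check this before issuing bucket prefetches.
inline bool ShouldPrefetchBuckets(PrefetchMode mode) {
  return mode == PrefetchMode::kHtBucket;
}
```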

Combined with the build side prefetching, the following self join of
table lineitem improves by 40% on average in a single-node run:

select count(*)
from lineitem o1, lineitem o2
where o1.l_orderkey = o2.l_orderkey and
      o1.l_linenumber = o2.l_linenumber;

Change-Id: Ib42b93d99d09c833571e39d20d58c11ef73f3cc0
---
M be/src/exec/hash-table-test.cc
M be/src/exec/hash-table.cc
M be/src/exec/hash-table.h
M be/src/exec/hash-table.inline.h
M be/src/exec/partitioned-aggregation-node-ir.cc
M be/src/exec/partitioned-aggregation-node.cc
M be/src/exec/partitioned-hash-join-node-ir.cc
M be/src/exec/partitioned-hash-join-node.cc
M be/src/exec/partitioned-hash-join-node.h
M be/src/exec/partitioned-hash-join-node.inline.h
M be/src/exprs/expr-context.cc
M be/src/exprs/expr-context.h
M be/src/runtime/row-batch.h
M be/src/runtime/test-env.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M be/src/udf/udf-internal.h
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/Types.thrift
20 files changed, 1,177 insertions(+), 520 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala refs/changes/59/2959/13
-- 
To view, visit http://gerrit.cloudera.org:8080/2959
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ib42b93d99d09c833571e39d20d58c11ef73f3cc0
Gerrit-PatchSet: 13
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Michael Ho <kwho@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dhecht@cloudera.com>
Gerrit-Reviewer: Huaisi Xu <hxu@cloudera.com>
Gerrit-Reviewer: Michael Ho <kwho@cloudera.com>
Gerrit-Reviewer: Mostafa Mokhtar <mmokhtar@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstrong@cloudera.com>
