impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Zach Amsden (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] IMPALA-4864 Speed up single slot predicates with dictionaries
Date Thu, 25 May 2017 00:04:49 GMT
Zach Amsden has uploaded a new patch set (#9).

Change subject: IMPALA-4864 Speed up single slot predicates with dictionaries
......................................................................

IMPALA-4864 Speed up single slot predicates with dictionaries

When dictionaries are present we can pre-evaluate conjuncts
against the dictionary values and simply look up the result.

Status: ready for review.  Seems to be passing all tests, runs with
ASAN, and gives expected results.  Need to come up with some specific
test cases that exercise this functionality and measure performance.

Basic idea: since we codegen so early, before we know enough details
about the columns to know if they are dict filterable, if we do have
dictionary filtering predicates, we codegen a guard around each
dictionary filterable predicate evaluation.  This guard skips
evaluation of the predicate if it has already been evaluated by the
dictionary.  In this way, we can skip evaluation dynamically for
each row group as we learn which columns are dictionary filterable,
and then push predicate evaluation into the column reader.  Since the
branches will remain 100% predictable over the row group, this should
give us the fastest way to skip over predicate evaluation without
compromising the general case where we may be unable to evaluate
against the dictionary.  We can even do this with codegen turned
off, as a side effect of the way we generate the codegen'd function
when dictionary evaluation is enabled.

If dictionaries aren't available for some predicates, we automatically
fall back to evaluating those predicates in the original order,
preserving selectivity.  The overhead in this case is a perfectly
predictable extra conditional per dictionary predicate.

Change-Id: I65981c89e5292086809ec1268f5a273f4c1fe054
---
M be/src/codegen/gen_ir_descriptions.py
M be/src/exec/exec-node.cc
M be/src/exec/exec-node.h
M be/src/exec/hash-join-node.cc
M be/src/exec/hdfs-avro-scanner.cc
M be/src/exec/hdfs-parquet-scanner-ir.cc
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-parquet-scanner.h
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
M be/src/exec/parquet-column-readers.cc
M be/src/exec/parquet-column-readers.h
M be/src/exec/parquet-scratch-tuple-batch.h
M be/src/exec/partitioned-hash-join-node.cc
M be/src/runtime/query-state.h
M be/src/runtime/row-batch.h
M be/src/util/bitmap-test.cc
M be/src/util/bitmap.h
M be/src/util/dict-encoding.h
M testdata/workloads/functional-planner/queries/PlannerTest/parquet-filtering.test
20 files changed, 564 insertions(+), 180 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/26/6726/9
-- 
To view, visit http://gerrit.cloudera.org:8080/6726
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I65981c89e5292086809ec1268f5a273f4c1fe054
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <zamsden@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <joemcdonnell@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstrong@cloudera.com>
Gerrit-Reviewer: Zach Amsden <zamsden@cloudera.com>

Mime
View raw message