impala-reviews mailing list archives

From "Vuk Ercegovac (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] IMPALA-4985: use parquet stats of nested types for dynamic pruning
Date Tue, 14 Nov 2017 18:07:53 GMT
Hello Lars Volker, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/8480

to look at the new patch set (#2).

Change subject: IMPALA-4985: use parquet stats of nested types for dynamic pruning
......................................................................

IMPALA-4985: use parquet stats of nested types for dynamic pruning

Currently, Parquet row groups can be pruned at run time using
min/max statistics when predicates (IN, binary) are specified on
scalar-typed columns. This patch extends pruning to nested types
for the same class of predicates. A nested value is an instance
of a nested type. The nested value itself is a tree of nested
values (e.g., map, array, struct) and scalar values. Predicates
used for row-group pruning must be applied to scalar values
within a nested value. In addition, every node on the path from
the scalar value to the root of the nested value must be
required. For example, let table T's column x be an array of
structs, one of whose fields is y. Given the predicate T.x.y > 3,
the path rooted at T is T.x.y. As long as all types on the path
are required (must be non-empty), the min/max pruning predicate
can be used.
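
A minimal sketch of the rule above, in Java. It is not Impala's
implementation: the class NestedStatsPruningSketch, the PathNode
and MinMax records, and the Op enum are hypothetical; statistics
are modeled as long values and nulls are ignored. A row group is
skipped only if every node on the path is required and the
row group's min/max range proves that no value can satisfy the
predicate.

```java
import java.util.List;

/**
 * Hypothetical sketch (not Impala's code): decides whether a Parquet row group
 * can be skipped for a predicate on a scalar nested value, given that row
 * group's min/max statistics for the leaf column. Nulls are not modeled.
 */
public class NestedStatsPruningSketch {
  /** One step on the path from the table root to the scalar leaf, e.g. x, then y. */
  record PathNode(String name, boolean required) {}

  /** Hypothetical min/max statistics of one leaf column in one row group. */
  record MinMax(long min, long max) {}

  /** Binary comparison operators covered by this sketch. */
  enum Op { LT, LE, EQ, GE, GT }

  /** Stats may only be used if every node on the path to the scalar is required. */
  static boolean statsUsable(List<PathNode> path) {
    for (PathNode n : path) {
      if (!n.required()) return false;
    }
    return true;
  }

  /** True if some value in [min, max] could satisfy "leaf op lit". */
  static boolean mayMatch(MinMax s, Op op, long lit) {
    switch (op) {
      case LT: return s.min() < lit;
      case LE: return s.min() <= lit;
      case EQ: return s.min() <= lit && lit <= s.max();
      case GE: return s.max() >= lit;
      case GT: return s.max() > lit;
      default: throw new IllegalArgumentException("unsupported op: " + op);
    }
  }

  /** IN-list: the row group may match if any list element lies in [min, max]. */
  static boolean mayMatchIn(MinMax s, List<Long> values) {
    for (long v : values) {
      if (mayMatch(s, Op.EQ, v)) return true;
    }
    return false;
  }

  /** Skip only when the stats are usable and prove that no value can match. */
  static boolean canSkipRowGroup(List<PathNode> path, MinMax s, Op op, long lit) {
    return statsUsable(path) && !mayMatch(s, op, lit);
  }

  public static void main(String[] args) {
    // T.x.y > 3, where x is an array of structs with field y.
    List<PathNode> allRequired = List.of(new PathNode("x", true), new PathNode("y", true));
    List<PathNode> optionalX = List.of(new PathNode("x", false), new PathNode("y", true));
    MinMax stats = new MinMax(0, 3);  // every y in this row group lies in [0, 3]
    System.out.println(canSkipRowGroup(allRequired, stats, Op.GT, 3));  // true
    System.out.println(canSkipRowGroup(optionalX, stats, Op.GT, 3));    // false
  }
}
```

With stats [0, 3] for T.x.y, the predicate T.x.y > 3 lets the
row group be skipped when x and y are both required, but not
when x is optional.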

Testing:
- extended nested-types-parquet-stats e2e test cases.

Change-Id: I0c99e20cb080b504442cd5376ea3e046016158fe
---
M be/src/exec/hdfs-parquet-scanner.h
M fe/src/main/java/org/apache/impala/analysis/CollectionStructType.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M testdata/workloads/functional-query/queries/QueryTest/nested-types-parquet-stats.test
4 files changed, 259 insertions(+), 16 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/80/8480/2
-- 
To view, visit http://gerrit.cloudera.org:8080/8480
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0c99e20cb080b504442cd5376ea3e046016158fe
Gerrit-Change-Number: 8480
Gerrit-PatchSet: 2
Gerrit-Owner: Vuk Ercegovac <vercegovac@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv@cloudera.com>
Gerrit-Reviewer: Vuk Ercegovac <vercegovac@cloudera.com>
