impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Vuk Ercegovac (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] IMPALA-4985: use parquet stats of nested types for dynamic pruning
Date Mon, 06 Nov 2017 07:31:44 GMT
Vuk Ercegovac has uploaded this change for review. ( http://gerrit.cloudera.org:8080/8480


Change subject: IMPALA-4985: use parquet stats of nested types for dynamic pruning
......................................................................

IMPALA-4985: use parquet stats of nested types for dynamic pruning

Currently, parquet row-groups can be pruned at run-time using
min/max stats when predicates (in, binary) are specified for
column scalar types. This patch extends pruning to nested types
for the same class of predicates. A nested value is defined to
be on a path of one or more nested types that is rooted at a
table column. For example, table T's column x is an array of structs,
one of whose fields is y. Given a predicate T.x.y > 3, the path
rooted at T is T.x.y. So long as all types on the path are required
(must be non-empty), the min-max pruning predicate can be used.

Testing:
- extended nested-types-parquet-stats e2e test cases.

Change-Id: I0c99e20cb080b504442cd5376ea3e046016158fe
---
M be/src/exec/hdfs-parquet-scanner.h
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M testdata/workloads/functional-query/queries/QueryTest/nested-types-parquet-stats.test
M testdata/workloads/functional-query/queries/QueryTest/parquet-stats.test
4 files changed, 249 insertions(+), 13 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/80/8480/1
-- 
To view, visit http://gerrit.cloudera.org:8080/8480
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I0c99e20cb080b504442cd5376ea3e046016158fe
Gerrit-Change-Number: 8480
Gerrit-PatchSet: 1
Gerrit-Owner: Vuk Ercegovac <vercegovac@cloudera.com>

Mime
  • Unnamed multipart/alternative (inline, 8-Bit, 0 bytes)
View raw message