impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Marcel Kornacker (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] IMPALA-2328: Read support for min/max Parquet statistics
Date Wed, 22 Feb 2017 03:56:26 GMT
Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-2328: Read support for min/max Parquet statistics
......................................................................


Patch Set 6:

(14 comments)

http://gerrit.cloudera.org:8080/#/c/6032/6/be/src/exec/hdfs-parquet-scanner.h
File be/src/exec/hdfs-parquet-scanner.h:

Line 382:   /// Min/max statistics contexts, owned by instances of this class.
they are not owned by the class instances themselves, they're allocated in state_->obj_pool()


Line 386:   /// 'min_max_conjunct_ctxs_', which are used when evaluating parquet::Statistics.
somewhat incomprehensible comment.

this is used only for a specific function, and then only to avoid the dynamic allocation overhead.
point this out, right now it seems very mysterious.


Line 476:   Status EvaluateRowGroupStats(const parquet::RowGroup& row_group, bool* skip_row_group);
you're not evaluating stats, you're evaluating predicates on the stats


http://gerrit.cloudera.org:8080/#/c/6032/6/be/src/exec/parquet-column-stats.h
File be/src/exec/parquet-column-stats.h:

Line 31: enum class StatsField { MIN, MAX };
move into ColumnStatsBase so it's clear what stats this is referring to.

also, add enums for all fields in Statistics.


Line 114:   static bool ReadFromThrift(const parquet::Statistics& thrift_stats, const
StatsField& stats_field,
move function bodies into parquet-column-stats-inl.h


http://gerrit.cloudera.org:8080/#/c/6032/4/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java:

Line 297:     collectionConjuncts_.clear();
> My reasoning was that any constant expression could be evaluated against th
true, but those should be identical, given that we rewrite constants as literals.


http://gerrit.cloudera.org:8080/#/c/6032/6/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java:

Line 85:  * conjuncts, that are passed to the backend and will be evaluated against the
"conjuncts that"


Line 154:   private List<Expr> minMaxPlanConjuncts_ = Lists.newArrayList();
"plan conjuncts" doesn't really mean anything. original conjuncts?


Line 302:    * Builds a predicate to evaluate parquet::Statistics by copying 'inputSlot' into
"to evaluate against"


Line 347:       // Only constant exprs can be evaluated against the parquet::Statistics. This
drop "the"


Line 349:       if (!constExpr.isConstant()) continue;
as pointed out before, you shouldn't see non-literal constants here


http://gerrit.cloudera.org:8080/#/c/6032/4/testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test
File testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test:

Line 1047:    runtime filters: RF000 -> c_custkey
> I changed the code to track the original predicates and display them instea
extended is fine.

yes, "parquet statistics" is more concrete.


http://gerrit.cloudera.org:8080/#/c/6032/6/testdata/workloads/functional-planner/queries/PlannerTest/constant-folding.test
File testdata/workloads/functional-planner/queries/PlannerTest/constant-folding.test:

Line 1: # Test HDFS scan node.
run this file in extended mode so you see the stats predicates.


http://gerrit.cloudera.org:8080/#/c/6032/6/testdata/workloads/functional-query/queries/QueryTest/parquet_stats.test
File testdata/workloads/functional-query/queries/QueryTest/parquet_stats.test:

Line 206: # Test that without expr rewrite and thus without constant folding, constant exprs
still
what's the point of those tests?


-- 
To view, visit http://gerrit.cloudera.org:8080/6032
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I39b836165756fcf929c801048d91c50c8fdcdae4
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker <lv@cloudera.com>
Gerrit-Reviewer: Alex Behm <alex.behm@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <marcel@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj@cloudera.com>
Gerrit-Reviewer: Matthew Mulder <mmulder@cloudera.com>
Gerrit-Reviewer: Mostafa Mokhtar <mmokhtar@cloudera.com>
Gerrit-HasComments: Yes

Mime
View raw message