impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Marcel Kornacker (Code Review)" <>
Subject [Impala-ASF-CR] IMPALA-2328: Read support for min/max Parquet statistics
Date Wed, 22 Feb 2017 03:56:26 GMT
Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-2328: Read support for min/max Parquet statistics

Patch Set 6:

File be/src/exec/hdfs-parquet-scanner.h:

Line 382:   /// Min/max statistics contexts, owned by instances of this class.
they are not owned by the class instances themselves, they're allocated in state_->obj_pool()

Line 386:   /// 'min_max_conjunct_ctxs_', which are used when evaluating parquet::Statistics.
somewhat incomprehensible comment.

this is used only for a specific function, and then only to avoid the dynamic allocation overhead.
point this out, right now it seems very mysterious.

Line 476:   Status EvaluateRowGroupStats(const parquet::RowGroup& row_group, bool* skip_row_group);
you're not evaluating stats, you're evaluating predicates on the stats
File be/src/exec/parquet-column-stats.h:

Line 31: enum class StatsField { MIN, MAX };
move into ColumnStatsBase so it's clear what stats this is referring to.

also, add enums for all fields in Statistics.

Line 114:   static bool ReadFromThrift(const parquet::Statistics& thrift_stats, const
StatsField& stats_field,
move function bodies into parquet-column-stats-inl.h
File fe/src/main/java/org/apache/impala/planner/

Line 297:     collectionConjuncts_.clear();
> My reasoning was that any constant expression could be evaluated against th
true, but those should be identical, given that we rewrite constants as literals.
File fe/src/main/java/org/apache/impala/planner/

Line 85:  * conjuncts, that are passed to the backend and will be evaluated against the
"conjuncts that"

Line 154:   private List<Expr> minMaxPlanConjuncts_ = Lists.newArrayList();
"plan conjuncts" doesn't really mean anything. original conjuncts?

Line 302:    * Builds a predicate to evaluate parquet::Statistics by copying 'inputSlot' into
"to evaluate against"

Line 347:       // Only constant exprs can be evaluated against the parquet::Statistics. This
drop "the"

Line 349:       if (!constExpr.isConstant()) continue;
as pointed out before, you shouldn't see non-literal constants here
File testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test:

Line 1047:    runtime filters: RF000 -> c_custkey
> I changed the code to track the original predicates and display them instea
extended is fine.

yes, "parquet statistics" is more concrete.
File testdata/workloads/functional-planner/queries/PlannerTest/constant-folding.test:

Line 1: # Test HDFS scan node.
run this file in extended mode so you see the stats predicates.
File testdata/workloads/functional-query/queries/QueryTest/parquet_stats.test:

Line 206: # Test that without expr rewrite and thus without constant folding, constant exprs
what's the point of those tests?

To view, visit
To unsubscribe, visit

Gerrit-MessageType: comment
Gerrit-Change-Id: I39b836165756fcf929c801048d91c50c8fdcdae4
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker <>
Gerrit-Reviewer: Alex Behm <>
Gerrit-Reviewer: Lars Volker <>
Gerrit-Reviewer: Marcel Kornacker <>
Gerrit-Reviewer: Matthew Jacobs <>
Gerrit-Reviewer: Matthew Mulder <>
Gerrit-Reviewer: Mostafa Mokhtar <>
Gerrit-HasComments: Yes

View raw message