impala-reviews mailing list archives

From "Alex Behm (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] IMPALA-2328: Read support for min/max Parquet statistics
Date Fri, 24 Feb 2017 01:58:18 GMT
Alex Behm has posted comments on this change.

Change subject: IMPALA-2328: Read support for min/max Parquet statistics
......................................................................


Patch Set 11:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/6032/11/be/src/exec/hdfs-parquet-scanner.cc
File be/src/exec/hdfs-parquet-scanner.cc:

Line 194:   // Allocate tuple buffer to evaluate conjuncts on parquet::Statistics.
Sorry to keep asking you about more JIRAs :), but could you file a JIRA to evaluate min/max
aggregates against Parquet stats?


Line 196:   if (min_max_tuple_desc) {
!= nullptr?


Line 496:     int col_idx = slot_desc->col_pos() - scan_node_->num_partition_keys();
I think we need to handle the name-based field resolution policy here as well.

SET PARQUET_FALLBACK_SCHEMA_RESOLUTION=NAME
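
Roughly what I have in mind, as a minimal sketch only. The types and the function
ResolveColumnIndex() below are hypothetical stand-ins for illustration, not Impala's actual
classes, and the exact-match name comparison is an assumption. The point is that the column
index comes from the slot's position under one policy and from a name lookup under the other,
and that either lookup can come back empty.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-ins for illustration; not Impala's real types.
enum class SchemaResolution { POSITION, NAME };

struct FileColumn {
  std::string name;
};

// Returns the index of the file column backing a table column, or
// std::nullopt if the file has no such column.
std::optional<std::size_t> ResolveColumnIndex(
    const std::vector<FileColumn>& file_columns, SchemaResolution policy,
    std::size_t table_col_pos, const std::string& table_col_name) {
  if (policy == SchemaResolution::POSITION) {
    // Position-based: the table column position is the file column index,
    // provided the file actually has that many columns.
    if (table_col_pos < file_columns.size()) return table_col_pos;
    return std::nullopt;
  }
  // Name-based: search the file columns for a matching name
  // (exact match here, which is an assumption of this sketch).
  for (std::size_t i = 0; i < file_columns.size(); ++i) {
    if (file_columns[i].name == table_col_name) return i;
  }
  return std::nullopt;
}
```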


Line 497:     DCHECK(col_idx < row_group.columns.size());
I don't think this DCHECK is right. A Parquet file may legitimately have fewer columns than
the table schema.

In these cases we can easily skip the row group because we'd initialize the missing slot values
to NULL.
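
Something along these lines instead of the DCHECK, as a sketch only. StatsAction and
ActionForColumn() are hypothetical names, and the sketch assumes the min/max conjunct is a
plain comparison that is never true for a NULL input.

```cpp
#include <cstddef>

// Hypothetical outcome of inspecting one column for statistics filtering.
enum class StatsAction { kSkipRowGroup, kEvaluateStats };

// col_idx is the position of the slot's column among the file columns. It may
// be out of range when the file has fewer columns than the table schema.
StatsAction ActionForColumn(int col_idx, std::size_t num_file_columns) {
  if (col_idx < 0 || static_cast<std::size_t>(col_idx) >= num_file_columns) {
    // Column missing from this file: the slot would be NULL for every row, so
    // (under the assumption above) the conjunct cannot be true and the row
    // group can be skipped.
    return StatsAction::kSkipRowGroup;
  }
  // Column present: go on to evaluate the conjunct against its statistics.
  return StatsAction::kEvaluateStats;
}
```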


http://gerrit.cloudera.org:8080/#/c/6032/11/be/src/exec/parquet-metadata-utils.cc
File be/src/exec/parquet-metadata-utils.cc:

Line 223:   DCHECK(col_idx < row_group.columns.size());
I don't think this DCHECK is right. A Parquet file may legitimately not have all columns specified
in the table schema.

Also, in these cases we can easily skip the row group because we'd initialize the missing
slot values to NULL.


http://gerrit.cloudera.org:8080/#/c/6032/11/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java:

Line 315:     SlotRef slot = new SlotRef(slotDesc);
Would look nicer if the explain plan printed the predicates evaluated against the statistics
with "max(col)" or "min(col)".

You can do this by giving the slotDesc a custom label with setLabel().


Line 340:       if (!(binaryPred.getChild(0) instanceof SlotRef)) continue;
Why not allow cast SlotRefs?


-- 
To view, visit http://gerrit.cloudera.org:8080/6032
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I39b836165756fcf929c801048d91c50c8fdcdae4
Gerrit-PatchSet: 11
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker <lv@cloudera.com>
Gerrit-Reviewer: Alex Behm <alex.behm@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Lars Volker <lv@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <marcel@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj@cloudera.com>
Gerrit-Reviewer: Matthew Mulder <mmulder@cloudera.com>
Gerrit-Reviewer: Mostafa Mokhtar <mmokhtar@cloudera.com>
Gerrit-HasComments: Yes
