impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Vuk Ercegovac (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] IMPALA-4985: use parquet stats of nested types for dynamic pruning
Date Tue, 14 Nov 2017 18:07:21 GMT
Vuk Ercegovac has posted comments on this change. ( http://gerrit.cloudera.org:8080/8480 )

Change subject: IMPALA-4985: use parquet stats of nested types for dynamic pruning
......................................................................


Patch Set 1:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/8480/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/8480/1//COMMIT_MSG@12
PS1, Line 12: A nested value is defined to
            : be on a path of one or more nested types that is rooted at a
            : table column.
> I don't understand what that sentence means. Can you try to clarify the dis
Done


http://gerrit.cloudera.org:8080/#/c/8480/1/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java:

http://gerrit.cloudera.org:8080/#/c/8480/1/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@435
PS1, Line 435:   // Checks if slot refers to an array "pos" pseudo-column.
> Can you add a comment explaining why checking for getColumn() == null is no
added a clarifying comment. when I stepped through this code, getColumn was casting the net
too. Yes it caught references to "pos" but it also included all nested types as well, which
is not the intent with this change. If expectations for getColumn are different, then I can
look into why its not behaving as expected.


http://gerrit.cloudera.org:8080/#/c/8480/1/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@441
PS1, Line 441: isMapStruct
> I think it would be clearer to add a isArrayStruct() method to CollectionSt
Done


http://gerrit.cloudera.org:8080/#/c/8480/1/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@564
PS1, Line 564:  
> nit: the surrounding code seems to omit this space.
Done


http://gerrit.cloudera.org:8080/#/c/8480/1/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@567
PS1, Line 567:       for (Expr pred: entry.getValue()) {
             :         if (pred instanceof BinaryPredicate) {
             :           tryComputeBinaryMinMaxPredicate(analyzer, (BinaryPredicate) pred);
             :         } else if (pred instanceof InPredicate) {
             :           tryComputeInListMinMaxPredicate(analyzer, (InPredicate) pred);
             :         }
             :       }
> This looks like a duplication of the above loop. Adding additional predicat
good point, done.


http://gerrit.cloudera.org:8080/#/c/8480/1/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@1109
PS1, Line 1109: slot.getColumn() == null
> Is this another check for a pos slot?
hmm, looks like a proxy for estimating when a scan range will definitely be included. since
nested types are currently not filtered, these columns will be scanned so it makes sense to
increase the scanner estimate. 

for this patch, since we conservatively included possibly filtered scalar columns, we should
treat nested types similarly, so it makes sense to me to leave the current logic as-is.


http://gerrit.cloudera.org:8080/#/c/8480/1/testdata/workloads/functional-query/queries/QueryTest/nested-types-parquet-stats.test
File testdata/workloads/functional-query/queries/QueryTest/nested-types-parquet-stats.test:

http://gerrit.cloudera.org:8080/#/c/8480/1/testdata/workloads/functional-query/queries/QueryTest/nested-types-parquet-stats.test@141
PS1, Line 141: Basics test
> I'm not sure I understand what Basics means. Can you elaborate? I think we 
yeah, that was confusing so removed it and clarified that these queries are distinguished
from the previous ones since they are reading from the tpch data set.


http://gerrit.cloudera.org:8080/#/c/8480/1/testdata/workloads/functional-query/queries/QueryTest/parquet-stats.test
File testdata/workloads/functional-query/queries/QueryTest/parquet-stats.test:

http://gerrit.cloudera.org:8080/#/c/8480/1/testdata/workloads/functional-query/queries/QueryTest/parquet-stats.test@460
PS1, Line 460: ====
> Does this remove the trailing newline?
reverted... I had put tests here first so must have changed this file.



-- 
To view, visit http://gerrit.cloudera.org:8080/8480
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0c99e20cb080b504442cd5376ea3e046016158fe
Gerrit-Change-Number: 8480
Gerrit-PatchSet: 1
Gerrit-Owner: Vuk Ercegovac <vercegovac@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv@cloudera.com>
Gerrit-Reviewer: Vuk Ercegovac <vercegovac@cloudera.com>
Gerrit-Comment-Date: Tue, 14 Nov 2017 18:07:21 +0000
Gerrit-HasComments: Yes

Mime
  • Unnamed multipart/alternative (inline, 8-Bit, 0 bytes)
View raw message