impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Vuk Ercegovac (Code Review)" <>
Subject [Impala-ASF-CR] IMPALA-4985: use parquet stats of nested types for dynamic pruning
Date Tue, 14 Nov 2017 18:07:21 GMT
Vuk Ercegovac has posted comments on this change. ( )

Change subject: IMPALA-4985: use parquet stats of nested types for dynamic pruning

Patch Set 1:

Commit Message:
PS1, Line 12: A nested value is defined to
            : be on a path of one or more nested types that is rooted at a
            : table column.
> I don't understand what that sentence means. Can you try to clarify the dis
File fe/src/main/java/org/apache/impala/planner/
PS1, Line 435:   // Checks if slot refers to an array "pos" pseudo-column.
> Can you add a comment explaining why checking for getColumn() == null is no
added a clarifying comment. when I stepped through this code, getColumn was casting the net
too. Yes it caught references to "pos" but it also included all nested types as well, which
is not the intent with this change. If expectations for getColumn are different, then I can
look into why its not behaving as expected.
PS1, Line 441: isMapStruct
> I think it would be clearer to add a isArrayStruct() method to CollectionSt
PS1, Line 564:  
> nit: the surrounding code seems to omit this space.
PS1, Line 567:       for (Expr pred: entry.getValue()) {
             :         if (pred instanceof BinaryPredicate) {
             :           tryComputeBinaryMinMaxPredicate(analyzer, (BinaryPredicate) pred);
             :         } else if (pred instanceof InPredicate) {
             :           tryComputeInListMinMaxPredicate(analyzer, (InPredicate) pred);
             :         }
             :       }
> This looks like a duplication of the above loop. Adding additional predicat
good point, done.
PS1, Line 1109: slot.getColumn() == null
> Is this another check for a pos slot?
hmm, looks like a proxy for estimating when a scan range will definitely be included. since
nested types are currently not filtered, these columns will be scanned so it makes sense to
increase the scanner estimate. 

for this patch, since we conservatively included possibly filtered scalar columns, we should
treat nested types similarly, so it makes sense to me to leave the current logic as-is.
File testdata/workloads/functional-query/queries/QueryTest/nested-types-parquet-stats.test:
PS1, Line 141: Basics test
> I'm not sure I understand what Basics means. Can you elaborate? I think we 
yeah, that was confusing so removed it and clarified that these queries are distinguished
from the previous ones since they are reading from the tpch data set.
File testdata/workloads/functional-query/queries/QueryTest/parquet-stats.test:
PS1, Line 460: ====
> Does this remove the trailing newline?
reverted... I had put tests here first so must have changed this file.

To view, visit
To unsubscribe, visit

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0c99e20cb080b504442cd5376ea3e046016158fe
Gerrit-Change-Number: 8480
Gerrit-PatchSet: 1
Gerrit-Owner: Vuk Ercegovac <>
Gerrit-Reviewer: Lars Volker <>
Gerrit-Reviewer: Vuk Ercegovac <>
Gerrit-Comment-Date: Tue, 14 Nov 2017 18:07:21 +0000
Gerrit-HasComments: Yes

  • Unnamed multipart/alternative (inline, 8-Bit, 0 bytes)
View raw message