impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Lars Volker (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] IMPALA-4993: extend dictionary filtering to collections
Date Wed, 13 Dec 2017 20:39:00 GMT
Lars Volker has posted comments on this change. ( http://gerrit.cloudera.org:8080/8775 )

Change subject: IMPALA-4993: extend dictionary filtering to collections
......................................................................


Patch Set 4:

(11 comments)

This looks good to me, I only added minor clean-up comments and will do a final pass on the
next PS.

http://gerrit.cloudera.org:8080/#/c/8775/4/be/src/exec/hdfs-scan-node-base.h
File be/src/exec/hdfs-scan-node-base.h:

http://gerrit.cloudera.org:8080/#/c/8775/4/be/src/exec/hdfs-scan-node-base.h@90
PS4, Line 90:   /// sequence-based file
This looks like a rebase artifact. I often look at diffs between patchsets (and I think others
do, too) so if possible, it helps to do a final rebase at the end instead of rebaseing in
the middle. Sometimes rebases are inevitable, in which case it's fine of course.


http://gerrit.cloudera.org:8080/#/c/8775/4/be/src/exec/hdfs-scanner.cc
File be/src/exec/hdfs-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/8775/4/be/src/exec/hdfs-scanner.cc@90
PS4, Line 90: std::
We usually drop the std:: prefix in cc files.


http://gerrit.cloudera.org:8080/#/c/8775/4/be/src/exec/hdfs-scanner.cc@96
PS4, Line 96: std::
same here.


http://gerrit.cloudera.org:8080/#/c/8775/1/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java:

http://gerrit.cloudera.org:8080/#/c/8775/1/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@187
PS1, Line 187:   // tuple. Uses a linked hash map for consistent display in explain.
> min-max pruning is discussed in the header, but dictionary pruning is not. 
Thanks.


http://gerrit.cloudera.org:8080/#/c/8775/1/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@624
PS1, Line 624:         addNotEmptyCollections(collectionConjuncts);
> A bit unclear to me as well. However, if I understand this literally, we on
Thanks for adding the comments. This way round it makes more sense. It's still not obvious
to me why in the prior code and before the check for slotIds.size() != 1, tupleIds.size()
== 1 had to be true. There seems to be some guarantee that conjuncts in the scanners work
on a single tuple only, which surprises me.


http://gerrit.cloudera.org:8080/#/c/8775/1/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@656
PS1, Line 656:     // Check to see if the conjunct evaluates to true when the slot is NULL
> Done
nit: conjuncts still lacks the _ :)


http://gerrit.cloudera.org:8080/#/c/8775/4/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java:

http://gerrit.cloudera.org:8080/#/c/8775/4/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@111
PS4, Line 111: the
Duplicate word


http://gerrit.cloudera.org:8080/#/c/8775/4/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@1099
PS4, Line 1099:         output.append(detailPrefix + "parquet statistics predicates: " +
This one compiles the string differently than the others, probably I did this myself. While
you're here and if you don't mind, you could convert it to String.format().


http://gerrit.cloudera.org:8080/#/c/8775/4/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@1116
PS4, Line 1116:         List<Integer> totalIdxList = Lists.newArrayList();
This is more of a guess, but can't you pass entry.getValue() to newArrayList?


http://gerrit.cloudera.org:8080/#/c/8775/1/testdata/workloads/functional-query/queries/QueryTest/parquet-filtering.test
File testdata/workloads/functional-query/queries/QueryTest/parquet-filtering.test:

http://gerrit.cloudera.org:8080/#/c/8775/1/testdata/workloads/functional-query/queries/QueryTest/parquet-filtering.test@265
PS1, Line 265: # - number of projections and their nesting depth
> I think there's some benefit to explicitly listing the areas tested/to-test
I understand your point, let's keep it here.


http://gerrit.cloudera.org:8080/#/c/8775/1/testdata/workloads/functional-query/queries/QueryTest/parquet-filtering.test@275
PS1, Line 275: 1
> there are two files in the table. one of them is dictionary encoded for int
That makes sense. Should we add a quick comment for the next person to come by ("Only one
of the files in this table is dictionary encoded")?



-- 
To view, visit http://gerrit.cloudera.org:8080/8775
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If3a2abcfc3d0f7d18756816659fed77ce12668dd
Gerrit-Change-Number: 8775
Gerrit-PatchSet: 4
Gerrit-Owner: Vuk Ercegovac <vercegovac@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv@cloudera.com>
Gerrit-Reviewer: Vuk Ercegovac <vercegovac@cloudera.com>
Gerrit-Comment-Date: Wed, 13 Dec 2017 20:39:00 +0000
Gerrit-HasComments: Yes

Mime
  • Unnamed multipart/alternative (inline, 8-Bit, 0 bytes)
View raw message