impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Alex Behm (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] IMPALA-4993: extend dictionary filtering to collections
Date Fri, 12 Jan 2018 00:29:30 GMT
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/8775 )

Change subject: IMPALA-4993: extend dictionary filtering to collections
......................................................................


Patch Set 9:

(9 comments)

Change looks pretty good to me

http://gerrit.cloudera.org:8080/#/c/8775/8/be/src/exec/hdfs-parquet-scanner.cc
File be/src/exec/hdfs-parquet-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/8775/8/be/src/exec/hdfs-parquet-scanner.cc@778
PS8, Line 778:     }
> no, I intended to omit the collection readers. this partitions all nested s
Makes sense. The function comment incorrectly states they are added to 'non_filtering'.


http://gerrit.cloudera.org:8080/#/c/8775/8/be/src/exec/hdfs-parquet-scanner.cc@809
PS8, Line 809:   if (!state_->query_options().parquet_dictionary_filtering || dict_filter_map_.empty())
{
> yup... I assumed that if its a mix of collection + scalars is ok since that
My point was about whether child readers are included or not. In this code path we only add
the top-level readers. In the other code path we recursively collect all nested readers as
well.

I think we should either be consistent or add a comment explaining that in some cases only
the top-level readers are added.


http://gerrit.cloudera.org:8080/#/c/8775/8/be/src/exec/hdfs-parquet-scanner.cc@845
PS8, Line 845:   // where some data pages are dictionary-encoded and others are plain
> that's correct, removed. I think I carried that conservative check over fro
Agree. Would be good to add a Preonditions check to the FE SlotDescriptor c'tors.


http://gerrit.cloudera.org:8080/#/c/8775/8/be/src/exec/hdfs-scanner.h
File be/src/exec/hdfs-scanner.h:

http://gerrit.cloudera.org:8080/#/c/8775/8/be/src/exec/hdfs-scanner.h@222
PS8, Line 222:   typedef std::map<SlotId, std::vector<ScalarExprEvaluator*>>
> I'll comment here on the <tupleid, slotid> vs slotid decision here (covers 
Makes sense.

Would be good to clarify this in TupleDescriptor and/or SlotDescriptor as you suggested.

SlotIds and TupleIds are unique within a query. A SlotId belongs to exactly one TupleId.


http://gerrit.cloudera.org:8080/#/c/8775/8/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java:

http://gerrit.cloudera.org:8080/#/c/8775/8/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@548
PS8, Line 548:   private boolean isCollectionNotEmpty(TupleDescriptor desc) {
> its used in a couple of places and I wanted to keep the same checks in both
I'm still in favor of removing. Reasons:

The first Preconditions check only seems useful in the context of this function, and even
so you'll get the expected "false" return value if we probe for null.

The second Preconditions check is not necessary because the type of a tuple is always StructType
(see type_ member in TupleDescriptor).


http://gerrit.cloudera.org:8080/#/c/8775/8/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@1102
PS8, Line 1102:         // in the same order as the normal conjuncts. Sort the indices so
that the
> Done
Sorry this was my being vague. I meant creating a helper function for adding the dictionary
filtering part of the explain string. It's a good chunk of code that deals with that part.

I think the grouping an be done in that function and doesn't need a separate helper.


http://gerrit.cloudera.org:8080/#/c/8775/8/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@1116
PS8, Line 1116:         for (Integer idx : totalIdxList) {
> didn't want to mutate a class member for the purpose of printing diags.
The perTupleConjuncts already have a new list, so no member is modified would be modified.


http://gerrit.cloudera.org:8080/#/c/8775/9/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java:

http://gerrit.cloudera.org:8080/#/c/8775/9/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@639
PS9, Line 639:     List<TupleId> tupleIds = Lists.newArrayList();
remove, Preconditions check in L646 must trivially be true after L645


http://gerrit.cloudera.org:8080/#/c/8775/8/testdata/workloads/functional-planner/queries/PlannerTest/parquet-filtering.test
File testdata/workloads/functional-planner/queries/PlannerTest/parquet-filtering.test:

http://gerrit.cloudera.org:8080/#/c/8775/8/testdata/workloads/functional-planner/queries/PlannerTest/parquet-filtering.test@330
PS8, Line 330:    parquet statistics predicates: c_custkey > 0, o.o_orderkey > 0, l.l_partkey
> 0
> hmm... actually I followed the pattern on L324-5 which separates the predic
that pattern makes more sense to me as well, so the oddity is really an existing issue with
the "parquet statistics predicates"



-- 
To view, visit http://gerrit.cloudera.org:8080/8775
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If3a2abcfc3d0f7d18756816659fed77ce12668dd
Gerrit-Change-Number: 8775
Gerrit-PatchSet: 9
Gerrit-Owner: Vuk Ercegovac <vercegovac@cloudera.com>
Gerrit-Reviewer: Alex Behm <alex.behm@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstrong@cloudera.com>
Gerrit-Reviewer: Vuk Ercegovac <vercegovac@cloudera.com>
Gerrit-Comment-Date: Fri, 12 Jan 2018 00:29:30 +0000
Gerrit-HasComments: Yes

Mime
  • Unnamed multipart/alternative (inline, 8-Bit, 0 bytes)
View raw message