impala-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Sailesh Mukil (Code Review)" <ger...@cloudera.org>
Subject [Impala-CR](cdh5-trunk) IMPALA-3804: Push per-split filtering into scanners
Date Mon, 25 Jul 2016 21:30:15 GMT
Sailesh Mukil has posted comments on this change.

Change subject: IMPALA-3804: Push per-split filtering into scanners
......................................................................


Patch Set 3:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/3561/3//COMMIT_MSG
Commit Message:

PS3, Line 16: scanner
scanned


http://gerrit.cloudera.org:8080/#/c/3561/3/be/src/exec/hdfs-parquet-scanner.cc
File be/src/exec/hdfs-parquet-scanner.cc:

PS3, Line 127: scan_node->IsZeroSlotTableScan() && footer_split != split
Looks much better now!


PS3, Line 1875: if (ScanRangeIsFilteredOut())
A scan range being filtered out basically translates to "this scan range belongs to a partition
that should be pruned".

Why do this here rather than in IssueInitialRanges()? We have all the information necessary
to do it there, i.e. the partition_id and the filter contexts.

So we can avoid even scheduling the scan ranges that need to be filtered out. Can be done
in L133.


PS3, Line 2095: if (scan_node_->PartitionIsFilteredOut(context_->partition_descriptor()->id(),
              :             FilterStats::ROW_GROUPS_KEY, context_->filter_ctxs())) {
Why do we need to call this again here? Wouldn't the partition have already been filtered
out in L1875 if it was supposed to?


http://gerrit.cloudera.org:8080/#/c/3561/3/be/src/exec/hdfs-scanner.h
File be/src/exec/hdfs-scanner.h:

PS3, Line 426: enum CloseRangePolicy {
             :     CLOSE_ALL,
             :     CLOSE_ONLY_THIS_RANGE,
             :     CLOSE_NONE
             :   };
Still needed?


http://gerrit.cloudera.org:8080/#/c/3561/3/be/src/exec/hdfs-text-scanner.cc
File be/src/exec/hdfs-text-scanner.cc:

PS3, Line 192: 
Good catch removing this unnecessary condition.


PS3, Line 148: ScanRangeIsFilteredOut
Same here. Why issue a scan range if it could be filtered out in IssueInitialRanges()?


-- 
To view, visit http://gerrit.cloudera.org:8080/3561
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I9f92178f642695e0e9ef901373a5e9f2878a78ce
Gerrit-PatchSet: 3
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Henry Robinson <henry@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dhecht@cloudera.com>
Gerrit-Reviewer: Henry Robinson <henry@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sailesh@cloudera.com>
Gerrit-HasComments: Yes

Mime
View raw message