impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Dimitris Tsirogiannis (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] IMPALA-5931: Generates scan ranges in planner for s3/adls
Date Wed, 03 Jan 2018 00:17:59 GMT
Dimitris Tsirogiannis has posted comments on this change. ( http://gerrit.cloudera.org:8080/8523
)

Change subject: IMPALA-5931: Generates scan ranges in planner for s3/adls
......................................................................


Patch Set 3:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/8523/3/be/src/scheduling/scheduler.cc
File be/src/scheduling/scheduler.cc:

http://gerrit.cloudera.org:8080/#/c/8523/3/be/src/scheduling/scheduler.cc@582
PS3, Line 582: ComputeScanRangeAssignment
This function now does both assignment and generation of scan ranges. Also, it feels kind
of weird to have the generation part in the scheduler. I think it may be better to separate
these two concepts and move the scan range generation part out of here.


http://gerrit.cloudera.org:8080/#/c/8523/3/common/thrift/PlanNodes.thrift
File common/thrift/PlanNodes.thrift:

http://gerrit.cloudera.org:8080/#/c/8523/3/common/thrift/PlanNodes.thrift@195
PS3, Line 195: struct THdfsFileSplitSpec {
             :   1: required CatalogObjects.THdfsFileDesc file_desc
             :   // -1 means not-splittable
             :   2: required i64 max_block_size
             :   3: required i64 partition_id
             : }
THdfsFileDesc and THdfsFileSplit have quite a bit of intersection wrt the information they
store. I am wondering if it would be simpler to modify THdfsFileSplit to handle both cases
and avoid THdfsFileSplitSpec altogether.


http://gerrit.cloudera.org:8080/#/c/8523/3/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
File fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java:

http://gerrit.cloudera.org:8080/#/c/8523/3/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java@131
PS3, Line 131: new int[] {}
I believe you don't have to do that. You can probably pass null and then avoid L143 and L150
with couple of checks.


http://gerrit.cloudera.org:8080/#/c/8523/3/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

http://gerrit.cloudera.org:8080/#/c/8523/3/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@338
PS3, Line 338: its
typo: it's


http://gerrit.cloudera.org:8080/#/c/8523/3/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java:

http://gerrit.cloudera.org:8080/#/c/8523/3/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@666
PS3, Line 666: private void generateScanRangeSpecs
No intention to be picky but can you move this and tansformBlocksToScanRanges below function
computeScanRangeLocations so that the code reads in a top-down fashion? thanks


http://gerrit.cloudera.org:8080/#/c/8523/3/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@694
PS3, Line 694:    * TScanRangeLocationLists are added scanRanges_. If disk ids are expected
in blocks
to



-- 
To view, visit http://gerrit.cloudera.org:8080/8523
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I326065adbb2f7e632814113aae85cb51ca4779a5
Gerrit-Change-Number: 8523
Gerrit-PatchSet: 3
Gerrit-Owner: Vuk Ercegovac <vercegovac@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Gerrit-Reviewer: Vuk Ercegovac <vercegovac@cloudera.com>
Gerrit-Comment-Date: Wed, 03 Jan 2018 00:17:59 +0000
Gerrit-HasComments: Yes

Mime
  • Unnamed multipart/alternative (inline, 8-Bit, 0 bytes)
View raw message