impala-reviews mailing list archives

From "Michael Ho (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] IMPALA-3989: Display skew warning for poorly formatted Parquet files
Date Mon, 12 Dec 2016 19:56:24 GMT
Michael Ho has posted comments on this change.

Change subject: IMPALA-3989: Display skew warning for poorly formatted Parquet files
......................................................................


Patch Set 6:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/5400/6/be/src/exec/hdfs-parquet-scanner.cc
File be/src/exec/hdfs-parquet-scanner.cc:

PS6, Line 318: Return 
Returns true if 'row_group' overlaps with 'split_range'.


PS6, Line 333: (split_start <= row_group_start && split_end >= row_group_end);
What about the case (split_start >= row_group_start && split_end <= row_group_end)?
Isn't that what happens here if a row group spans multiple blocks?
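For reference, a minimal sketch of an interval-overlap test, assuming both the split and the row group are described by hypothetical half-open byte ranges [start, end) (the name `RangesOverlap` is illustrative, not the patch's code). A single pair of comparisons covers every overlapping configuration, including the split lying entirely inside a row group that spans several blocks:

```cpp
#include <cstdint>

// Hypothetical helper: true if the half-open byte ranges [split_start, split_end)
// and [row_group_start, row_group_end) share at least one byte. This covers
// partial overlap on either side, the row group inside the split, and the split
// inside the row group.
constexpr bool RangesOverlap(int64_t split_start, int64_t split_end,
                             int64_t row_group_start, int64_t row_group_end) {
  return split_start < row_group_end && row_group_start < split_end;
}
```

With this formulation there is no need to enumerate the containment cases separately; whether the real ranges are half-open or inclusive is an assumption that would need checking against the scanner's offsets.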


PS6, Line 463: (row_group_idx_ == -1)
nit: the parentheses aren't necessary.


PS6, Line 479: skipped all the row groups
Mind adding a minor remark that we won't reach this path if there is at least one non-empty
row group that this scanner can process?


PS6, Line 499:       if (CheckRowGroupOverlapsSplit(row_group, split_range)) {
             :         // If the row group overlaps the split but the mid-point does not fall within the
             :         // split, we have a poorly formatted file.
             :         misaligned_row_group_skipped = true;
misaligned_row_group_skipped |= CheckRowGroupOverlapsSplit(row_group, split_range);
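A minimal sketch of the suggested `|=` pattern, under stated assumptions: `RowGroupRange`, `Overlaps`, and `AnyMisalignedRowGroupSkipped` are hypothetical names, ranges are half-open, and a split "owns" a row group when the row group's midpoint falls inside the split (the rule described in the quoted comment). The flag accumulates without a branch and, once set, stays true:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical byte range of a row group within the file: [start, end).
struct RowGroupRange {
  int64_t start;
  int64_t end;
};

// Half-open overlap test.
constexpr bool Overlaps(int64_t a_start, int64_t a_end, int64_t b_start, int64_t b_end) {
  return a_start < b_end && b_start < a_end;
}

// Returns true if any row group that this split does not own (its midpoint is
// outside [split_start, split_end)) still overlaps the split, i.e. the row groups
// are misaligned with the split boundaries.
bool AnyMisalignedRowGroupSkipped(const std::vector<RowGroupRange>& row_groups,
                                  int64_t split_start, int64_t split_end) {
  assert(split_start <= split_end);  // sanity check on the split range
  bool misaligned_row_group_skipped = false;
  for (const RowGroupRange& rg : row_groups) {
    int64_t mid = rg.start + (rg.end - rg.start) / 2;
    if (mid >= split_start && mid < split_end) continue;  // this split processes it
    // Same effect as: if (Overlaps(...)) misaligned_row_group_skipped = true;
    misaligned_row_group_skipped |= Overlaps(rg.start, rg.end, split_start, split_end);
  }
  return misaligned_row_group_skipped;
}
```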


http://gerrit.cloudera.org:8080/#/c/5400/6/be/src/exec/hdfs-parquet-scanner.h
File be/src/exec/hdfs-parquet-scanner.h:

PS6, Line 446:  Number of scanners
Is it really the number of scanners? This counter can be bumped multiple times per scan range.


-- 
To view, visit http://gerrit.cloudera.org:8080/5400
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ibf48d978383d73efdade733a892e795ebd53c76a
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Attila Jeges <attilaj@cloudera.com>
Gerrit-Reviewer: Attila Jeges <attilaj@cloudera.com>
Gerrit-Reviewer: Michael Ho <kwho@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tmarshall@cloudera.com>
Gerrit-HasComments: Yes
