impala-reviews mailing list archives

From "Tim Armstrong (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] IMPALA-5448: fix invalid number of splits reported in Parquet scan node
Date Thu, 28 Sep 2017 15:17:55 GMT
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8147 )

Change subject: IMPALA-5448: fix invalid number of splits reported in Parquet scan node
......................................................................


Patch Set 1:

(10 comments)

The change makes sense to me. Comments are mainly about style.

http://gerrit.cloudera.org:8080/#/c/8147/1/be/src/exec/hdfs-scan-node-base.h
File be/src/exec/hdfs-scan-node-base.h:

http://gerrit.cloudera.org:8080/#/c/8147/1/be/src/exec/hdfs-scan-node-base.h@497
PS1, Line 497:   /// Mapping of file formats (file type, compression types set) to the number of
Can you move the comment below the class definition and above the map?


http://gerrit.cloudera.org:8080/#/c/8147/1/be/src/exec/hdfs-scan-node-base.h@499
PS1, Line 499:   struct HdfsCompressionTypesSet {
Can you make this a class and make the member variables private? I don't think there's a reason
we need to expose them.


http://gerrit.cloudera.org:8080/#/c/8147/1/be/src/exec/hdfs-scan-node-base.h@500
PS1, Line 500:     uint32_t bit_map;
Can you add an assertion to the constructor to make sure that bit_map is large enough to hold
all compression types?

Something like: 

  DCHECK_GE(sizeof(bit_map) * CHAR_BIT, _THdfsCompression_VALUES_TO_NAMES.size());
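
To make the suggestion concrete, here is a minimal standalone sketch of such a constructor check. It makes several assumptions: kCompressionNames is a hypothetical stand-in for the Thrift-generated _THdfsCompression_VALUES_TO_NAMES map, CompressionBitmap is an illustrative class name, and plain assert() stands in for DCHECK_GE so that it compiles outside the Impala tree. Comparing the bit count against the number of enum values is only sufficient if the enum values are dense and start at 0, which is also assumed here.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical stand-in for the Thrift-generated
// _THdfsCompression_VALUES_TO_NAMES map (values assumed dense from 0).
static const std::map<int, std::string> kCompressionNames = {
    {0, "NONE"}, {1, "GZIP"}, {2, "SNAPPY"}, {3, "LZ4"}};

class CompressionBitmap {
 public:
  CompressionBitmap() : bit_map_(0) {
    // assert() stands in for DCHECK_GE: the bitmap needs at least one bit
    // per known compression type.
    assert(sizeof(bit_map_) * CHAR_BIT >= kCompressionNames.size());
  }

  // Number of distinct compression types the bitmap can represent.
  static constexpr std::size_t Capacity() { return sizeof(bit_map_) * CHAR_BIT; }

  // True if no compression type has been recorded.
  bool Empty() const { return bit_map_ == 0; }

 private:
  uint32_t bit_map_;
};
```

With a uint32_t bitmap the check passes for up to 32 compression types. If the enum ever grows past that, the assertion fires in debug builds instead of bits being silently dropped.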


http://gerrit.cloudera.org:8080/#/c/8147/1/be/src/exec/hdfs-scan-node-base.h@501
PS1, Line 501:     THdfsCompression::type last_type;
Is last_type needed? I think we can remove it.


http://gerrit.cloudera.org:8080/#/c/8147/1/be/src/exec/hdfs-scan-node-base.h@504
PS1, Line 504: hasType
We capitalise the first letter in C++ method names, i.e. HasType(). The Google C++ Style Guide
is a good reference: https://google.github.io/styleguide/cppguide.html#Function_Names


http://gerrit.cloudera.org:8080/#/c/8147/1/be/src/exec/hdfs-scan-node-base.h@506
PS1, Line 506:     }
Please add blank lines between the method definitions.


http://gerrit.cloudera.org:8080/#/c/8147/1/be/src/exec/hdfs-scan-node-base.h@507
PS1, Line 507: addType
Same here: AddType().


http://gerrit.cloudera.org:8080/#/c/8147/1/be/src/exec/hdfs-scan-node-base.h@512
PS1, Line 512:     bool operator< (const HdfsCompressionTypesSet& o) const {
Can you add a comment saying that this is needed so it can be part of the std::map key?
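
For illustration, a minimal sketch of how the class might look with the comments above applied (private member, capitalised method names, blank lines between methods, a comment on operator<). The enums FileFormat and Compression are hypothetical stand-ins for the Thrift types, and CompressionTypesSet, FileTypeKey, FileTypeCounts and ExampleCounts are illustrative names rather than the names used in the patch. Ordering sets by the raw bitmap value is one simple way to get a strict weak ordering; the patch may do it differently.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <utility>

// Hypothetical stand-ins for THdfsFileFormat::type and THdfsCompression::type.
enum class FileFormat { TEXT = 0, PARQUET = 1 };
enum class Compression { NONE = 0, GZIP = 1, SNAPPY = 2 };

class CompressionTypesSet {
 public:
  bool HasType(Compression type) const {
    return (bit_map_ & (1u << static_cast<uint32_t>(type))) != 0;
  }

  void AddType(Compression type) {
    bit_map_ |= 1u << static_cast<uint32_t>(type);
  }

  // Needed so that the set can be part of a std::map key, which requires
  // a strict weak ordering on its keys.
  bool operator<(const CompressionTypesSet& o) const {
    return bit_map_ < o.bit_map_;
  }

 private:
  uint32_t bit_map_ = 0;
};

// (file type, compression types set) -> number of splits.
using FileTypeKey = std::pair<FileFormat, CompressionTypesSet>;
using FileTypeCounts = std::map<FileTypeKey, int>;

// Counts two Parquet splits whose columns use GZIP and SNAPPY, with the
// types added in a different order for each split.
inline FileTypeCounts ExampleCounts() {
  CompressionTypesSet first;
  first.AddType(Compression::GZIP);
  first.AddType(Compression::SNAPPY);
  CompressionTypesSet second;
  second.AddType(Compression::SNAPPY);
  second.AddType(Compression::GZIP);

  FileTypeCounts counts;
  ++counts[FileTypeKey(FileFormat::PARQUET, first)];
  ++counts[FileTypeKey(FileFormat::PARQUET, second)];
  return counts;
}
```

Because std::pair compares its members with operator<, both splits in ExampleCounts() land on the same map entry regardless of the order in which the compression types were added.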


http://gerrit.cloudera.org:8080/#/c/8147/1/be/src/exec/hdfs-scan-node-base.cc
File be/src/exec/hdfs-scan-node-base.cc:

http://gerrit.cloudera.org:8080/#/c/8147/1/be/src/exec/hdfs-scan-node-base.cc@897
PS1, Line 897:           for (auto i = compressions_map.begin(); i != compressions_map.end(); ++i) {
I think this would be more readable with a range-based for loop. E.g.

for (auto& elem : _THdfsCompression_VALUES_TO_NAMES)
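
A slightly fuller sketch of that loop shape, under assumptions: CompressionNames() is a hypothetical stand-in for _THdfsCompression_VALUES_TO_NAMES, and JoinSetTypes is an illustrative helper, not a function from the patch. It walks the value-to-name map with a range-based for loop and joins the names of the types whose bit is set.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical stand-in for the Thrift-generated
// _THdfsCompression_VALUES_TO_NAMES map.
inline const std::map<int, std::string>& CompressionNames() {
  static const std::map<int, std::string> names = {
      {0, "NONE"}, {1, "GZIP"}, {2, "SNAPPY"}};
  return names;
}

// Returns a comma-separated list of the compression types set in bit_map.
inline std::string JoinSetTypes(uint32_t bit_map) {
  std::string out;
  for (const auto& elem : CompressionNames()) {
    // Guard against shifting by a value outside the bitmap width.
    assert(elem.first >= 0 && elem.first < 32);
    if ((bit_map & (1u << elem.first)) == 0) continue;
    if (!out.empty()) out += ",";
    out += elem.second;
  }
  return out;
}
```

std::map iterates in key order, so the names come out in enum-value order without the explicit begin()/end() iterator bookkeeping.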


http://gerrit.cloudera.org:8080/#/c/8147/1/testdata/multi_compression_parquet_data/README
File testdata/multi_compression_parquet_data/README:

http://gerrit.cloudera.org:8080/#/c/8147/1/testdata/multi_compression_parquet_data/README@5
PS1, Line 5: These files have two string columns 'a' and 'b'. Each columns using different compression types.
Cool!



-- 
To view, visit http://gerrit.cloudera.org:8080/8147
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iaacc2d775032f5707061e704f12e0a63cde695d1
Gerrit-Change-Number: 8147
Gerrit-PatchSet: 1
Gerrit-Owner: Quanlong Huang <huangquanlong@gmail.com>
Gerrit-Reviewer: Quanlong Huang <huangquanlong@gmail.com>
Gerrit-Reviewer: Tim Armstrong <tarmstrong@cloudera.com>
Gerrit-Comment-Date: Thu, 28 Sep 2017 15:17:55 +0000
Gerrit-HasComments: Yes
