impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Lars Volker (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] IMPALA-5210: Count rows and collection items in parquet scanner separately
Date Wed, 23 Aug 2017 22:31:41 GMT
Lars Volker has posted comments on this change.

Change subject: IMPALA-5210: Count rows and collection items in parquet scanner separately
......................................................................


Patch Set 2:

(9 comments)

http://gerrit.cloudera.org:8080/#/c/7776/2/be/src/exec/hdfs-parquet-scanner.cc
File be/src/exec/hdfs-parquet-scanner.cc:

Line 963:   int64_t num_rows_read = 0, num_coll_items_read = 0;
We usually limit declarations to one per line per our style guide.


Line 964:   auto update_read_count = MakeScopeExitTrigger([&] {
I find that this construct makes the code harder to reason about. Now someone reading it has
to keep in mind that additional code will be executed when the function returns. Can you instead
update the counters when CommitRows gets called below? Alternatively you could change the
"return" statement in the while loop to a break; and then update the counters before the final
return.


PS2, Line 978: continue_execution
initialize


Line 1210:     CollectionValueBuilder* coll_value_builder, int64_t* num_coll_item_read) {
Can you remove these from the function signatures (see my comment in the header).


Line 1277:   (*num_coll_item_read) += rows_read;
are the () needed here?


http://gerrit.cloudera.org:8080/#/c/7776/2/be/src/exec/hdfs-parquet-scanner.h
File be/src/exec/hdfs-parquet-scanner.h:

Line 475:   RuntimeProfile::Counter* num_dict_filtered_row_groups_counter_;
Can you do the counting using a member in the scanner instead of passing it through all ReadValue*()
functions? That seems to keep these function signatures more clean.


http://gerrit.cloudera.org:8080/#/c/7776/2/be/src/exec/parquet-column-readers.cc
File be/src/exec/parquet-column-readers.cc:

PS2, Line 242: /*num_coll_item_read*/
We usually still name parameters the same, even if some methods don't use them (e.g. the pool
parameter in some of these methods).


http://gerrit.cloudera.org:8080/#/c/7776/2/be/src/exec/parquet-column-readers.h
File be/src/exec/parquet-column-readers.h:

PS2, Line 505: 
removing the virtual keyword breaks consistency with the other methods in this class.


PS2, Line 546: 4
?


-- 
To view, visit http://gerrit.cloudera.org:8080/7776
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I7f6efddaea18507482940f5bdab7326b6482b067
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tianyi Wang <twang@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv@cloudera.com>
Gerrit-HasComments: Yes

Mime
View raw message