impala-reviews mailing list archives

From "Tianyi Wang (Code Review)" <>
Subject [Impala-ASF-CR] IMPALA-5210: Count rows and collection items in parquet scanner separately
Date Thu, 24 Aug 2017 23:01:24 GMT
Tianyi Wang has posted comments on this change.

Change subject: IMPALA-5210: Count rows and collection items in parquet scanner separately

Patch Set 3:

File be/src/exec/

Line 964:   while (!column_readers[0]->RowGroupAtEnd()) {
> Yeah, it changes the semantics to something like "Number of successfully co
File be/src/exec/

Line 1008:   COUNTER_ADD(scan_node_->rows_read_counter(), num_rows_read);
> You could use COUNTER_SET here and get rid of num_rows_read altogether.
I'm not sure what value should be passed to COUNTER_SET here. Could you elaborate?

PS3, Line 1009: COUNTER_ADD
> How about removing the initialization at the beginning of the method and us
Same as above.
File be/src/exec/hdfs-parquet-scanner.h:

Line 470:   /// Number of collection items read
> Can you describe the scope of this in the comment, e.g. "Total number of co
Updated. In the current code it is the total number of collection items read in the current row batch.
Are you suggesting that it should count over the lifetime of this scanner and update the scan node
counter when the scanner is cleaned up?
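
To make the two scopes in the reply above concrete, here is a minimal sketch of both. Everything in it is a hypothetical stand-in for illustration: Counter, PerBatchScanner, and LifetimeScanner are not Impala classes, and Add() only approximates what COUNTER_ADD does on a scan node counter.

```cpp
#include <cstdint>

// Hypothetical stand-in for a scan node profile counter (not Impala's API).
struct Counter {
  int64_t value = 0;
  void Add(int64_t delta) { value += delta; }
};

// Scope 1: the count covers one row batch and is flushed after every batch.
struct PerBatchScanner {
  explicit PerBatchScanner(Counter* shared) : shared_(shared) {}

  void ProcessBatch(int64_t items_in_batch) {
    int64_t coll_items_read = 0;  // starts from zero for each batch
    coll_items_read += items_in_batch;
    shared_->Add(coll_items_read);  // scan node counter updated per batch
  }

  Counter* shared_;
};

// Scope 2: the count covers the scanner's lifetime and is flushed once.
struct LifetimeScanner {
  explicit LifetimeScanner(Counter* shared) : shared_(shared) {}

  void ProcessBatch(int64_t items_in_batch) {
    coll_items_read_ += items_in_batch;  // accumulates across batches
  }

  void Close() {
    shared_->Add(coll_items_read_);  // scan node counter updated at cleanup
    coll_items_read_ = 0;
  }

  Counter* shared_;
  int64_t coll_items_read_ = 0;
};
```

Both variants end with the same total. The difference is when the scan node counter becomes visible: after each batch in the first case, only at cleanup in the second.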
File be/src/exec/scan-node.h:

Line 153:   /// # collection items read from the scanner
> Can you explain here that this is across nested collections, so that A = {B


Gerrit-MessageType: comment
Gerrit-Change-Id: I7f6efddaea18507482940f5bdab7326b6482b067
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tianyi Wang <>
Gerrit-Reviewer: Lars Volker <>
Gerrit-Reviewer: Tianyi Wang <>
Gerrit-HasComments: Yes