impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Alex Behm (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] IMPALA-2373: Extrapolate row counts for HDFS tables.
Date Fri, 19 May 2017 04:44:17 GMT
Alex Behm has posted comments on this change.

Change subject: IMPALA-2373: Extrapolate row counts for HDFS tables.
......................................................................


Patch Set 2:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/6840/2/fe/src/main/java/org/apache/impala/catalog/Table.java
File fe/src/main/java/org/apache/impala/catalog/Table.java:

PS2, Line 77: TTableStats tableStats_
> nit: a bit safer to initialize this here and set now rows to -1.
Created in c'tor. There is no constructor that allows setting both values to -1. And only
setting one seems more dangerous (bytes would be 0).


PS2, Line 196: /**
             :    * Returns the value of the ROW_COUNT constant, or -1 if not found.
             :    */
> Remove, not needed.
Happy to remove, but seems non-obvious that -1 represents "not found"? Sure we should remove?


http://gerrit.cloudera.org:8080/#/c/6840/2/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java:

PS2, Line 128: Input cardinality based on the partition row counts and based on extrapolation
             :   // using the table-level row count and raw data size. -1 if invalid.
> Remove the part that explains where this is derived from; doesn't really he
Done


PS2, Line 664: Chose
> Choose (typo)
Done


PS2, Line 664:  Chose between the extrapolated row count and the one based on stored stats.
> I think that requires some further explanation. Something along the lines o
I like the function idea. Moved into separate function. Function comment should address your
concern regarding the comment.


PS2, Line 672: if (!partitions_.isEmpty() && numPartitionsMissingStats_ == partitions_.size()
             :         && extrapolatedNumRows_ == -1) {
             :       // if none of the partitions knew its number of rows, and extrapolation
was
             :       // not possible, we fall back on the table stats
             :       cardinality_ = tbl_.getNumRows();
             :     }
> I think it might help readability if you put this inside the if block in L6
Done. Added tests for this cardinality behavior.


http://gerrit.cloudera.org:8080/#/c/6840/2/testdata/workloads/functional-planner/queries/PlannerTest/constant-folding.test
File testdata/workloads/functional-planner/queries/PlannerTest/constant-folding.test:

PS2, Line 268: 61
> Is this ok that this changed?
This is actually the same value as in master. Our planner test infra has a bug that causes
row-size to incorrectly be ignored in the comparison, so this file was never updated.
I filed IMPALA-5341


-- 
To view, visit http://gerrit.cloudera.org:8080/6840
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I972c8a03ed70211734631a7dc9085cb33622ebc4
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm <alex.behm@cloudera.com>
Gerrit-Reviewer: Alex Behm <alex.behm@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <marcel@cloudera.com>
Gerrit-HasComments: Yes

Mime
View raw message