impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Dimitris Tsirogiannis (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] IMPALA-2373: Extrapolate row counts for HDFS tables.
Date Wed, 17 May 2017 18:10:25 GMT
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-2373: Extrapolate row counts for HDFS tables.
......................................................................


Patch Set 2:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/6840/2/fe/src/main/java/org/apache/impala/catalog/Table.java
File fe/src/main/java/org/apache/impala/catalog/Table.java:

PS2, Line 77: TTableStats tableStats_
nit: a bit safer to initialize this here and set now rows to -1.


PS2, Line 196: /**
             :    * Returns the value of the ROW_COUNT constant, or -1 if not found.
             :    */
Remove, not needed.


http://gerrit.cloudera.org:8080/#/c/6840/2/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java:

PS2, Line 128: Input cardinality based on the partition row counts and based on extrapolation
             :   // using the table-level row count and raw data size. -1 if invalid.
Remove the part that explains where this is derived from; doesn't really help. Simply state
what these two variables represent.


PS2, Line 664: Chose
Choose (typo)


PS2, Line 664:  Chose between the extrapolated row count and the one based on stored stats.
I think that requires some further explanation. Something along the lines of "if table stats
(row count, bytes) are available, use that information to estimate cardinality using extrapolation,
otherwise use the exact row counts from table/partition stats". Also, I would put L664-705
in a separate function responsible solely for estimating and setting cardinality, but feel
free to ignore this if you don't like it.


PS2, Line 672: if (!partitions_.isEmpty() && numPartitionsMissingStats_ == partitions_.size()
             :         && extrapolatedNumRows_ == -1) {
             :       // if none of the partitions knew its number of rows, and extrapolation
was
             :       // not possible, we fall back on the table stats
             :       cardinality_ = tbl_.getNumRows();
             :     }
I think it might help readability if you put this inside the if block in L666-668. Also, do
we have a test for this case?


http://gerrit.cloudera.org:8080/#/c/6840/2/testdata/workloads/functional-planner/queries/PlannerTest/constant-folding.test
File testdata/workloads/functional-planner/queries/PlannerTest/constant-folding.test:

PS2, Line 268: 61
Is this ok that this changed?


-- 
To view, visit http://gerrit.cloudera.org:8080/6840
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I972c8a03ed70211734631a7dc9085cb33622ebc4
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm <alex.behm@cloudera.com>
Gerrit-Reviewer: Alex Behm <alex.behm@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <marcel@cloudera.com>
Gerrit-HasComments: Yes

Mime
View raw message