impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Dimitris Tsirogiannis (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] IMPALA-2373: Extrapolate row counts for HDFS tables.
Date Sat, 13 May 2017 00:21:55 GMT
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-2373: Extrapolate row counts for HDFS tables.
......................................................................


Patch Set 1:

(11 comments)

First pass on main classes. I haven't looked at the tests yet.

http://gerrit.cloudera.org:8080/#/c/6840/1/be/src/common/global-flags.cc
File be/src/common/global-flags.cc:

PS1, Line 137: tables
What does this mean? I thought we only extrapolate row count of partitions.


http://gerrit.cloudera.org:8080/#/c/6840/1/be/src/util/backend-gflag-util.cc
File be/src/util/backend-gflag-util.cc:

PS1, Line 36: DECLARE_bool
nit: they seem to be grouped by type


http://gerrit.cloudera.org:8080/#/c/6840/1/common/thrift/CatalogObjects.thrift
File common/thrift/CatalogObjects.thrift:

PS1, Line 140: HDFS
How about S3? Does it make sense to rename the total_hdfs_bytes into total_bytes?


http://gerrit.cloudera.org:8080/#/c/6840/1/common/thrift/JniCatalog.thrift
File common/thrift/JniCatalog.thrift:

PS1, Line 493: !is_incremental
nit: blending code in comments :) "is_incremental is false"


http://gerrit.cloudera.org:8080/#/c/6840/1/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java
File fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java:

PS1, Line 587: hdfsTable
nit: inline


http://gerrit.cloudera.org:8080/#/c/6840/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

PS1, Line 185: private long totalHdfsBytes_;
Do we still need this?


PS1, Line 1328: tableStats_.num_rows
getNumRows()?


PS1, Line 1912: getExtrapolatedNumRows(p.getSize())
Why do we need to compute this if we already have p.getNumRows() and is >= 1?


PS1, Line 1958: tableStats_.num_rows
getNumRows()?


http://gerrit.cloudera.org:8080/#/c/6840/1/fe/src/main/java/org/apache/impala/catalog/Table.java
File fe/src/main/java/org/apache/impala/catalog/Table.java:

PS1, Line 76: // estimated number of rows in table; -1: unknown.
update comment


http://gerrit.cloudera.org:8080/#/c/6840/1/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java:

PS1, Line 616: computeStats(Analyzer analyzer) {
The logic in this function is now very hard to follow. It's not clear when the extrapolation
is used vs the exact row counts. I think the problem is that this function blends the computation
of the cardinality (and other stats) with the logic of using exact vs extrapolated counts.


-- 
To view, visit http://gerrit.cloudera.org:8080/6840
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I972c8a03ed70211734631a7dc9085cb33622ebc4
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm <alex.behm@cloudera.com>
Gerrit-Reviewer: Alex Behm <alex.behm@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Gerrit-HasComments: Yes

Mime
View raw message