impala-reviews mailing list archives

From "Marcel Kornacker (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] IMPALA-2373: Extrapolate row counts for HDFS tables.
Date Sat, 13 May 2017 21:38:25 GMT
Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-2373: Extrapolate row counts for HDFS tables.
......................................................................


Patch Set 1:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/6840/1/common/thrift/JniCatalog.thrift
File common/thrift/JniCatalog.thrift:

Line 494:   9: optional i64 total_hdfs_bytes
why is this a parameter/an input of compute stats?


http://gerrit.cloudera.org:8080/#/c/6840/1/fe/src/main/java/org/apache/impala/catalog/Table.java
File fe/src/main/java/org/apache/impala/catalog/Table.java:

Line 492:     Preconditions.checkState(this instanceof HdfsTable);
why have this function live here and not in HdfsTable?


http://gerrit.cloudera.org:8080/#/c/6840/1/testdata/workloads/functional-planner/queries/PlannerTest/constant-folding.test
File testdata/workloads/functional-planner/queries/PlannerTest/constant-folding.test:

Line 114:    stats-rows=7300 extrapolated-rows=7300
to reduce verbosity, print the extrapolated count only when it differs from stats-rows?
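
A minimal sketch of the suggestion above, assuming the explain line is assembled from two long values. The class and method names are hypothetical and are not taken from the patch; only the `stats-rows=` and `extrapolated-rows=` labels come from the test output quoted above.

```java
// Hypothetical helper for illustration only; not Impala's actual explain code.
final class ExplainRowCountSketch {
  // Always prints stats-rows, and appends extrapolated-rows only when the
  // two values differ, which keeps the common case short.
  public static String formatRowCounts(long statsRows, long extrapolatedRows) {
    StringBuilder sb = new StringBuilder("stats-rows=").append(statsRows);
    if (extrapolatedRows != statsRows) {
      sb.append(" extrapolated-rows=").append(extrapolatedRows);
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    System.out.println(formatRowCounts(7300, 7300)); // stats-rows=7300
    System.out.println(formatRowCounts(310, 305));   // stats-rows=310 extrapolated-rows=305
  }
}
```

With this shape, plans where the two counts agree would not change at all, so most existing planner tests would presumably keep their current expected output.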


http://gerrit.cloudera.org:8080/#/c/6840/1/testdata/workloads/functional-query/queries/QueryTest/alter-table.test
File testdata/workloads/functional-query/queries/QueryTest/alter-table.test:

Line 641: YEAR, MONTH, #ROWS, EXTRAP #ROWS, #FILES, SIZE, BYTES CACHED, CACHE REPLICATION,
FORMAT, INCREMENTAL STATS, LOCATION
"extrap" is a bit weird, and we don't use abbreviations elsewhere here. spell out as "extrapolated"?


http://gerrit.cloudera.org:8080/#/c/6840/1/testdata/workloads/functional-query/queries/QueryTest/compute-stats.test
File testdata/workloads/functional-query/queries/QueryTest/compute-stats.test:

Line 20: '2009','1',310,305,1,'24.56KB','NOT CACHED','NOT CACHED','TEXT','false',regex:.*
> There are a ton of tests that use SHOW TABLE STATS or SHOW PARTITIONS. I ha
what's the reason for the small deviations here? rounding? people might think that something
has gone wrong if the extrapolated numbers differ from the row counts right after running
compute stats; it would be nice to avoid that.
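
One way a deviation like 310 vs. 305 could arise, sketched under a loud assumption: that a partition's row count is extrapolated as its file bytes times a table-wide rows-per-byte ratio. This message does not show the patch's actual formula, so the method below and all byte counts are made up for illustration.

```java
// Hypothetical illustration; the formula and the numbers are assumptions,
// not the patch's implementation or real test data.
final class ExtrapolationSketch {
  // Extrapolates a row count from file size using a table-wide ratio
  // (totalRows / totalBytes), rounded to the nearest whole row.
  public static long extrapolateRows(long partitionBytes, long totalRows, long totalBytes) {
    if (totalBytes <= 0) return -1; // no basis for extrapolation
    return Math.round((double) partitionBytes * totalRows / totalBytes);
  }

  public static void main(String[] args) {
    // Two made-up partitions of equal size with 310 and 300 rows.
    long totalRows = 310 + 300;
    long totalBytes = 25000 + 25000;
    // The first partition really has 310 rows, but the table-wide ratio
    // yields a different estimate for it.
    System.out.println(extrapolateRows(25000, totalRows, totalBytes)); // 305
  }
}
```

Under that assumption the difference would come from averaging across partitions rather than from rounding alone, so it would persist even immediately after compute stats.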


-- 
To view, visit http://gerrit.cloudera.org:8080/6840
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I972c8a03ed70211734631a7dc9085cb33622ebc4
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm <alex.behm@cloudera.com>
Gerrit-Reviewer: Alex Behm <alex.behm@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <marcel@cloudera.com>
Gerrit-HasComments: Yes
