impala-reviews mailing list archives

From "Alex Behm (Code Review)" <>
Subject [Impala-ASF-CR] IMPALA-5955: Use totalSize tblproperty instead of rawDataSize.
Date Thu, 21 Sep 2017 20:53:04 GMT
Hello Bharath Vissapragada, Dimitris Tsirogiannis, 

I'd like you to reexamine a change. Please visit

to look at the new patch set (#2).

Change subject: IMPALA-5955: Use totalSize tblproperty instead of rawDataSize.

IMPALA-5955: Use totalSize tblproperty instead of rawDataSize.

Today, Impala populates the 'rawDataSize' property
during COMPUTE STATS for the purpose of extrapolating
row counts based on file sizes.

After this patch, Impala populates 'totalSize' instead of
'rawDataSize'. The 'rawDataSize' property is no longer
populated or used.
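
As a rough illustration of extrapolating row counts from file
sizes, the sketch below scales the row count recorded at
COMPUTE STATS time by the change in on-disk bytes. It is not
Impala's implementation: the function name, the dictionary, and
all numeric values are hypothetical, and it assumes only that
'numRows' and 'totalSize' are stored as strings in the table's
tblproperties.

```python
def extrapolate_row_count(stats_num_rows, stats_total_size, current_total_size):
    """Estimate the current row count from file sizes (illustrative only).

    stats_num_rows:     row count recorded when stats were computed
    stats_total_size:   on-disk bytes recorded at the same time ('totalSize')
    current_total_size: on-disk bytes of the table's files right now
    """
    # Without a valid baseline there is nothing to extrapolate from.
    if stats_num_rows < 0 or stats_total_size <= 0:
        return None
    # Assume rows are spread evenly across bytes, so rows scale with size.
    return round(stats_num_rows * current_total_size / stats_total_size)


# Hypothetical tblproperties as they might be read back from the HMS.
tbl_properties = {"numRows": "1000", "totalSize": "50000"}

estimate = extrapolate_row_count(
    int(tbl_properties["numRows"]),
    int(tbl_properties["totalSize"]),
    75000,  # hypothetical current on-disk size in bytes
)
print(estimate)
```

The ratio is only meaningful when both sizes are on-disk sizes,
which is why 'totalSize' is the appropriate baseline here rather
than the in-memory estimate 'rawDataSize'.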

Intended meaning/use of tblproperties:
- 'rawDataSize' is the estimated in-memory size of a table
  (without encoding and compression)
- 'totalSize' represents the on-disk size

Using the fields correctly is important for compatibility
with other users of the HMS such as Hive and SparkSQL.
For example, SparkSQL relies on the 'totalSize' for
join ordering.

- core/hdfs run passed

Change-Id: If7c2c4e1e99b297c849f9f0d18b2bef34ad811c6
M fe/src/main/java/org/apache/impala/catalog/
M fe/src/main/java/org/apache/impala/service/
M fe/src/test/java/org/apache/impala/planner/
M testdata/workloads/functional-planner/queries/PlannerTest/fk-pk-join-detection.test
4 files changed, 25 insertions(+), 25 deletions(-)

  git pull ssh:// refs/changes/10/8110/2

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If7c2c4e1e99b297c849f9f0d18b2bef34ad811c6
Gerrit-Change-Number: 8110
Gerrit-PatchSet: 2
Gerrit-Owner: Alex Behm <>
Gerrit-Reviewer: Alex Behm <>
Gerrit-Reviewer: Bharath Vissapragada <>
Gerrit-Reviewer: Dimitris Tsirogiannis <>
