impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Dimitris Tsirogiannis (Code Review)" <>
Date Tue, 26 Sep 2017 23:48:52 GMT
Dimitris Tsirogiannis has posted comments on this change. (


Patch Set 1:


First pass. High level approach seems reasonable to me. Will take a look at the tests next.
File be/src/exec/
PS1, Line 218: // The first column is the COUNT(*) expr of the original query.
Can you put a similar comment above L206. I had a hard time figuring out what rows[0].colVals[0]
represented in L208.
File fe/src/main/java/org/apache/impala/analysis/
PS1, Line 58: Existing partition-level row counts are not modified.
Mention what happens to the partitions that don't have the row count set.
PS1, Line 60: unmodified
PS1, Line 61: extrapolated
Maybe add some details about how extrapolation is computed. I haven't looked at the entire
patch so you may mention it elsewhere. If so, you can just point to the file that provides
that description.
PS1, Line 62:  so as not to confuse other engines like Hive/SparkSQL which may rely on
            :  *   the shared HMS fields being accurate.
That part may be confusing to someone with no background knowledge. For instance, the fact
that other engines require "accurate" statistics which we update using sampling and extrapolation
:), kind of oxymoron. I think it is ok to just say that we store the extrapolated stats and
remove that part. Thoughts?
PS1, Line 64: Independently, row-count extrapolation is also done during planning based on
            :  *   numRows / totalSize ratio because the table contents may have changed since
            :  *   last time COMPUTE STATS was run.
Remove and put it near the relevant code (planning).
PS1, Line 77: Not supported for now to keep the logic/code simple.
Ha, this is the opposite in the COMPUTE STATS case.
PS1, Line 106: totalFileBytes_ 
Can't you use the HdfsTable.totalFileBytes_ instead?
PS1, Line 108: Only set when
             :   // TABLESAMPLE was specified. Set to -1 for non-HDFS tables
Maybe "Set to -1 for non-HDFS tables or when TABLESAMPLE is not specified"?
PS1, Line 305: expectAllPartitions_ = !updateTableStatsOnly()
So if I do not update table stats only I should expect to compute stats on all partitions?
Essentially, is isIncremental check blended in this function? I think it may make sense to
extract that check.
PS1, Line 344: expectAllPartitions_ = false;
Why is this needed? Isn't L305 sufficient?
PS1, Line 440: // Only add group by clause for HdfsTables.
Comment doesn't seem relevant to the code that follows.
PS1, Line 442: !updateTableStatsOnly()
PS1, Line 452: !updateTableStatsOnly()
PS1, Line 506: sets 'expectAllPartitions_' to false
Hm, I don't see that in the code.
PS1, Line 511: base
File fe/src/main/java/org/apache/impala/catalog/
PS1, Line 34: import org.apache.impala.service.BackendConfig;
I don't think this is needed.
File fe/src/main/java/org/apache/impala/catalog/
PS1, Line 1877:   public Map<Long, List<FileDescriptor>> getFilesSample(Collection<HdfsPartition>
nit: long line
File fe/src/main/java/org/apache/impala/planner/
PS1, Line 184: partition
nit: partitions
PS1, Line 185: numPartitionsWithNumRows
nit: alternatively numPartitionsWithRowCount_
File fe/src/main/java/org/apache/impala/service/
PS1, Line 691: trace
I wonder if trace is the right logging level. Maybe info? It seems kind of important to log
this and don't think it will result in too many log messages. Thoughts?
PS1, Line 699: try {
Do we need the nested try here?
PS1, Line 735: numUpdatedColumns.setRef(Long.valueOf(0));
             :     if (colStats != null) {
             :       numUpdatedColumns.setRef(Long.valueOf(colStats.getStatsObjSize()));
             :     }
Move after L705?
PS1, Line 857: sampleFileBytes
Could that be 0?
PS1, Line 862: Long.MAX_VALUE
Hm, that's a really big number :) Are you sure about that?
PS1, Line 866:   private static ColumnStatisticsData createHiveColStatsData(TAlterTableUpdateStatsParams
nit: long line

To view, visit
To unsubscribe, visit

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7f3e72471ac563adada4a4156033a85852b7c8b7
Gerrit-Change-Number: 8136
Gerrit-PatchSet: 1
Gerrit-Owner: Alex Behm <>
Gerrit-Reviewer: Dimitris Tsirogiannis <>
Gerrit-Comment-Date: Tue, 26 Sep 2017 23:48:52 +0000
Gerrit-HasComments: Yes

  • Unnamed multipart/alternative (inline, 8-Bit, 0 bytes)
View raw message