impala-reviews mailing list archives

From "Alex Behm (Code Review)" <>
Subject [Impala-ASF-CR] [DOCS] Tighten up advice about first COMPUTE INCREMENTAL STATS
Date Fri, 06 Oct 2017 06:43:06 GMT
Alex Behm has posted comments on this change.

Change subject: [DOCS] Tighten up advice about first COMPUTE INCREMENTAL STATS

Patch Set 2:

File docs/shared/impala_common.xml:
PS2, Line 1224:         For a particular table, use either <codeph>COMPUTE STATS</codeph>
PS2, Line 1233:         When you run <codeph>COMPUTE INCREMENTAL STATS</codeph>
on a table for the first time,
I suggest some minor rephrasing to drive home the "don't switch" mantra a little more, see
PS2, Line 1234:         the statistics are computed again from scratch regardless of whether
you previously ran
regardless of whether the table has existing stats.
PS2, Line 1236:         for scanning the entire table when switching from <codeph>COMPUTE
STATS</codeph> to
when running COMPUTE INCREMENTAL STATS for the first time on a given table.

(do not mention switching... not supposed to do that)
PS2, Line 1244:         2 GB, a serious error can occur. If only a limited number of partitions
are actively being
If the aggregate metadata of all tables exceeds 2 GB you may experience service downtime (daemon

("serious error" really isn't clear to me)
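To make the 2 GB concern concrete, here is a rough back-of-the-envelope sketch. It assumes roughly 400 bytes of incremental-stats metadata per column per partition, which is the approximate figure I recall from the Impala docs and is not stated in this thread; the function names are made up for illustration.

```python
# Rough estimate of incremental-stats metadata size across tables.
# ASSUMPTION: ~400 bytes per column per partition (approximate figure,
# not taken from this review thread).
BYTES_PER_COLUMN_PER_PARTITION = 400
LIMIT_BYTES = 2 * 1024**3  # the 2 GB threshold discussed above


def incremental_stats_bytes(num_partitions, num_columns):
    """Estimated metadata bytes for one table with incremental stats."""
    return num_partitions * num_columns * BYTES_PER_COLUMN_PER_PARTITION


def exceeds_limit(tables):
    """tables: iterable of (num_partitions, num_columns) pairs.

    Returns (total_estimated_bytes, True if the total is over 2 GB).
    """
    total = sum(incremental_stats_bytes(p, c) for p, c in tables)
    return total, total > LIMIT_BYTES
```

Under that assumption, a single table with 100,000 partitions and 100 columns would already be estimated at about 4 GB, while a table with 100 partitions and 50 columns comes to about 2 MB.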
PS2, Line 1245:         added or inserted into, you can run <codeph>COMPUTE INCREMENTAL
STATS</codeph> for the active
Sorry, my phrasing might have been misleading. By "active" partitions I meant those partitions
that are being queried (i.e. read). If you query some partitions very infrequently, then
there is no point in keeping incremental stats for them.
PS2, Line 1248:         optimizations such as partition pruning.
such as partition pruning or join ordering.
File docs/topics/impala_partitioning.xml:
PS2, Line 624:         subset of partitions rather than the entire table. The incremental
nature makes it suitable for large tables
Need to be careful here because "large tables" could be misinterpreted to mean "tables with
many partitions".

I'd prefer to avoid the word "suitable" and instead use a phrasing that states it enables
updating the stats as partitions are added. Whether incremental stats is "suitable" for anything
is questionable because of the huge memory downside.

I'd agree that incremental stats could be suitable in situations where you have a huge partitioned
table with a small rolling window of "active" partitions, so you only ever need to keep incremental
stats on, let's say, fewer than 100 partitions.
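As an illustration of the rolling-window idea, here is a minimal sketch that emits Impala statements for a window of the most recent partitions. The table name, partition column, and helper name are hypothetical, and dropping stats for partitions that fall out of the window is only one possible reading of "no point in keeping incremental stats for them", not something this review prescribes.

```python
def rolling_window_statements(table, partition_col, partitions, window):
    """Build statements that keep incremental stats only for the newest
    `window` partitions (by sorted partition value).

    All arguments are illustrative; partition values are treated as strings.
    """
    ordered = sorted(partitions)
    # Guard window <= 0: ordered[-0:] would select every partition.
    active = ordered[-window:] if window > 0 else []
    stale = ordered[:len(ordered) - len(active)]

    statements = [
        f"DROP INCREMENTAL STATS {table} PARTITION ({partition_col}='{p}');"
        for p in stale
    ]
    statements += [
        f"COMPUTE INCREMENTAL STATS {table} PARTITION ({partition_col}='{p}');"
        for p in active
    ]
    return statements
```

For example, calling it with a hypothetical table `sales`, partition column `day`, three daily partitions, and `window=2` yields one DROP statement for the oldest day followed by two COMPUTE statements for the two newest days.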
File docs/topics/impala_perf_stats.xml:
PS2, Line 361:           <codeph>COMPUTE STATS</codeph> statement might take hours,
or even days. That situation is where you switch
Rephrase to avoid "switch" since switching is bad


Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia53a6518ce5541e5c9a2cd896856ce042a599b03
Gerrit-Change-Number: 7999
Gerrit-PatchSet: 2
Gerrit-Owner: John Russell <>
Gerrit-Reviewer: Alex Behm <>
Gerrit-Reviewer: Greg Rahn <>
Gerrit-Reviewer: John Russell <>
Gerrit-Reviewer: Mostafa Mokhtar <>
Gerrit-Reviewer: Silvius Rus <>
Gerrit-Comment-Date: Fri, 06 Oct 2017 06:43:06 +0000
Gerrit-HasComments: Yes
