impala-reviews mailing list archives

From "Alex Behm (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] [DOCS] Tighten up advice about first COMPUTE INCREMENTAL STATS
Date Fri, 06 Oct 2017 06:43:06 GMT
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/7999 )

Change subject: [DOCS] Tighten up advice about first COMPUTE INCREMENTAL STATS
......................................................................


Patch Set 2:

(9 comments)

http://gerrit.cloudera.org:8080/#/c/7999/2/docs/shared/impala_common.xml
File docs/shared/impala_common.xml:

http://gerrit.cloudera.org:8080/#/c/7999/2/docs/shared/impala_common.xml@1224
PS2, Line 1224:         For a particular table, use either <codeph>COMPUTE STATS</codeph> or
Yes!


http://gerrit.cloudera.org:8080/#/c/7999/2/docs/shared/impala_common.xml@1233
PS2, Line 1233:         When you run <codeph>COMPUTE INCREMENTAL STATS</codeph> on a table for the first time,
I suggest some minor rephrasing to drive home the "don't switch" mantra a little more; see my
other comments.


http://gerrit.cloudera.org:8080/#/c/7999/2/docs/shared/impala_common.xml@1234
PS2, Line 1234:         the statistics are computed again from scratch regardless of whether
you previously ran
regardless of whether the table has existing stats.


http://gerrit.cloudera.org:8080/#/c/7999/2/docs/shared/impala_common.xml@1236
PS2, Line 1236:         for scanning the entire table when switching from <codeph>COMPUTE STATS</codeph> to
when running COMPUTE INCREMENTAL STATS for the first time on a given table.

(do not mention switching... not supposed to do that)


http://gerrit.cloudera.org:8080/#/c/7999/2/docs/shared/impala_common.xml@1244
PS2, Line 1244:         2 GB, a serious error can occur. If only a limited number of partitions are actively being
If the aggregate metadata of all tables exceeds 2 GB, you may experience service downtime (daemon
crashes).

("serious error" really isn't clear to me)
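To make the 2 GB concern concrete, here is a rough back-of-the-envelope sketch in Python. It assumes incremental stats cost roughly 400 bytes per column per partition (the approximate figure cited in Impala's statistics documentation; treat it as an assumption, not a measurement) and reads "2 GB" as 2^31 bytes. It counts only the incremental-stats portion, so other table metadata such as file and block information would come on top of it. The function names are hypothetical.

```python
# Back-of-the-envelope estimate of incremental-stats metadata.
# ASSUMPTION: ~400 bytes per column per partition (approximate doc figure).
BYTES_PER_COLUMN_PER_PARTITION = 400
# ASSUMPTION: the 2 GB aggregate-metadata threshold is read as 2**31 bytes.
LIMIT_BYTES = 2 * 1024 ** 3


def incremental_stats_bytes(tables):
    """Sum the estimated incremental-stats size over all tables.

    `tables` is an iterable of (num_partitions, num_columns) pairs,
    one pair per table that carries incremental stats.
    """
    return sum(p * c * BYTES_PER_COLUMN_PER_PARTITION for p, c in tables)


def exceeds_limit(tables):
    """True if the incremental-stats estimate alone passes the threshold."""
    return incremental_stats_bytes(tables) > LIMIT_BYTES
```

For example, a single table with 10,000 partitions and 200 columns comes to about 0.8 GB under these assumptions, and three such tables already pass the threshold on incremental stats alone.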


http://gerrit.cloudera.org:8080/#/c/7999/2/docs/shared/impala_common.xml@1245
PS2, Line 1245:         added or inserted into, you can run <codeph>COMPUTE INCREMENTAL STATS</codeph> for the active
Sorry, my phrasing might have been misleading. By "active" partitions I meant those partitions
that are being queried (i.e., read). If you query some partitions very infrequently, then
there is no point in keeping incremental stats for them.


http://gerrit.cloudera.org:8080/#/c/7999/2/docs/shared/impala_common.xml@1248
PS2, Line 1248:         optimizations such as partition pruning.
such as partition pruning or join ordering.


http://gerrit.cloudera.org:8080/#/c/7999/2/docs/topics/impala_partitioning.xml
File docs/topics/impala_partitioning.xml:

http://gerrit.cloudera.org:8080/#/c/7999/2/docs/topics/impala_partitioning.xml@624
PS2, Line 624:         subset of partitions rather than the entire table. The incremental nature makes it suitable for large tables
Need to be careful here because "large tables" could be misinterpreted to mean "tables with
many partitions".

I'd prefer to avoid the word "suitable" and instead use a phrasing that states it enables
updating the stats as partitions are added. Whether incremental stats is "suitable" for anything
is questionable because of the huge memory downside.

I'd agree that incremental stats could be suitable in situations where you have a huge partitioned
table with a small rolling window of "active" partitions, so you only ever need to keep incremental
stats on, say, fewer than 100 partitions.
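The rolling-window setup described above could be scripted roughly as follows. This is a minimal Python sketch under stated assumptions: a hypothetical table partitioned by a single STRING column `dt` holding ISO dates, and a caller who already knows which partitions currently carry incremental stats. It only builds `COMPUTE INCREMENTAL STATS ... PARTITION` and `DROP INCREMENTAL STATS ... PARTITION` statements as strings; it never runs them, and it never mixes in plain `COMPUTE STATS`. The function name and parameters are hypothetical.

```python
from datetime import date


def plan_rolling_window(table, all_partitions, with_stats, window):
    """Return Impala SQL strings that keep incremental stats only on the
    newest `window` partitions of `table`.

    Hypothetical setup: `table` is partitioned by one STRING column `dt`
    holding ISO dates; `all_partitions` and `with_stats` are sets of
    datetime.date values. Nothing is executed here.
    """
    newest = sorted(all_partitions)[-window:] if window > 0 else []
    active = set(newest)
    statements = []
    # Partitions entering the window: compute their incremental stats once.
    for d in sorted(active - with_stats):
        statements.append(
            f"COMPUTE INCREMENTAL STATS {table} PARTITION (dt='{d.isoformat()}')")
    # Partitions leaving the window (and still present in the table): drop
    # their incremental stats so they stop carrying that metadata.
    for d in sorted((with_stats & set(all_partitions)) - active):
        statements.append(
            f"DROP INCREMENTAL STATS {table} PARTITION (dt='{d.isoformat()}')")
    return statements


if __name__ == "__main__":
    days = {date(2017, 10, d) for d in range(1, 6)}
    had_stats = {date(2017, 10, d) for d in range(1, 5)}
    for stmt in plan_rolling_window("sales", days, had_stats, 3):
        print(stmt)
```

Computing stats only for partitions that newly enter the window avoids rescanning partitions that already have incremental stats, while the drop step keeps the number of partitions with incremental stats bounded by the window size.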


http://gerrit.cloudera.org:8080/#/c/7999/2/docs/topics/impala_perf_stats.xml
File docs/topics/impala_perf_stats.xml:

http://gerrit.cloudera.org:8080/#/c/7999/2/docs/topics/impala_perf_stats.xml@361
PS2, Line 361:           <codeph>COMPUTE STATS</codeph> statement might take hours, or even days. That situation is where you switch
Rephrase to avoid "switch", since switching is bad.



-- 
To view, visit http://gerrit.cloudera.org:8080/7999
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia53a6518ce5541e5c9a2cd896856ce042a599b03
Gerrit-Change-Number: 7999
Gerrit-PatchSet: 2
Gerrit-Owner: John Russell <jrussell@cloudera.com>
Gerrit-Reviewer: Alex Behm <alex.behm@cloudera.com>
Gerrit-Reviewer: Greg Rahn <grahn@cloudera.com>
Gerrit-Reviewer: John Russell <jrussell@cloudera.com>
Gerrit-Reviewer: Mostafa Mokhtar <mmokhtar@cloudera.com>
Gerrit-Reviewer: Silvius Rus <srus@cloudera.com>
Gerrit-Comment-Date: Fri, 06 Oct 2017 06:43:06 +0000
Gerrit-HasComments: Yes
