impala-reviews mailing list archives

From "John Russell (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] [DOCS] Tighten up advice about first COMPUTE INCREMENTAL STATS
Date Fri, 06 Oct 2017 18:09:52 GMT
John Russell has posted comments on this change. ( http://gerrit.cloudera.org:8080/7999 )

Change subject: [DOCS] Tighten up advice about first COMPUTE INCREMENTAL STATS
......................................................................


Patch Set 2:

(19 comments)

Almost finished with the comments. I'll touch base with Alex to get a little more clarification
about which stats it is safe, or sensible, to DROP INCREMENTAL STATS for.

http://gerrit.cloudera.org:8080/#/c/7999/2/docs/shared/impala_common.xml
File docs/shared/impala_common.xml:

http://gerrit.cloudera.org:8080/#/c/7999/2/docs/shared/impala_common.xml@1224
PS2, Line 1224:         For a particular table, use either <codeph>COMPUTE STATS</codeph>
or
> Yes!
Done


http://gerrit.cloudera.org:8080/#/c/7999/2/docs/shared/impala_common.xml@1228
PS2, Line 1228: DROP STATS</codeph> and
              :         <codeph>DROP INCREMENTAL STATS</codeph>)
> They are not required if you *exactly* what you are doing, but that does no
Done


http://gerrit.cloudera.org:8080/#/c/7999/2/docs/shared/impala_common.xml@1228
PS2, Line 1228: DROP STATS</codeph> and
              :         <codeph>DROP INCREMENTAL STATS</codeph>)
> are these drops required?
Done


http://gerrit.cloudera.org:8080/#/c/7999/2/docs/shared/impala_common.xml@1234
PS2, Line 1234:         the statistics are computed again from scratch regardless of whether
you previously ran
> regardless of whether the table has existing stats.
Done


http://gerrit.cloudera.org:8080/#/c/7999/2/docs/shared/impala_common.xml@1236
PS2, Line 1236:         for scanning the entire table when switching from <codeph>COMPUTE
STATS</codeph> to
> when running COMPUTE INCREMENTAL STATS for the first time on a given table.
Done


http://gerrit.cloudera.org:8080/#/c/7999/2/docs/shared/impala_common.xml@1243
PS2, Line 1243:         be cached on every <cmdname>impalad</cmdname> host. If
this metadata for a table exceeds
> more specifically, impalads that are also coordinators?
Done


http://gerrit.cloudera.org:8080/#/c/7999/2/docs/shared/impala_common.xml@1243
PS2, Line 1243:         be cached on every <cmdname>impalad</cmdname> host. If
this metadata for a table exceeds
> Yes
Done


http://gerrit.cloudera.org:8080/#/c/7999/2/docs/shared/impala_common.xml@1244
PS2, Line 1244:         2 GB, a serious error can occur. If only a limited number of partitions
are actively being
> If the aggregate metadata of all tables exceeds 2 GB you may experience ser
Done. "Serious error" was the compromise I always used for MySQL, where the open source tradition
leaned towards saying "crash" but the enterprise focus suggested something more euphemistic.


http://gerrit.cloudera.org:8080/#/c/7999/2/docs/shared/impala_common.xml@1247
PS2, Line 1247: does not affect
> Fine with me to expand this to add my earlier explanation of what the "incr
Done


http://gerrit.cloudera.org:8080/#/c/7999/2/docs/shared/impala_common.xml@1247
PS2, Line 1247: does not affect
> does that mean lack of stats has not affect on optimization or something el
Done


http://gerrit.cloudera.org:8080/#/c/7999/2/docs/shared/impala_common.xml@1247
PS2, Line 1247: does not affect
> fair pointer for me, but my comment is about whether this wording is clear 
Done


http://gerrit.cloudera.org:8080/#/c/7999/2/docs/shared/impala_common.xml@1247
PS2, Line 1247: does not affect
> Please see my explanation on what "incremental" stats is in previous patch 
Done


http://gerrit.cloudera.org:8080/#/c/7999/2/docs/shared/impala_common.xml@1248
PS2, Line 1248:         optimizations such as partition pruning.
> such as partition pruning or join ordering.
Done


http://gerrit.cloudera.org:8080/#/c/7999/2/docs/shared/impala_common.xml@1248
PS2, Line 1248:         optimizations such as partition pruning.
> Actually I would remove partition pruning because stats have nothing to do 
Done


http://gerrit.cloudera.org:8080/#/c/7999/2/docs/topics/impala_partitioning.xml
File docs/topics/impala_partitioning.xml:

http://gerrit.cloudera.org:8080/#/c/7999/2/docs/topics/impala_partitioning.xml@611
PS2, Line 611: frequently
> remove
Done


http://gerrit.cloudera.org:8080/#/c/7999/2/docs/topics/impala_partitioning.xml@623
PS2, Line 623: is a shortcut
> I don't know what "shortcut" means here. I'd remove it.
I'm looking for a way to convey that it's faster to do COMPUTE INCREMENTAL STATS on a partitioned
table than COMPUTE STATS. But the time savings only materialize if you run C.I.S. multiple times,
that is, when the table keeps getting new partitions.
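
To make that trade-off concrete, here is a toy cost model in Python. It is only a sketch, not Impala code: the function name, the partition counts, and the "cost = partitions scanned" simplification are all made up for illustration. It assumes COMPUTE STATS rescans every partition on each run, while COMPUTE INCREMENTAL STATS scans only partitions that do not have stats yet, so its first run still scans the whole table.

```python
def partitions_scanned(initial, added_per_run, runs, incremental):
    """Toy model: total partitions scanned across `runs` stats runs.

    Assumes a full COMPUTE STATS rescans every partition each time, while
    COMPUTE INCREMENTAL STATS scans only partitions that lack stats
    (so its first run still scans the whole table).
    """
    total = 0
    with_stats = 0          # partitions that already have incremental stats
    existing = initial      # partitions currently in the table
    for _ in range(runs):
        total += (existing - with_stats) if incremental else existing
        with_stats = existing
        existing += added_per_run   # new partitions arrive before the next run
    return total


if __name__ == "__main__":
    # Hypothetical table: 100 partitions, 1 new partition between runs.
    for runs in (1, 5):
        full = partitions_scanned(100, 1, runs, incremental=False)
        incr = partitions_scanned(100, 1, runs, incremental=True)
        print(f"runs={runs}: full={full}, incremental={incr}")
```

With a single run the two statements scan the same number of partitions in this model; the gap only opens up on later runs. The model ignores the per-partition metadata overhead of incremental stats discussed elsewhere in this review.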


http://gerrit.cloudera.org:8080/#/c/7999/2/docs/topics/impala_perf_stats.xml
File docs/topics/impala_perf_stats.xml:

http://gerrit.cloudera.org:8080/#/c/7999/2/docs/topics/impala_perf_stats.xml@361
PS2, Line 361:           <codeph>COMPUTE STATS</codeph> statement might take hours,
or even days. That situation is where you switch
> Rephrase to avoid "switch" since switching is bad
Done


http://gerrit.cloudera.org:8080/#/c/7999/2/docs/topics/impala_perf_stats.xml@361
PS2, Line 361: That situation is where you switch
> I'd reword this part ("That situation is where ..."). Suggestion:
I used wording similar to Vuk's suggestion, but unless we say "do a CTAS into a whole new
table and throw away the old table", the user is likely to follow their intuition and switch
from C.S. to C.I.S. on the same table.


http://gerrit.cloudera.org:8080/#/c/7999/2/docs/topics/impala_perf_stats.xml@412
PS2, Line 412: >COMPUTE INCREMENTAL STAT
> docs in impala_common mention "drop stats" before making a switch. that's n
The conref= lines in the <note> above will pull in the same text as in impala_common.xml,
with all the extra warnings and instructions.



-- 
To view, visit http://gerrit.cloudera.org:8080/7999
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia53a6518ce5541e5c9a2cd896856ce042a599b03
Gerrit-Change-Number: 7999
Gerrit-PatchSet: 2
Gerrit-Owner: John Russell <jrussell@cloudera.com>
Gerrit-Reviewer: Alex Behm <alex.behm@cloudera.com>
Gerrit-Reviewer: Greg Rahn <grahn@cloudera.com>
Gerrit-Reviewer: John Russell <jrussell@cloudera.com>
Gerrit-Reviewer: Mostafa Mokhtar <mmokhtar@cloudera.com>
Gerrit-Reviewer: Silvius Rus <srus@cloudera.com>
Gerrit-Reviewer: Vuk Ercegovac <vercegovac@cloudera.com>
Gerrit-Comment-Date: Fri, 06 Oct 2017 18:09:52 +0000
Gerrit-HasComments: Yes
