impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Tim Armstrong (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] IMPALA-3200: [DOCS] Document user-facing aspects of new buffer pool
Date Wed, 04 Oct 2017 21:23:45 GMT
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8003 )

Change subject: IMPALA-3200: [DOCS] Document user-facing aspects of new buffer pool
......................................................................


Patch Set 4:

(18 comments)

http://gerrit.cloudera.org:8080/#/c/8003/4/docs/topics/impala_buffer_pool_limit.xml
File docs/topics/impala_buffer_pool_limit.xml:

http://gerrit.cloudera.org:8080/#/c/8003/4/docs/topics/impala_buffer_pool_limit.xml@39
PS4, Line 39:       Defines the size in bytes of an internal buffer for allocating memory
during queries.
First sentence isn't accurate. Something like this might be better: "Defines a limit on the
amount of memory that the query can allocate from the internal buffer pool."


http://gerrit.cloudera.org:8080/#/c/8003/4/docs/topics/impala_buffer_pool_limit.xml@53
PS4, Line 53: his default is represented by a value
            :       of 0.
The default is actually that the query option is unset. 0 means a limit of 0. I guess that
impala-shell reported that the default was 0, which is incorrect: see IMPALA-5589


http://gerrit.cloudera.org:8080/#/c/8003/4/docs/topics/impala_buffer_pool_limit.xml@69
PS4, Line 69: set buffer_pool_limit=8GB;
It can also be expressed as a percentage of mem_limit (e.g. 80%), which may be more useful
in some cases.


http://gerrit.cloudera.org:8080/#/c/8003/4/docs/topics/impala_default_spillable_buffer_size.xml
File docs/topics/impala_default_spillable_buffer_size.xml:

http://gerrit.cloudera.org:8080/#/c/8003/4/docs/topics/impala_default_spillable_buffer_size.xml@44
PS4, Line 44:     <p conref="../shared/impala_common.xml#common/type_integer"/>
Do we document somewhere the supported byte suffixes (e.g. B, KB, MB, GB plus variations).


http://gerrit.cloudera.org:8080/#/c/8003/4/docs/topics/impala_default_spillable_buffer_size.xml@49
PS4, Line 49:     <p>
I think it's confusing to say that the default varies, since the query option value doesn't
vary, it just sets an upper bound for the sizes that can be picked by the planner.


http://gerrit.cloudera.org:8080/#/c/8003/4/docs/topics/impala_default_spillable_buffer_size.xml@50
PS4, Line 50:       <codeph>65536</codeph> to <codeph>2097152</codeph>,
depending on the
Feel free to ignore, but might be easier to read if they were human-readable (e.g. 2MB).


http://gerrit.cloudera.org:8080/#/c/8003/4/docs/topics/impala_default_spillable_buffer_size.xml@59
PS4, Line 59:       consider increasing the <codeph>DEFAULT_SPILLABLE_BUFFER_SIZE</codeph>
setting.
Would it be helpful to elaborate here? E.g. "Larger buffer sizes will result in Impala issuing
larger I/O operators to storage devices, which may result in higher throughput, particularly
on rotational disks".


http://gerrit.cloudera.org:8080/#/c/8003/4/docs/topics/impala_default_spillable_buffer_size.xml@61
PS4, Line 61:     </p>
It's implied but would it make sense to state the inverse. I.e. decreasing the option may
reduce memory consumption.

This all depends on whether the planner has actually chosen a spillable-buffer_size that is
capped by this query option (visible when explain_level=2). It seems worth mentioning that
since it's a precondition to this having any effect.


http://gerrit.cloudera.org:8080/#/c/8003/4/docs/topics/impala_max_row_size.xml
File docs/topics/impala_max_row_size.xml:

http://gerrit.cloudera.org:8080/#/c/8003/4/docs/topics/impala_max_row_size.xml@47
PS4, Line 47: 524288
512K to be human-readable?


http://gerrit.cloudera.org:8080/#/c/8003/4/docs/topics/impala_max_row_size.xml@56
PS4, Line 56: accomodate
sp: accommodate


http://gerrit.cloudera.org:8080/#/c/8003/4/docs/topics/impala_max_row_size.xml@57
PS4, Line 57:       the total bytes stored in the largest row.
It's left ambiguous whether queries processing rows larger than MAX_ROW_SIZE *will* fail or
whether they *may* fail. It's really the latter - in many case they will succeed. I've sometimes
described it as best-effort processing for rows > max_row_size.


http://gerrit.cloudera.org:8080/#/c/8003/4/docs/topics/impala_max_row_size.xml@68
PS4, Line 68: <codeblock><![CDATA[
This example is really helpful! Hopefully this will head off some support cases (I know it
comes up with some frequency).


http://gerrit.cloudera.org:8080/#/c/8003/4/docs/topics/impala_min_spillable_buffer_size.xml
File docs/topics/impala_min_spillable_buffer_size.xml:

http://gerrit.cloudera.org:8080/#/c/8003/4/docs/topics/impala_min_spillable_buffer_size.xml@38
PS4, Line 38:       <indexterm audience="hidden">MIN_SPILLABLE_BUFFER_SIZE query option</indexterm>
I think most of the comments I made on DEFAULT_SPILLABLE_BUFFER_SIZE have analogues here.


http://gerrit.cloudera.org:8080/#/c/8003/4/docs/topics/impala_scalability.xml
File docs/topics/impala_scalability.xml:

http://gerrit.cloudera.org:8080/#/c/8003/4/docs/topics/impala_scalability.xml@425
PS4, Line 425: allocated
it's reserved at the beginning of the query, not allocated. The distinction is made in query
profiles etc so we should make sure that it's clear here.


http://gerrit.cloudera.org:8080/#/c/8003/4/docs/topics/impala_scalability.xml@560
PS4, Line 560:         The infrastructure of the spilling feature affects the way the affected
SQL operators, such as
A lot of the discussion in this paragraph and the one below is describing the "small buffers"
part of the old spill-to-disk code, which is no longer relevant. Now all of the memory required
to spill to disk for these operators is reserved upfront and is displayed in the explain plan.
There's no threshold now - operators may expand their memory consumption if there's memory
available but can successfully spill to disk otherwise.


http://gerrit.cloudera.org:8080/#/c/8003/4/docs/topics/impala_scalability.xml@615
PS4, Line 615: >BlockMgr.BytesWritten
This still needs updating - IMPALA-5656 - should I keep an eye out for a different gerrit
review?


http://gerrit.cloudera.org:8080/#/c/8003/4/docs/topics/impala_scalability.xml@701
PS4, Line 701:         <b>Testing performance implications of spilling to disk:</b>
This section is stale too. The scenario still seems valid but the profile output is very different.


http://gerrit.cloudera.org:8080/#/c/8003/4/docs/topics/impala_scalability.xml@745
PS4, Line 745: Do not specify a memory limit lower than about 300 MB, because with such a
             :         low limit, queries could fail to start for other reasons. Now try the
memory-intensive query again.
This is one of the things that was fixed :)



-- 
To view, visit http://gerrit.cloudera.org:8080/8003
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I49323f8ffbff3e195058e88762eedbb1fcb1bc0e
Gerrit-Change-Number: 8003
Gerrit-PatchSet: 4
Gerrit-Owner: John Russell <jrussell@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dhecht@cloudera.com>
Gerrit-Reviewer: Mostafa Mokhtar <mmokhtar@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstrong@cloudera.com>
Gerrit-Comment-Date: Wed, 04 Oct 2017 21:23:45 +0000
Gerrit-HasComments: Yes

Mime
  • Unnamed multipart/alternative (inline, 8-Bit, 0 bytes)
View raw message