impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Tim Armstrong (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] IMPALA-5870: Improve explain/profile output for partial sort
Date Fri, 22 Sep 2017 18:08:47 GMT
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8123 )

Change subject: IMPALA-5870: Improve explain/profile output for partial sort
......................................................................


Patch Set 1:

(3 comments)

The profile changes look great to me and make it way less confusing - just had one comment
about variable names then I'm happy with that part of the patch. I'm unsure about the explain
plan changes - I think that needs more thought and discussion. We could potentially decouple
the two parts and get the profile changes in soon.

http://gerrit.cloudera.org:8080/#/c/8123/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/8123/1//COMMIT_MSG@14
PS1, Line 14: For EXPLAIN, this patch removes the 'spill-buffer' mem-estimate for
I'm skeptical about this part of the change. It definitely a wart but there are downsides
to changing it.

I think reporting the buffer size is still useful. It seems like this is mainly an issue with
the name diplayed so if we want to change it I'd prefer directly changing the output bypass
in a parameter to the resource profile to controls the display name and otherwise leaving
the logic unchanged. Preaggregations also have the exact same naming inconsistency so I think
whatever is done here should also be applied there.

Another problem is that changing the name makes inconsistent with the query options that control
the buffer sizes - min_spillable_buffer_bytes and default_spillable_buffer_bytes. I think
the naming of the query options is imperfect but a reasonable compromise - it disambiguates
it from the scanner's I/O buffers and most of time it's accurate. In the cases where it's
slightly inaccurate the non-spilling operators are at least variants of spilling operators.
We could maybe make an argument for decoupling the non-spillable and spillable buffer size
query options since the non-spilling sizes don't affect I/O performance. Unclear if it's worth
adding more query options though.

I guess my preference is probably to leave the explain output unchanged and accept that there's
some potential for confusion. I think a lot of the explain_level=2 output is low-level enough
that you need to understand the mechanisms in order to interpret it.


http://gerrit.cloudera.org:8080/#/c/8123/1//COMMIT_MSG@40
PS1, Line 40:     - RunsCreated: 1 (1)
This is much better!


http://gerrit.cloudera.org:8080/#/c/8123/1/be/src/runtime/sorter.cc
File be/src/runtime/sorter.cc:

http://gerrit.cloudera.org:8080/#/c/8123/1/be/src/runtime/sorter.cc@1520
PS1, Line 1520: runs_counter_
"runs_counter_" is a bit inaccurate for spilling sorts since it doesn't include merged runs.
Maybe we should keep the old variable name and just change the string?

"initial_runs_counter_" seems to make sense for both cases, since they are still initial runs
for non-spilling sorts.



-- 
To view, visit http://gerrit.cloudera.org:8080/8123
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2b15af78d8299db8edc44ff820c85db1cbe0be1b
Gerrit-Change-Number: 8123
Gerrit-PatchSet: 1
Gerrit-Owner: Thomas Tauber-Marshall <tmarshall@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstrong@cloudera.com>
Gerrit-Comment-Date: Fri, 22 Sep 2017 18:08:47 +0000
Gerrit-HasComments: Yes

Mime
  • Unnamed multipart/alternative (inline, 8-Bit, 0 bytes)
View raw message