impala-reviews mailing list archives

From "Tim Armstrong (Code Review)" <>
Subject [Impala-ASF-CR] IMPALA-3748: estimate minimum buffers in planner
Date Thu, 09 Mar 2017 23:18:16 GMT
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-3748: estimate minimum buffers in planner

Patch Set 8:

File fe/src/main/java/org/apache/impala/planner/

Line 48:   public class ResourceRequirement {
> these aren't requirements if they include the existing "estimates". Resourc

Line 115:       node.computeCosts(queryOptions);
> they're not costs
Renamed to computeResourceConsumption(). I don't feel strongly about the name, so please let
me know if you have a better idea.

Line 156:             hdfsScanMemEstimate + hbaseScanMemEstimate + nonScanMemEstimate + dataSinkMemEstimate);
> long line
File fe/src/main/java/org/apache/impala/planner/

Line 209:   public int getNumInstancesPerHost(TQueryOptions queryOptions) {
> passing in the dop directly is preferable, because it's easier to see what'

Line 232:   public long getNumDistinctValues(TQueryOptions queryOptions, List<Expr> exprs) {
> rename to include "per instance" in the name to clarify. you might also wan
File fe/src/main/java/org/apache/impala/planner/

Line 120:   // estimated per-instance memory usage for this plan node in bytes;
> the "estimates" don't really have defined semantics (max? desirable?). we m
I added a comment to point to IMPALA-5013, which tracks that problem. Some of the estimates
(HDFS scanner, join, agg) are reasonable and we might want to use them as a basis for estimating
"ideal" memory once this evolves further.

Reworded to avoid "usage".

Line 124:   // Minimum number of buffers required to execute spilling operator.
> why restrict this to spilling? doesn't every operator have a defined requir
That's true. Currently the requirement is 0 for non-spilling operators but that will change
as IMPALA-4834 progresses.

Updated the comment to make this clearer.
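To make the discussion above concrete, here is a minimal sketch (all names and the buffer size are invented for illustration, not the actual Impala code) of how a planner might report a per-operator minimum buffer requirement: spilling operators need a fixed number of buffers to make progress, while non-spilling operators currently report 0.

```java
// Hypothetical sketch: per-operator minimum buffer requirement.
// Spilling operators (hash join, agg, sort) need some fixed number of
// buffers to make progress; non-spilling operators report 0 for now,
// though that may change as reservations are extended to all operators.
public class MinBufferSketch {
  // Assumed spillable buffer size; the real value is a backend config.
  static final long BUFFER_SIZE = 8L * 1024 * 1024; // 8MB

  // Minimum bytes of buffers this operator needs to execute.
  static long minBufferBytes(boolean spills, int minBuffers) {
    return spills ? minBuffers * BUFFER_SIZE : 0;
  }

  public static void main(String[] args) {
    // E.g. a hash join that needs 3 buffers to partition and spill.
    System.out.println(minBufferBytes(true, 3));
    // A non-spilling operator such as an exchange.
    System.out.println(minBufferBytes(false, 0));
  }
}
```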

Line 281:       TQueryOptions queryOptions, TExplainLevel detailLevel) {
> what's the role of the query options?
It's needed to get mt_dop. This particular interface is at a high level of abstraction, so
passing in mt_dop as an int seemed a little too specific.

Line 317:           PrintUtils.printInstances(" ", fragment_.getNumInstances(queryOptions)));
> why do we want this?
I think it makes it much easier to interpret the plans with mt_dop > 1. Otherwise it's
not immediately obvious what parallelism each node executes with, and particularly how the
per-instance estimates relate to the total per-host number.

You suggested in a different comment that we should move it to plan fragment, so I did that.
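The relationship described above can be sketched as follows (method names are invented for illustration; the real signatures live in the planner classes under review): with mt_dop > 1 each host runs multiple fragment instances, so the per-host memory estimate is the per-instance estimate scaled by the instance count, which is why printing instances in the plan helps.

```java
// Hypothetical sketch: relating per-instance estimates to per-host
// totals under mt_dop. Names invented for illustration only.
public class InstanceMathSketch {
  // mt_dop = 0 means the traditional one instance per host.
  static int numInstancesPerHost(int mtDop) {
    return Math.max(1, mtDop);
  }

  // Per-host estimate is per-instance estimate times instance count.
  static long perHostMemEstimate(long perInstanceBytes, int mtDop) {
    return perInstanceBytes * numInstancesPerHost(mtDop);
  }

  public static void main(String[] args) {
    // 128MB per instance with mt_dop=4 gives 512MB per host.
    System.out.println(perHostMemEstimate(128L << 20, 4));
  }
}
```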

Line 322:             PrintUtils.printPerInstanceMinBuffers(" ", perInstanceMinBufferBytes_));
> why make this conditional?
The output would be confusing if it was user-visible because the backend exec nodes often
execute with less memory than the minimum buffers (because of the small buffers optimisation,
etc). It'll be more actionable once we have true reservations.
File testdata/workloads/functional-planner/queries/PlannerTest/constant-folding.test:

Line 10: |  hosts=1 instances=1 est-per-instance-mem=0B
> how come the number of hosts changed?
Because it's a single-node plan. I adjusted the approach so that # hosts more accurately
reflects how the planner expects the plan to execute.

It's now based on PlanFragment.getNumNodes() (the expected number of instances of the fragment)
vs PlanNode.getNumNodes() (an estimate that feeds into the decision of how to partition a fragment).

I think this will be even clearer if I move hosts/instances to the fragment level.

The previous approach led to various anomalies, e.g. hosts could be different for PlanNodes
in the same fragment.
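A minimal sketch of the fragment-level approach described above (class and field names invented for illustration): every PlanNode in a fragment executes in the same set of fragment instances, so attaching hosts/instances to the fragment avoids the anomaly where two nodes in one fragment report different host counts.

```java
// Hypothetical sketch: hosts/instances as fragment-level properties.
// All plan nodes in a fragment share these counts by construction.
public class FragmentSketch {
  static class PlanFragment {
    final int numNodes; // hosts the fragment is expected to run on
    final int mtDop;    // instances per host (0 = single instance)

    PlanFragment(int numNodes, int mtDop) {
      this.numNodes = numNodes;
      this.mtDop = mtDop;
    }

    // Total instances across the cluster for this fragment.
    int numInstances() {
      return numNodes * Math.max(1, mtDop);
    }
  }

  public static void main(String[] args) {
    // Single-node plan: one host, one instance.
    System.out.println(new PlanFragment(1, 0).numInstances());
    // Distributed plan on 3 hosts with mt_dop=4: 12 instances.
    System.out.println(new PlanFragment(3, 4).numInstances());
  }
}
```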
File testdata/workloads/functional-planner/queries/PlannerTest/kudu-selectivity.test:

Line 15:      hosts=1 instances=1 est-per-instance-mem=0B
> regarding the number of fragment instances: that's really a per-fragment (r
I ended up making this change at explain levels 2 and 3. I've updated some of the affected
tests so you can see how the new output looks. If we agree, I can go through and fix the
remaining tests (I don't have a working Kudu instance locally, so it's a bit labour-intensive
to update some of them).
File testdata/workloads/functional-query/queries/QueryTest/explain-level0.test:

Line 3
> why change this?
These tests were disabled for years. With all these complex queries I think they would require
frequent updating for unrelated changes (e.g. planner changes). The new tests are simpler
but should provide reasonable coverage of the output format of "explain".

Line 8: 'Per-Host Resource Estimates: Memory=388.41MB'
> how come no resource requirements?
Display of them is currently disabled (aside from PlannerTests). These tests reflect what
we expect users to see when they run an explain.
File testdata/workloads/functional-query/queries/QueryTest/explain-level3.test:

Line 8: 'Per-Host Resource Estimates: Memory=388.41MB'
> how come this is missing the requirements?
They're deliberately disabled for now.


Gerrit-MessageType: comment
Gerrit-Change-Id: I1e358182bcf2bc5fe5c73883eb97878735b12d37
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <>
Gerrit-Reviewer: Marcel Kornacker <>
Gerrit-Reviewer: Matthew Jacobs <>
Gerrit-Reviewer: Tim Armstrong <>
Gerrit-HasComments: Yes
