impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Alex Behm (Code Review)" <>
Subject [Impala-ASF-CR] IMPALA-4862: make resource profile consistent with backend behaviour
Date Wed, 28 Jun 2017 22:33:03 GMT
Alex Behm has posted comments on this change.

Change subject: IMPALA-4862: make resource profile consistent with backend behaviour

Patch Set 7:


The new approach makes sense to me - I Like it.

Focusing on high-level comments first since this is a very meaty patch.

One thing to keep in mind is that somewhat violates your Open()/Close() assumptions,
mostly to satisfy our "guarantee" that row s will be available to fetch immediately after
Open() succeeds. Take a look at:

I think we should get rid of that guarantee, though it needs some thought to decide whether
it's too dangerous for now or not.
File fe/src/main/java/org/apache/impala/planner/

Line 82:   private void recomputeResourceProfiles() {
This seems weird to me and I believe we can potentially get rid of this if we don't we move
computeResourceProfile() out of PlanFragment.finalize(). We'd defer the core computations
to Planner.computeResourceReqs() which gets called on the final plan.
File fe/src/main/java/org/apache/impala/planner/

Line 44
Good riddance! Though I did like having the core logic in a single place.
File fe/src/main/java/org/apache/impala/planner/

Line 6: public interface PlanVisitor {
We already have a Visitor and a corresponding accept() function. Can we consolidate them?
File fe/src/main/java/org/apache/impala/planner/

Line 352:   private static class ResourceAggregator implements PlanVisitor {
Do we really need this separate visitor that only sums over things we have computed in a previous
pass over the plan? Why not have a Visitor that does a pass over the plan to compute the per-fragment
peak resources and then simultaneously sum them up?

Line 392:     planRoots.get(0).walkPlan(aggregator);
The new approach makes sense to me - I like it.

The code/class structure could use some more thought because I found the code and flow very
difficult to follow. In particular, there are now several classes and functions that deal
with estimating resources, but it's not always clear what they abstract and how they are intended
to work together. Here are a few examples that confused me, and I'll post some more throughout
the code in other comments:

PlanTreeResources vs. ResourceProfile:
It's not clear to me what a ResourceProfile really is.
The class comment says "The resources that will be consumed by a set of plan nodes" which
seems to me more fitting for PlanTreeResources (at least based on the name).

At a high-level it feels like your approach needs the following things:
1. Peak resources consumed by a single PlanNode
2. Peak resources consumed by PlanFragment
3. Then we sum the peak resources from all fragments

Visitor and recursion:
I'm a fan of visitors, but the mix of visitors and recursion confuses me. One benefit of the
visitor pattern is that all the logic for a visitor can be encoded in the visitor itself,
but it seems that the core logic is implemented in the various classes by overriding functions.

For example, what do you think of a PeakResourcesVisitor instead of the recursive PlanNode.computePeakResources()?

The intended logic might be captured in a more straightforward way with a FragmentVisitor
and a PlanNodeVisitor. The first goes all over fragments and then uses a PlanNodeVisitor to
compute the per-fragment peak. The FragmentVisitor keeps track of the various sums.

Not sure if you like the suggestions, my main point is the logic/flow is hard to follow today.

Entry point of logic:
Maybe I'm seeing it wrong, but to me it looks like the new code is invoked through PlanFragment.finalize()
which is called way before Planner.computeResourceReqs(). The latter really only walks the
fragments and plan nodes again to compute sums. It seems more intuitive to have Planner.computeResourceReqs()
be the driver and not just an aggregator. The current code flow is not easy to follow, and
it's not clear whether the division of labor between PlanFragment.finalize() and Planner.computeResourceReqs()
even makes sense.
File fe/src/main/java/org/apache/impala/planner/

Line 23:  * The resources that will be consumed by a set of plan nodes.
Does this really represent the resources of a single PlanNode instance or PlanFragment instance?

What are these "set of plan nodes"? Surely not an arbitrary collection of them.

To view, visit
To unsubscribe, visit

Gerrit-MessageType: comment
Gerrit-Change-Id: I492cf5052bb27e4e335395e2a8f8a3b07248ec9d
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <>
Gerrit-Reviewer: Alex Behm <>
Gerrit-Reviewer: Tim Armstrong <>
Gerrit-HasComments: Yes

View raw message