impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Tim Armstrong (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] IMPALA-4862: make resource profile consistent with backend behaviour
Date Thu, 06 Jul 2017 20:36:35 GMT
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-4862: make resource profile consistent with backend behaviour
......................................................................


Patch Set 12:

(21 comments)

http://gerrit.cloudera.org:8080/#/c/7223/12/be/src/exec/blocking-join-node.cc
File be/src/exec/blocking-join-node.cc:

PS12, Line 192: have_async_build_thread_token_
> do we still need this? i guess the idea is to acquire the thread resource i
Good point. This change is no longer necessary, it was in an earlier iteration of the patch.


http://gerrit.cloudera.org:8080/#/c/7223/12/common/thrift/Frontend.thrift
File common/thrift/Frontend.thrift:

PS12, Line 395:   // Total of minimum memory reservations for a host in bytes. Higher than
              :   // per_host_min_reservation if not all reservations are used concurrently.
> I can't figure out what this means from just reading this comment.  Are the
Tried to reword the comments to be somewhat clearer. Focused on clearly stating what the numbers
are without over-explaining. It doesn't directly relate to fragments or fragment instances
so did not add a reference to that (I think that would be too tangential).

"Higher" was correct but it may be easier explaining why the above number is lower.

Changed to use "buffer reservation" consistently.


http://gerrit.cloudera.org:8080/#/c/7223/12/fe/src/main/java/org/apache/impala/planner/JoinNode.java
File fe/src/main/java/org/apache/impala/planner/JoinNode.java:

PS12, Line 651: build
> I guess here you mean build as a noun rather than a verb. i.e. this is not 
Removed the comment since I agree the below comment explains the subtle part.


PS12, Line 654: !BackendConfig.INSTANCE.isPartitionedHashJoinEnabled()
> why is that? the old ones don't consume bufferpool reservations.
Right, it's more correct for producing the memory estimates though since it matches what the
backend is actually doing.

I don't think getting this accurate will matter much for the old aggs and joins so I don't
feel strongly about leaving this in or removing it.


http://gerrit.cloudera.org:8080/#/c/7223/12/fe/src/main/java/org/apache/impala/planner/ParallelPlanner.java
File fe/src/main/java/org/apache/impala/planner/ParallelPlanner.java:

PS12, Line 85: joinIds
> where is that used?
It's not, there's a fair amount of dead code in here.


http://gerrit.cloudera.org:8080/#/c/7223/12/fe/src/main/java/org/apache/impala/planner/PlanFragment.java
File fe/src/main/java/org/apache/impala/planner/PlanFragment.java:

PS12, Line 108: plan fragment per host
> is that for all instances of this fragment running on a host? or something 
Done


PS12, Line 139: nodes
> isn't that always empty? if so, how about just allocating it here rather th
I was trying to stick with the existing idiom in this code. Might as well simplify it though.


PS12, Line 222: join build sinks
> but what about the resources of the fragment itself?
should be "fragments with join build sinks".


PS12, Line 233: peakResources
> maybe call that fInstanceResources or instanceResources or perFInstanceReso
Done. Also made it more consistent in using the Profile suffix instead of Resources.


http://gerrit.cloudera.org:8080/#/c/7223/12/fe/src/main/java/org/apache/impala/planner/PlanNode.java
File fe/src/main/java/org/apache/impala/planner/PlanNode.java:

PS12, Line 637: between Open()
> is that inclusive of Open()? if so, the name postOpenProfile seems a bit mi
It's not inclusive of the time spent in Open(). Updated the comment to be more precise


PS12, Line 645: computeResourceProfile
> computeNodeResourceProfile
Done


PS12, Line 658: child is open
> Maybe say "until after the child's Open() returns."  It ultimately means th
Done


http://gerrit.cloudera.org:8080/#/c/7223/12/fe/src/main/java/org/apache/impala/planner/Planner.java
File fe/src/main/java/org/apache/impala/planner/Planner.java:

Line 62:   // estimates of zero, even if the contained PlanNodes have estimates of zero.
> how is this chosen and why do we need it?
This is just retained from the existing memory estimates - it moved from the deleted PipelinedPlanNodeSet.

If we don't have a minimum we'll return 0 for some queries since not all of the operators
and sinks actually estimate memory. Not the best approach but don't want to open that can
of worms right now.


PS12, Line 353: Peak
> it seems like we're using peak and sum to mean the same thing in this funct
Peak is the peak at any moment in time, sum is just the sum of all reservations. I reworded
the comments and also added a comment to the sum of perHostPeakResources() below to explain
why the peak is computed as the sum of the fragments (which is a detail of the computation,
rather than the meaning of the value produced by the computation).


Line 354:     // Total of per-host minimum reservations across all plan nodes and sinks.
> why is that a meaningful value?
The commit message explains the motivation. It's not used now but is needed to manage the
pool of initial reservations.


PS12, Line 361: now that we've populated
              :       // all the profiles in the execution tree below it.
> where did that happen?
bad comment. Updated.


Line 391:   }
> note to self to look at this function again in the next iteration.
Done


http://gerrit.cloudera.org:8080/#/c/7223/12/fe/src/main/java/org/apache/impala/planner/ResourceProfile.java
File fe/src/main/java/org/apache/impala/planner/ResourceProfile.java:

Line 85:   public ResourceProfile multiply(int factor) {
> comment
Done


http://gerrit.cloudera.org:8080/#/c/7223/12/fe/src/main/java/org/apache/impala/planner/SubplanNode.java
File fe/src/main/java/org/apache/impala/planner/SubplanNode.java:

Line 105:     // therefore the peak resource consumption is simply the sum of all node resources.
> do we have good test coverage of this case?
I have two targeted planner tests in resource-requirements.test with a variety of plan nodes.


http://gerrit.cloudera.org:8080/#/c/7223/12/fe/src/main/java/org/apache/impala/planner/UnionNode.java
File fe/src/main/java/org/apache/impala/planner/UnionNode.java:

PS12, Line 140: Ancestor
> what node is this referring to? ancestor of the union? or the union itself?
Confusing comment. clarified that it's the union.


http://gerrit.cloudera.org:8080/#/c/7223/12/fe/src/main/java/org/apache/impala/service/Frontend.java
File fe/src/main/java/org/apache/impala/service/Frontend.java:

Line 1024:     queryCtx.setDisable_spilling(disableSpilling);
> not this change, but do we have good test coverage of this w.r.t. reservati
We don't have any tests using "disable_unsafe_spills", although https://gerrit.cloudera.org/#/c/7219/
adds minimal coverage. 

We do have some tests disabling spilling via scratch_limit and by removing all scratch directories,
which ends up being handled the same code path the backend. I don't think we have coverage
for all operators. I'm going to add some for IMPALA-5570. 

I filed IMPALA-5622 to address the gap.


-- 
To view, visit http://gerrit.cloudera.org:8080/7223
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I492cf5052bb27e4e335395e2a8f8a3b07248ec9d
Gerrit-PatchSet: 12
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <tarmstrong@cloudera.com>
Gerrit-Reviewer: Alex Behm <alex.behm@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dhecht@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstrong@cloudera.com>
Gerrit-HasComments: Yes

Mime
View raw message