impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Dan Hecht (Code Review)" <>
Subject [Impala-ASF-CR] IMPALA-4862: make resource profile consistent with backend behaviour
Date Thu, 06 Jul 2017 05:02:50 GMT
Dan Hecht has posted comments on this change.

Change subject: IMPALA-4862: make resource profile consistent with backend behaviour

Patch Set 12:

File be/src/exec/

PS12, Line 192: have_async_build_thread_token_
do we still need this? i guess the idea is to acquire the thread resource in Open() as well?
Or is there another reason?
File be/src/exec/blocking-join-node.h:

PS11, Line 111: or

Line 119:   /// SendBuildInputToSink is called to allocate resources for this ExecNode.
once we do true multithreading, does this go away? since we'll be able to open the build child
inside of this Open(), right? Maybe leave a todo about that to clarify why this is needed.
File be/src/exec/exec-node.h:

PS11, Line 87:  o
nit extra space
File common/thrift/Frontend.thrift:

PS12, Line 395:   // Total of minimum memory reservations for a host in bytes. Higher than
              :   // per_host_min_reservation if not all reservations are used concurrently.
I can't figure out what this means from just reading this comment.  Are there more than one
per_host_min_reservation for each host? And why would this be higher if not all reservations
are used concurrently (seems like that condition would mean the needed reservation is lower).

are one of these values related to all the fragment's instances? or are they really only about
the fragment (i.e. what single instance would consume)?

Also, in the comment before, it says "buffer reservation" but here we say "memory reservation"
- are these the same?
File fe/src/main/java/org/apache/impala/planner/

PS12, Line 651: build
I guess here you mean build as a noun rather than a verb. i.e. this is not the memory used
when performing the build. maybe reword to clarify as I had to read the code to understand
the comment.

Either the next comment is sufficient, or you could reword this to explain the case of when
no additional resources are needed.

Alternatively, can't we think of this as the resources of this node (but not its children)
during probing? I.e. call this probePhaseProfile and rename probePhaseProfile to execProfile
below. (See comment in ExecPhaseResourceProfiles about naming).

PS12, Line 654: !BackendConfig.INSTANCE.isPartitionedHashJoinEnabled()
why is that? the old ones don't consume bufferpool reservations.
but also, i think we override this flag for certain build types that aren't supported by the
old join.

PS12, Line 667: .
... because of the async thread. ?
File fe/src/main/java/org/apache/impala/planner/

PS12, Line 85: joinIds
where is that used?
File fe/src/main/java/org/apache/impala/planner/

PS12, Line 108: plan fragment per host
is that for all instances of this fragment running on a host? or something else?

PS12, Line 139: nodes
isn't that always empty? if so, how about just allocating it here rather than taking it as
a param?

PS12, Line 222: join build sinks
but what about the resources of the fragment itself?

PS12, Line 233: peakResources
maybe call that fInstanceResources or instanceResources or perFInstanceResources, etc.
File fe/src/main/java/org/apache/impala/planner/

PS12, Line 637: between Open()
is that inclusive of Open()? if so, the name postOpenProfile seems a bit misleading. Maybe
duringOpenProfile should just be called openProfile and postOpenProfile should be execProfile?

PS12, Line 645: computeResourceProfile

PS12, Line 658: child is open
Maybe say "until after the child's Open() returns."  It ultimately means the same thing, but
it took me a few minutes to follow what this was trying to tell me about the equation below.
File fe/src/main/java/org/apache/impala/planner/

Line 62:   // estimates of zero, even if the contained PlanNodes have estimates of zero.
how is this chosen and why do we need it?

PS12, Line 353: Peak
it seems like we're using peak and sum to mean the same thing in this function. or is there
a subtle distinction?

Line 354:     // Total of per-host minimum reservations across all plan nodes and sinks.
why is that a meaningful value?

PS12, Line 361: now that we've populated
              :       // all the profiles in the execution tree below it.
where did that happen?

Line 391:   }
note to self to look at this function again in the next iteration.
File fe/src/main/java/org/apache/impala/planner/

Line 85:   public ResourceProfile multiply(int factor) {
File fe/src/main/java/org/apache/impala/planner/

Line 105:     // therefore the peak resource consumption is simply the sum of all node resources.
do we have good test coverage of this case?
File fe/src/main/java/org/apache/impala/planner/

PS12, Line 140: Ancestor
what node is this referring to? ancestor of the union? or the union itself?
File fe/src/main/java/org/apache/impala/service/

Line 1024:     queryCtx.setDisable_spilling(disableSpilling);
not this change, but do we have good test coverage of this w.r.t. reservations (that it works
as well as before)?

To view, visit
To unsubscribe, visit

Gerrit-MessageType: comment
Gerrit-Change-Id: I492cf5052bb27e4e335395e2a8f8a3b07248ec9d
Gerrit-PatchSet: 12
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <>
Gerrit-Reviewer: Alex Behm <>
Gerrit-Reviewer: Dan Hecht <>
Gerrit-Reviewer: Tim Armstrong <>
Gerrit-HasComments: Yes

View raw message