impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Dan Hecht (Code Review)" <>
Subject [Impala-ASF-CR] IMPALA-2550: Switch to per-query exec rpc
Date Sat, 29 Apr 2017 00:25:30 GMT
Dan Hecht has posted comments on this change.

Change subject: IMPALA-2550: Switch to per-query exec rpc

Patch Set 10:

File be/src/runtime/

PS10, Line 327:  // if we get an error while trying to get a connection to the backend,
              :   // keep going
what is this talking about?

PS10, Line 352: keep on cancelling the other fragments
seems misplaced now

PS10, Line 354: eturn true;
should this return false? we didn't actually succeed at sending the rpc, right?
File be/src/runtime/coordinator-backend-state.h:

PS10, Line 61: status
File be/src/runtime/

PS10, Line 178: close
i'm not sure what this is talking about. is it referring to CloseConsumer() or Close()?

PS10, Line 179: it should
              :   // cancel itself when it receives an error status after reporting its profile.
I also don't see what this is trying to explain.
oh, after reading FIS::Cancel() I guess this is referring to that code in there that will
do CloseConsumer(). This is pretty confusing since I wouldn't assume that cancel on the FIS
side is going to CloseConsumer.

PS10, Line 183:  Grab executor 
what is this referring to?

PS10, Line 184: order to get a reference to coord_instance_
              :   // so that coord_sink_ remains valid throughout query lifetime.
how does getting a reference to coord_instance_ affect the validity of coord_sink_? The FIS
isn't ref counted, the QES is. But we already have the QES reference, so i'm not sure what
this is talking about.

PS10, Line 189: at this point, the query is done with the Prepare phase,
why do we know that? Oh, because GetFinstanceState() blocks until Prepare? That's pretty subtle.

PS10, Line 204: GetPrepareStatus
this is a dangerous side-effect to have inside of a DCHECK -- it creates a barrier only present
in debug builds. while this isn't bug, it means that release builds have a pretty big difference
in behavior.

Maybe we should rename GetPrepareStatus() to WaitForPrepare() or similar to highlight that
side-effect and prevent this from happening.

Line 263:   }
DCHECK(!has_coord_fragment || coord_backend_idx_ != -1) ?

PS10, Line 413:  backend_state->GetStatus();
what is this trying to do? Oh, I guess get the status of BackendState::Exec()? 

But this might also pick up the execution status of the FIS if it raced ahead fast enough,

also is it really true this won't be CANCELLED?  Couldn't we have FIS_0 hit an error during
QIS::Prepare(), which sets its preapred_status_, which QES notices and calls QES::CancelInternal(),
which calls FIS_1::Cancel(), which cases FIS_1 to send a report with CANCELLED as the status,
which could arrive before the report for FIS_0 which contains the real error?

PS10, Line 424: DCHECK(query_status_.ok()); // nobody should have been able to cancel
why? because the query handle hasn't yet been returned?  But what about via the debug webpage?

PS10, Line 964: #if 0
              : do we really want this?
what are you going to do with this?
File be/src/runtime/coordinator.h:

PS10, Line 210: query 
query execution? (as opposed to the query being closed)

PS10, Line 299: /// True if execution has completed, false otherwise.
              :   bool execution_completed_ = false;
doesn't appear to be used
File be/src/runtime/

PS10, Line 547: ReleaseResources
why is this called ReleaseResources and not Close? What are we saying the difference between
those two terms are?
File be/src/runtime/

Like QES, I think the lifecycle state machine could be more clearly defined for FIS, but that
can be refined later.

PS10, Line 95:   if (!status.ok() && !status.IsCancelled() && !status.IsMemLimitExceeded())
             :     // Log error message in addition to returning in Status. Queries that do
not fetch
             :     // results (e.g. insert) may not receive the message directly and can only
             :     // the log.
             :     runtime_state_->LogError(status.msg());
             :   }
you should delete this when rebasing e0d1db5

PS10, Line 104:   // TODO: deal with final report failure, otherwise the coordinator doesn't
              :   // we're done
is that a regression or something that didn't work before as well?

PS10, Line 213:     // TODO: why is this here? FragmentComplete() already releases the thread
              :     //ReleaseThreadToken();
from the comment above, it sounds like it's meant to intentionally release the token early.
do you plan to delete this code or re-enable it?
do we know longer see the underutilization?

PS10, Line 315:   // TODO: do not delete mem trackers in Close()/ReleaseResources(), they
are part of
              :   // the control structures we need to preserve until the underlying QueryState
              :   // disappears.
is this todo talking about the code here or the code that deletes the mem trackers? don't
we still want to free the resources attached to the row_batch_ here, even if the mem-tracker
control structures are kept around?

PS10, Line 320: ReleaseResources
not for this change, but again we seem to be mixing terminology close and release-resources,
and need to define the various states more clearly.

PS10, Line 331: DCHECK_LE(other_time, total_time);
IMPALA-4631 is going to hit again. what do you plan to do about that? i don't think it's worth
spending more time tracking that down right now.
File be/src/runtime/fragment-instance-state.h:

PS10, Line 52: /// Tear-down happens automatically at the end of Exec() and frees all memory
             : /// for this fragment instance and closes all data streams.
Is this referring to Close()?  So in FIS case, Close() releases resources? We need to sort
out the lifecycle terminology.

PS10, Line 77: The remaining non-getter public functions block until the Prepare phase finishes.
this comment is kinda buried. maybe better to note that at the class level, or in each func
comment like in QES.h.

PS10, Line 84: GetPrepareStatus
like QES, the blocking side-effect isn't obvious from the function name. I think calling code
would be clearer if called WaitForPrepare() or similar.

PS10, Line 91: GetOpenStatus

Line 121:   RuntimeState* runtime_state_;  // lives in obj_pool()
it'd be nice to point out the lifetimes. i.e. valid after Prepare, if successful. is that
true for all members, or are there exceptions?

PS10, Line 149:  
by Prepare()

PS10, Line 153: should live in obj_pool()
doesn't the second clause invalidate this. i.e. this shouldn't live in obj_pool because we
don't want it to have the same lifetime as the FIS. i.e. it's not purely a control structure,
it's a resource.

PS10, Line 159: .
... or to the Prepare status if Prepare() failed.

PS10, Line 224: Blocks until Prepare() finishes.
what does that mean? if Prepare() is run synchronously w.r.t. the caller of this so how could
this block?

PS10, Line 227:  Can handle
              :   /// partially-finished Prepare(). Blocks until Prepare() finishes.
that seems to contradict. i.e. if it blocks, then Prepare() must be finished. Maybe it means
partially-succeeded Prepare?
File be/src/runtime/

Line 111:   //VLOG_QUERY << "AddChildTracker(): parent=" << this << " child="
<< tracker << " " << tracker->label_ << "\n" << GetStackTrace();
more of these to remove
File be/src/runtime/

PS10, Line 51: Prepare
why doesn't this happen on the other thread? won't codegen happen here eventually? 

I guess we want the prepare status to be returned synchronously from this RPC, so okay to
leave this but wanted to check the plan.

Line 57:   // handler thread (= the caller of this function)
let's mention that ownership of the qs reference count is passed to the StartQueryHelper thread.

PS10, Line 59: exec-fragment-instance-$0
I don't understand the choice of name for this thread and the ones started in StartFInstances()
(which is exec-query-finstances-$0).

It seems the thread created here should be plural since it starts multiple instances, and
the ones started by StartFInstances() should be singular since they each execute a single

How about "start-query-finstances-$0" for this one, and "exec-query-finstance-$0" for the
File be/src/runtime/query-exec-mgr.h:

Line 74:   /// 'created' is set to true if it was created, false otherwise.
it'd be nice to document that this increments the refcount.
File be/src/runtime/

PS10, Line 195:     // TODO: this might flood the log
              :     LOG(WARNING) << "Couldn't get a client for " << query_ctx().coord_address
              :         <<"\tReason: " << coord_status.GetDetail();
how do we plan to address that? what used to happen in this case?

PS10, Line 208: status.SetTStatus(&params);
why does 'status' get set into the params.status if it's specific to the fis? the comment
in the thrift says this is only for the "overall status".

PS10, Line 255:     // TODO: should we try to keep rpc_status for the final report? (but the
              :     // report, following this Cancel(), may not succeed anyway.)
what used to happen in this case? did we used to send the rpc_status back in some cases? are
we in danger of ending the query early but not returning failure? what failure is returned
as the ultimate query status in these cases where getting a client back to the coordinator

PS10, Line 306: exec-query-finstances-$0
as mentioned in query-exec-mgr, seems this should be not plural: "exec-query-finstance-$0"

PS10, Line 311: don't return until every instance is prepared
why is that important? we're already running on an async thread, so why should it matter if
we return? i think maybe this is really talking about implementing the QES::prepared_promise_
barrier, in which case it would be good to just say that.

Line 319:       prepare_status = instance_status;
It might simplify reasoning about lifetimes if we set some rules on when instances will start
checking for cancellation. i.e. make it a rule that they don't check until after their Prepare().
 (And thus couldn't possible return CANCELLED from GetPreparedStatus()).

PS10, Line 323: returned status
what is this referring to?

PS10, Line 342: CancelInternal();
I guess this can happen before all instances have gotten through their Prepare. Is it necessary
to not just call Cancel() (which would block), and get rid of CancelInternal()?  That would
reduce the combinations of possible concurrent phases and possible state transitions of the
QES phases. It would even be nice to define that state machine of QES phases, which then simplifies
reasoning about and documenting the lifetimes of QES fields (control-structures and resources).

Line 343:   ExecEnv::GetInstance()->query_exec_mgr()->ReleaseQueryState(this);
would be nice to comment that this reference was taken in StartFInstances().

PS10, Line 353: if (!is_cancelled_.CompareAndSwap(0, 1)) return;
cancellation is a bit more difficult to reason about now that CancelInternal() can return
before the cancellation side-effects (at least those that are on the synchronous path) actually
occur (when another thread won the race to setting is_cancelled_).  

FInstanceState::Cancel() also needs to be thread-safe, right? so why not just always call
down and let the lower level synchronization take care of it?  having this flag means we have
two level of asynchronisity on the cancellation path.
File be/src/runtime/query-state.h:

I think defining a state machine for the lifecycle of this (and FIS) would go a long way,
and then each field and operation can be documented against those states. Don't have to do
that all now, but i have noted some places where maybe we could tighten up the transition
into the cancelled state a bit.

PS10, Line 54: all query state (contained
             : /// either in this class or accessible through this class, such as the
             : /// FragmentInstanceStates) is guaranteed to be alive.
That's not the long term plan though, right? i.e. resources may be freed even when there is
a reference (say, after all instances have completed). If so, how about a TODO to clarify
the direction.

PS10, Line 59: cancelled
I think what it means to be "cancelled" is a bit vague, given the asynchronous nature of cancellation.
maybe we can spell out the actual lifetime guarantee more explicitly.

PS10, Line 106: Uses few cycles
where will we eventually put codegen?
Also, I think it's a bit confusing that the QES Prepare phase and QIS Prepare phase are unrelated.

PS10, Line 112: Blocks until all fragment instances have finished
              :   /// their Prepare phase.
why is that important given that this runs on an async thread itself?

PS10, Line 119: GetPrepareStatus
This name is confusing given that it doesn't get the status for QES Prepare(). Also, as mentioned
elsewhere, I think it would be good to highlight the barrier semantics by using the term "Block"
or "Wait" in the name

PS10, Line 132: query is complete
what does that mean? when all fragment instances are complete, or when all references are
released, or something else?  We don't have to get bogged down on this change, but I think
we should refine this soon to avoid confusion and set a clear direction.

Line 146:   /// including its profile.
Add: Not legal to call until after Prepare is complete.

PS10, Line 159: const TQueryCtx query_ctx_;
why is this split out from rpc_params_?  oh, I guess for the coordinator, hmm.

let's document that this is set in the ctor, i.e. always valid.

Line 174:   DescriptorTbl* desc_tbl_ = nullptr;
For all these fields that are set outside the ctor, how does an outsider know whether the
fields are valid yet or not? I guess the assumption is that they are accessed only by FIS
threads? Or are they guaranteed to be valid only after prepared_promise_ is set? Would be
nice to spell out the lifetimes for an asynchronous access (even if that's to say that they
can only be accessed by FIS threads).

Line 177:   MemTracker* query_mem_tracker_;
would be nice to group the fields based on their lifetime, i.e. move this adjacent to query_ctx_
(these two field are always valid since set in ctor).

PS10, Line 184: prepared_promise_
when reading the code it's hard to remember that this is a barrier for all FIS Prepare, not
this object's Prepare(), given the field name. How about calling it instances_prepared_promise_
or similar?

PS10, Line 194: )
Created in Prepare().

Line 199:   /// Temporary files for this query (owned by obj_pool_)
created in Prepare().

Line 203: 
For all these fields that are created after the ctor, how does an outsider know whether the
fields are valid yet or not? I guess the assumption is that they are accessed only by FIS

PS10, Line 222: latter

PS10, Line 228: Cancels
Initiates cancellation ...
since the is_cancelled_ bit can cause this thread to race ahead and return before the thread
that set is_cancelled_ actually does anything.

Also, is it legal to call this before prepare_promise_ is set, or not? I guess it is and that's
why it's split out from Cancel()?  Is that really required (more comments about that in the
cc file)?
File be/src/runtime/

PS10, Line 335: DCHECK(query_state_ != nullptr);
why? (was this meant to go in query_mem_tracker()?)
File be/src/runtime/runtime-state.h:

PS10, Line 99: Return the query's ObjectPool
i guess that means only control structures, not resources, should be put in there.  Is that
currently true?
File be/src/service/

PS10, Line 44: //
are you removing these or putting them back?
File common/thrift/ImpalaInternalService.thrift:

PS10, Line 339: TExecQueryFragmentsParams

PS10, Line 398: s
delete 's'

To view, visit
To unsubscribe, visit

Gerrit-MessageType: comment
Gerrit-Change-Id: I20769e420711737b6b385c744cef4851cee3facd
Gerrit-PatchSet: 10
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Marcel Kornacker <>
Gerrit-Reviewer: Dan Hecht <>
Gerrit-Reviewer: Henry Robinson <>
Gerrit-Reviewer: Marcel Kornacker <>
Gerrit-Reviewer: Michael Ho <>
Gerrit-Reviewer: Tim Armstrong <>
Gerrit-HasComments: Yes

View raw message