impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Henry Robinson (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] IMPALA-2550: Switch to per-query exec rpc
Date Mon, 17 Apr 2017 23:41:11 GMT
Henry Robinson has posted comments on this change.

Change subject: IMPALA-2550: Switch to per-query exec rpc
......................................................................


Patch Set 6:

(14 comments)

I quickly looked at some familiar files.

http://gerrit.cloudera.org:8080/#/c/6535/6/be/src/runtime/coordinator-backend-state.cc
File be/src/runtime/coordinator-backend-state.cc:

PS6, Line 46: using boost::lock_guard;
            : using boost::mutex;
#include "common/names.h" for these.


PS6, Line 348: true
the class comment says that Cancel() returns true if the cancellation happened - but it almost
certainly didn't here.


http://gerrit.cloudera.org:8080/#/c/6535/2/be/src/runtime/coordinator.cc
File be/src/runtime/coordinator.cc:

PS2, Line 252: reate BackendStates
             :   bool has_coord_fragment = schedule_.GetCoordFragment() != nullptr;
             :   const TNetworkAddress& coord_address = ExecE
doesn't need to be two statements.


PS2, Line 262: kend_state->Init(entry.second, fra
const &?


PS2, Line 264: 
This would be better named coord_backend_idx_.


PS2, Line 1000:   // debugging aid for back
Acquiring this lock is problematic for an RPC handler. See <TODO>.


http://gerrit.cloudera.org:8080/#/c/6535/6/be/src/runtime/coordinator.cc
File be/src/runtime/coordinator.cc:

PS6, Line 408: // TODO: rename this metric
Why not do this now? (But note that for the moment that name isn't published in the profile
anywhere).


PS6, Line 415: || status.IsCancelled()
When would we have a backend that was cancelled at this point (since cancellation always originates
from the coordinator)? And if there was a cancelled backend, why overwrite its status with
the error state? From a user perspective, if they cancel the query, surfacing an error would
be confusing.


PS6, Line 858: Report aggregate
             :   // query profiles at this point.
This comment seems out-of-sync with the rename of ReportQuerySummary().


PS6, Line 885: TODO: do we need this error propagation path, in addition to the status report
             :   // we'll get anyway?
This error is needed so that client-facing operations can fail-fast. The next coordinator
status report might be seconds away.


PS6, Line 986: if (!(returned_all_results_ && status.IsCancelled()) && !status.ok())
{
Will you include the fix for IMPALA-4925 here as discussed? (see https://gerrit.cloudera.org/#/c/5987/1/be/src/runtime/coordinator.cc)


PS6, Line 993: lock_guard<mutex> l(lock_);
We spoke offline about removing this lock acquisition in this patch (see https://gerrit.cloudera.org/#/c/5971/2
for my patch a month or so ago).  This is too expensive to take in an RPC handler - when a
query completes this can lead to significant convoying on this lock, and with KRPC the reactor
threads can get serialized (this would still be an issue even if the reports were batched).
Do you want to tackle this in this patch or in a follow-on?


http://gerrit.cloudera.org:8080/#/c/6535/6/be/src/runtime/query-state.cc
File be/src/runtime/query-state.cc:

PS6, Line 199: // TODO: what to do with this? should we treat a failure to get a backend
             :     // connection as a failure of instance execution?
Yes. 

The report RPC from a backend to the coordinator is how the backend tells if it's orphaned
from the coordinator. That should cause the backend to fail, because it can't receive cancellation
messages, and might otherwise block forever. 

We CancelInternal() later if the RPC itself fails. This is just another way that the RPC can
logically fail. KRPC abstracts this away.


http://gerrit.cloudera.org:8080/#/c/6535/6/be/src/runtime/query-state.h
File be/src/runtime/query-state.h:

PS6, Line 65:  - guarantee delivery of the ReportExecStatus rpc in case of an error, otherwise
            : ///   the query might hang
We can't guarantee delivery - not quite sure what you mean here. 
If the RPC can't be sent, the query state should get torn down on the assumption that the
coordinator has failed.


-- 
To view, visit http://gerrit.cloudera.org:8080/6535
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I20769e420711737b6b385c744cef4851cee3facd
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Marcel Kornacker <marcel@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dhecht@cloudera.com>
Gerrit-Reviewer: Henry Robinson <henry@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <marcel@cloudera.com>
Gerrit-Reviewer: Michael Ho <kwho@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstrong@cloudera.com>
Gerrit-HasComments: Yes

Mime
View raw message