impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Dan Hecht (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] IMPALA-2550: Switch to per-query exec rpc
Date Sat, 06 May 2017 21:07:47 GMT
Dan Hecht has posted comments on this change.

Change subject: IMPALA-2550: Switch to per-query exec rpc
......................................................................


Patch Set 10:

(20 comments)

http://gerrit.cloudera.org:8080/#/c/6535/10/be/src/runtime/coordinator.cc
File be/src/runtime/coordinator.cc:

PS10, Line 204: GetPrepareStatus
> the barrier isn't really present, because getfinstancestate already waits f
Still, it's not a pattern I think we should encourage in the code base (DCHECK(x) where x
has side-effects).  How about at least making the non-barrier explicit by shortcircuiting
it like this:

DCHECK(coord_instance_->IsPrepared() && coord_instance_->GetPrepareStatus().ok());


PS10, Line 413:  backend_state->GetStatus();
> good catch, this could be cancelled at this point.
In your current code, will the real status eventually be propagated as the query status? Or
will this race cause the query to result in CANCELLED even though there was a real error?


PS10, Line 424: DCHECK(query_status_.ok()); // nobody should have been able to cancel
> when does the query get displayed on the debug page? you still haven't retu
It gets displayed as soon as there is a client request state registered, I believe. (i.e.
RegisterQuery()). Which is what I meant - webserver cancellation is definitely available before
exec() returns.


PS10, Line 964: #if 0
              : do we really want this?
> remove it unless someone says we want it.
fine with me to delete.


http://gerrit.cloudera.org:8080/#/c/6535/11/be/src/runtime/coordinator.cc
File be/src/runtime/coordinator.cc:

PS11, Line 200: // fragment instance's executor will not comple
DCHECK(coord_instance_->IsPrepared() && coord_instance_->GetPrepareStatus().ok());

to make it explicit that this has no side-effects.


http://gerrit.cloudera.org:8080/#/c/6535/10/be/src/runtime/query-exec-mgr.cc
File be/src/runtime/query-exec-mgr.cc:

PS10, Line 51: Prepare
> i think we should do some initial setup here so we can return an error stat
the renaming helped.


PS10, Line 59: exec-fragment-instance-$0
> i left everything as is, because i didn't want to break existing tests. hap
Which tests? CAn't they be fixed? You renamed the other thread - I think it would be best
to do this one as well. The current name is really confusing because it's not plural.


http://gerrit.cloudera.org:8080/#/c/6535/11/be/src/runtime/query-exec-mgr.cc
File be/src/runtime/query-exec-mgr.cc:

PS11, Line 57: n)
QueryState reference


PS11, Line 59: exec-fragment-instance
start-query-finstances


http://gerrit.cloudera.org:8080/#/c/6535/10/be/src/runtime/query-state.cc
File be/src/runtime/query-state.cc:

PS10, Line 195:     // TODO: this might flood the log
              :     LOG(WARNING) << "Couldn't get a client for " << query_ctx().coord_address
              :         <<"\tReason: " << coord_status.GetDetail();
> address what, flooding the log?
Yes, flooding the log. i.e. the TODO would be better if it says what should happen in the
future.

And before it looks like we'd remember the status via UpdateStatus() and then potentially
report it back the next time we try (for periodic reports). Not saying that's the right solution
but it does look like a subtle behavior change so want to think it through. I think the new
behavior is probably not really worse though (for periodic reports, just report and try again
later; for final report, in both old and new we'd drop it on the floor).


PS10, Line 255:     // TODO: should we try to keep rpc_status for the final report? (but the
final
              :     // report, following this Cancel(), may not succeed anyway.)
> if the report rpc fails we fail the query on this node and then do nothing/
Okay. I think the old code used to remember this status (via UpdateStatus()) and then would
report it back if it could talk to the coordinator later when the fragment is complete.

But I agree the protocol needs to be fixed.


Line 319:       prepare_status = instance_status;
> fis.h points out that all public non-getter functions block until the prepa
That fis.h comment was just added.

But I don't see how it's relevant to my initial comment here -- how can prepare_status be
CANCELLED?


http://gerrit.cloudera.org:8080/#/c/6535/11/be/src/runtime/query-state.cc
File be/src/runtime/query-state.cc:

PS11, Line 211: arams.instance_exec_status.emplace_back();
              :     params.__isset.instance_exec_status = true;
but why doesn't the coordinator initiate cancellation if any FIS reports error?  Is this to
simplify the coordinator logic? I guess I'm okay with how you have this currently, but it
just seems clearer to me to keep FIS status in only one place.


PS11, Line 264: 
despite


http://gerrit.cloudera.org:8080/#/c/6535/10/be/src/runtime/query-state.h
File be/src/runtime/query-state.h:

PS10, Line 59: cancelled
> vague in what way?
Vague because it's not clear what all the side effects of "cancel" are, and when they take
place. i.e. does it have an impact on resources, and if so, when? 

Really, just thinking ahead a bit, so let's leave this alone for now. It will probably get
clarified with the release resources changes.


PS10, Line 112: Blocks until all fragment instances have finished
              :   /// their Prepare phase.
> it's the externally observable behavior, and qem::startqueryhelper takes ad
Okay.


PS10, Line 132: query is complete
> i agree the guarantees around resource release and lifetime need to be comp
Agree we'll revisit this in a later change.


http://gerrit.cloudera.org:8080/#/c/6535/11/be/src/runtime/query-state.h
File be/src/runtime/query-state.h:

PS11, Line 250: 
that's no longer defined; delete.


PS11, Line 260: 
This is subtle as to why we have it so I think it would be helpful to explain in what cases
you shouldn't try to cancel - when FIS haven't started yet. Or even better would be to rename
this to 'fis_started' since that's what the caller really needs to care about.


http://gerrit.cloudera.org:8080/#/c/6535/11/be/src/runtime/test-env.cc
File be/src/runtime/test-env.cc:

Line 83:   exec_env_.reset();
what's this about?


-- 
To view, visit http://gerrit.cloudera.org:8080/6535
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I20769e420711737b6b385c744cef4851cee3facd
Gerrit-PatchSet: 10
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Marcel Kornacker <marcel@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dhecht@cloudera.com>
Gerrit-Reviewer: Henry Robinson <henry@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <marcel@cloudera.com>
Gerrit-Reviewer: Michael Ho <kwho@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstrong@cloudera.com>
Gerrit-HasComments: Yes

Mime
View raw message