impala-reviews mailing list archives

From "Henry Robinson (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] IMPALA-2905: Handle coordinator fragment lifecycle like all others
Date Wed, 12 Oct 2016 00:44:18 GMT
Henry Robinson has posted comments on this change.

Change subject: IMPALA-2905: Handle coordinator fragment lifecycle like all others
......................................................................


Patch Set 11:

(42 comments)

http://gerrit.cloudera.org:8080/#/c/4402/10/be/src/exec/plan-root-sink.h
File be/src/exec/plan-root-sink.h:

Line 32: /// Sink which manages the handoff between a 'sender' (a fragment instance) that produces
> differentiate fragment/instance
Done


Line 50: /// The sink is thread safe up to a single producer and single consumer.
> i'm still nervous about the ping-pong behavior that this implementation req
Done - filed IMPALA-4268, because there's quite a lot of nuance to doing that efficiently and
I don't want to expand on it here.


http://gerrit.cloudera.org:8080/#/c/4402/10/be/src/runtime/coordinator.cc
File be/src/runtime/coordinator.cc:

PS10, Line 43: #include "exec/plan-root-sink.h"
             : #include "exec/scan-node.h"
             : #include "gen-cpp/Frontend_types.h"
             : #include "gen-cpp/ImpalaInternalService.h"
             : #include "gen-cpp/ImpalaInternalService_constants.h"
             : #include "gen-cpp/ImpalaInternalService_types.h"
             : #include "gen-cpp/Partitions_types.h"
             : #include "gen-cpp/PlanNodes_types.h"
             : #include "runtime/backend-client.h"
             : #include "runtime/client-cache.h"
> what's with the reordering?
clang-format, putting includes in alphabetical order. Not quite sure why there was a line-break
between two groups, but I've removed that.
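For context, the include sorting comes from clang-format's configuration; a hypothetical fragment (option names from the upstream clang-format style documentation - `IncludeBlocks` needs clang-format 6+, so treat this as illustrative rather than Impala's actual config):

```
# Illustrative .clang-format fragment, not Impala's actual config:
SortIncludes: true     # sort #includes alphabetically within a block
IncludeBlocks: Merge   # merge adjacent include blocks before sorting
```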


Line 484:       && query_ctx_.request.query_options.mem_limit > 0) {
> is there any reason to do this here, instead of in start..()? this makes it
Done - moved to both Start...() methods.


Line 488:       schedule_.request_pool(), exec_env_->process_mem_tracker());
> why does this still need to be done here? i wouldn't expect to see any more
We still need a mem tracker for runtime filters and result caching, I fixed up the comments.
A follow-on task I have is to move result caching into the coordinator, so that the QES doesn't
need to access this mem tracker.


Line 524:     RETURN_IF_ERROR(prepare_status);
> is there any reason why you need both root_finstance_ and executor_? execut
Done


Line 611:     for (int fragment_idx = 0; fragment_idx < request.fragments.size(); ++fragment_idx) {
> remove references to remote fragments, they're all remote now, unless you m
Done


Line 1129:   if (has_called_wait_) return Status::OK();
> you already called this in Exec(). no harm, i guess, but i'm also scratchin
I had once run into a case where Wait() was called even though Prepare() failed (which would
hang WaitForOpenPlan()), but I think that must have been a bug. I have a DCHECK in WaitForOpenPlan()
now to ensure that it is only called after Prepare().


Line 1164:       << "checked?";
> i find this formatting counterproductive, everything is bunched up on the r
This is automatically formatted by clang-format. It's very hard to make it stay on two lines,
because it won't allow:

  DCHECK(...) << "msg"
    << "msg2"

replacing it with:

  DCHECK(...) << "msg"
              << "msg2"

but you can at least reduce the line count of the message by putting it all on the next line:
  
  DCHECK(...) 
    << "msg" << "msg2" .....
    << "msg3" // and so on

That's the best I could come up with, and I think it's readable, so I've changed it to that.

There are going to be a few kinks with clang-format as we integrate it more. I'll keep a list
to see if there's anything we can do about the ones that reviews flag.


Line 1179:     // Don't return final NULL until all instances have completed.  GetNext must wait for
> have you tried removing that call? something about lifecycle problems?
I looked into this in some detail. This should be moved, but doing so is non-trivial and so
out of scope for this patch. See IMPALA-4275 for the gory details.


Line 1232:     if (!request.fragments[i].__isset.plan) continue;
> why is this necessary? intuitively i'd assume that initialization doesn't n
I don't think it's required. Although InitExecProfile() won't be called concurrently with
UpdateExecStatus(), a barrier is theoretically needed to ensure that the writes from this
thread are visible to the reporting thread. But there are other barriers in this thread that
suffice.


Line 1284:           Substitute("Averaged Fragment $0", request.fragments[i].display_name), true));
> i think we should get rid of the specialization of the coord instance, incl
The only difference now is that there's no average (because it would, of course, tell us nothing
new), and the title is different. We probably need to keep the title because of tooling dependencies.
Do you think we should have an average for fragments with only one instance?


Line 1925:   TPlanFragmentInstanceCtx fragment_instance_ctx;
> resolve?
This probably should have been removed already - I think it was duping error logs (since there's
a FragmentInstanceState for the root fragment instance).


Line 2131:     FilterState* state = &it->second;
> would be better not to have to reach into the executor here. why not also i
I'm not 100% sure what you're suggesting here.

I think it's best to interact with the root sink directly - I don't want to add PFExecutor::TearDown()
because that more tightly couples the lifecycle of the PFE with the coordinator. I've cached
the root_sink_ pointer so that only that needs to be accessed. Eventually I think we can get
rid of executor_ completely, but that requires removing the runtime state dependency here
and in QES.


http://gerrit.cloudera.org:8080/#/c/4402/10/be/src/runtime/coordinator.h
File be/src/runtime/coordinator.h:

PS10, Line 37: #include "common/global-types.h"
             : #include "common/hdfs.h"
             : #include "common/status.h"
             : #include "gen-cpp/Frontend_types.h"
             : #include "gen-cpp/Types_types.h"
             : #include "runtime/runtime-state.h"
             : #include "scheduling/simple-scheduler.h"
             : #include "service/fragment-exec-state.h"
             : #include "service/fragment-mgr.h"
             : #include "util/histogram-metric.h"
             : #include "util/progress-updater.h"
             : #include "util/runtime-profile.h"
> what's this about?
clang-format will put these in alphabetical order. I like it - we used to keep gen-cpp includes
separate but I don't see the point any more.


Line 76: /// Query coordinator: handles execution of fragment instances on remote nodes, given a
> update. also describe result materialization logic somewhere.
Done


Line 111:       RuntimeProfile::EventSequence* events);
> that doesn't seem quite right, or at least not very precise, Exec() calls W
Done - I don't want to mention that the implementation waits for Prepare() here as that's
specific to this implementation, and something I want to remove in a follow-on patch (we should
move that to Wait(), but it complicates things too much for this patch).


Line 124:   Status Wait();
> num_results -> num_rows?
max_rows - even more clear about the semantics.


Line 280: 
> owner? who creates this?
Done


Line 395:   /// Throughput counters for the coordinator fragment
> set where?
After further consideration, this isn't necessary. Once WaitForPrepare() returns successfully,
there's no need to keep the exec state around because the fragment instance will last until
the consumer closes the sink.


http://gerrit.cloudera.org:8080/#/c/4402/10/be/src/runtime/plan-fragment-executor.cc
File be/src/runtime/plan-fragment-executor.cc:

Line 407
> we're purposely logging here before returning.
Done - not sure how useful this is, but I don't want to break it now.


Line 214:       DataSink::CreateDataSink(obj_pool(), request.fragment_ctx.fragment.output_sink,
> use TDataSink.type instead
Done


Line 274:   profile()->AddInfoString(HdfsScanNode::HDFS_SPLIT_STATS_DESC, str.str());
> long line
Done


Line 292:       runtime_state_->desc_tbl().PrepareAndOpenPartitionExprs(runtime_state_.get()));
> we already have too many counters that people don't understand, does this r
I think it's useful to have a coarse breakdown of where the time is spent for each fragment
instance.


Line 306:   }
> differentiate fragment vs. instance
Done


Line 328:   } else if (!status.IsCancelled() && !status.IsMemLimitExceeded()) {
> done_ apparently doesn't simply mean opened.
Done


http://gerrit.cloudera.org:8080/#/c/4402/10/be/src/runtime/plan-fragment-executor.h
File be/src/runtime/plan-fragment-executor.h:

Line 55: ///     if (Prepare().ok()) {
> PreparePlan/OpenPlan/ExecPlan or Prepare/Open/Exec?
Done


Line 112:   /// exec node tree. report_status_cb will have been called for the final time when
> plan tree -> fragment instance
The fragment instance, to me, describes the whole thing - sink, exec node tree and executor
to manage it all. How about exec node tree to describe the thing that produces rows?


Line 146:   /// Prepare() returns; if Prepare() fails may be nullptr.
> when is it valid to call this?
Done


Line 154:   ExecNode* exec_tree_; // lives in runtime_state_->obj_pool()
> consider renaming to exec_tree_ or something like that to start the process
Done


Line 172:   bool closed_;
> let's pick a different, more meaningful name, we use 'done' all over the pl
Looking at where this is used, I don't think it warrants being a member variable. It's only
set in GetNext(), and only read for a DCHECK() in ExecInternal(), which I don't think is particularly
useful. I've made it a local variable.


Line 197:   /// object.
> why do we need this?
To keep the timings visually and logically separate - it makes it easy to eyeball the profile
to see what part of the instance execution took a long time.


Line 288:   /// OK if the input was exhausted and sent to the sink successfully, an error otherwise.
> todo: consolidate across all instances of the same fragment
I think that's in progress elsewhere, so I'll remove this todo and leave the other patch to
sort it out.


http://gerrit.cloudera.org:8080/#/c/4402/10/be/src/service/impala-internal-service.h
File be/src/service/impala-internal-service.h:

Line 35:     DCHECK(impala_server_ != nullptr);
> why not also move ImpalaServer into the exec env?
Already was, done.


http://gerrit.cloudera.org:8080/#/c/4402/10/be/src/service/impala-server.h
File be/src/service/impala-server.h:

Line 75: /// This class is partially thread-safe. To ensure freedom from deadlock,
> please move this into a separate file altogether, there's no need for it to
Done


Line 81: /// query execution states after a timeout period.
> not your fault, i know, but please tighten up the comment, it sounds a bit 
Done


Line 87:  public:
> this is a weird class, these functions are really specific to the subclasse
It depends if you think the QRS should evaluate the output expressions. If not, then the caller
is going to want to call row-by-row because that's how it produces results.


http://gerrit.cloudera.org:8080/#/c/4402/10/be/src/service/query-exec-state.cc
File be/src/service/query-exec-state.cc:

Line 936
> preserve dchecks?
!eos_ is pretty self-evident from the code - FetchRowsInternal() has if (eos_) return...

Similarly with coord - there's a check if it's set in FetchRowsInternal().


Line 939
> at least preserve these comments.
Done


Line 945
> what about this?
Done


Line 748:     lock_.lock();
> i find the locking protocol in combination with the several layers of indir
Done


http://gerrit.cloudera.org:8080/#/c/4402/10/fe/src/main/java/org/apache/impala/planner/PlanRootSink.java
File fe/src/main/java/org/apache/impala/planner/PlanRootSink.java:

Line 25:  * Sink for the root of a query plan that produces result rows. Allows coordination
> not specific enough: this also applies to non-root fragment instances that 
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/4402
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ibb0064ec2f085fa3a5598ea80894fb489a01e4df
Gerrit-PatchSet: 11
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Henry Robinson <henry@cloudera.com>
Gerrit-Reviewer: Alex Behm <alex.behm@cloudera.com>
Gerrit-Reviewer: Henry Robinson <henry@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <marcel@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstrong@cloudera.com>
Gerrit-HasComments: Yes
