impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Henry Robinson (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] IMPALA-4670 / IMPALA-4672: Add RpcMgr and port Statestore services to KRPC
Date Sun, 22 Jan 2017 04:34:38 GMT
Henry Robinson has posted comments on this change.

Change subject: IMPALA-4670 / IMPALA-4672: Add RpcMgr and port Statestore services to KRPC
......................................................................


Patch Set 3:

(50 comments)

http://gerrit.cloudera.org:8080/#/c/5720/2/be/src/exec/kudu-util.h
File be/src/exec/kudu-util.h:

PS2, Line 60: inline Status FromKuduStatus
> Leave a TODO if we're planning on standardizing the Status class.
I don't think that's imminent.


http://gerrit.cloudera.org:8080/#/c/5720/2/be/src/rpc/CMakeLists.txt
File be/src/rpc/CMakeLists.txt:

Line 36:   rpc-trace.cc
> where is this defined?
See cmake_modules/FindKRPC.cmake. Part of the KRPC library inclusion patch.

(https://github.com/henryr/Impala/blob/krpc/cmake_modules/FindKRPC.cmake#L25)


http://gerrit.cloudera.org:8080/#/c/5720/2/be/src/rpc/rpc-builder.h
File be/src/rpc/rpc-builder.h:

Line 40
> use actual type, not auto
I think it encourages readability here. The type of Rpc<MyServiceProxy>::Make() is Rpc<MyServiceProxy>.
It seems repetitive to have that on both sides of the assignment.

The google style guide says:

 (Encouraged) ... long/cluttery type names, particularly when the type is clear from context.


Line 41
> it feels like the service itself should have reasonable defaults, instead o
Timeouts are a per-invocation construct in KRPC. They're set on the per-RPC RpcController
object. 

I agree there should be a default timeout (it's better to occasionally timeout too early than
to never return). I don't think it's necessary to change that per-service. However, there
are cases where the timeout should be different for different methods on the same service.

For example, Statestore::Update() has a relatively long timeout (because it can take a long
time). Statestore::Heartbeat() has a short one because it should execute very quickly. 

I think it makes sense to keep per-RPC timeouts, but have the Rpc class have a reasonable
(60s?) timeout.


Line 84
> how about making this a static function in the proxy class instead? same fo
The proxy class is generated by the KRPC toolchain. It could be changed, but I'd prefer to
share that code with Kudu for now.


PS2, Line 109: 
             : 
             : 
             : 
> Should we add a comment stating what sort of error/error codes don't warran
I think the method comment covers this. Let me know if you think it should be expanded upon.


PS2, Line 131: 
> Check return val?
Done


PS2, Line 157: 
> This function name sounds like it's executing the RPC, akin to ExecuteRPC()
I see your point, but I think the likelihood of confusion is small. Brevity is important for
this method name particularly.


http://gerrit.cloudera.org:8080/#/c/5720/2/be/src/rpc/rpc-mgr.h
File be/src/rpc/rpc-mgr.h:

Line 18: #ifndef IMPALA_RPC_RPC_MGR_H
> since this is part of the runtime system, let's move it under /runtime.
Will postpone for now. There's a dependency on security code which currently lives in rpc/
that hasn't been integrated yet. After that let's take a view if this still belongs in runtime/.


Line 21: #include "kudu/rpc/messenger.h"
> do we really need all of these includes here, or can some of those move int
Moved some to the .inline.


Line 45: ///
> RpcServiceMgr then?
Refer to comment on line 40. This section is only describing the management of services, but
there's more to it than that.


Line 77: /// these pools may be configured by RegisterService().
> what is 'the io system'?
Done


Line 122:   const scoped_refptr<kudu::rpc::ResultTracker> result_tracker() const {
> move to -inline.h
Done


PS2, Line 143: red with all servic
> Do you think this struct is really very necessary? We can do the service re
We would need to retain num_service_threads until we call Init(). But I put Init() into RegisterService()
as well, and now StartServices() is really about starting the acceptor threads. So this structure
can be removed.


Line 145: 
> how widely does this .h file get included? if it's not super-rare, let's tr
Done


Line 158
> why is this needed?
Removed now.


Line 178
> why not inline?
scoped_refptr is a kind of shared_ptr. This reference needs to be shared with all service
objects. I've added a comment.


Line 183
> explain why this needs to be a shared_ptr, ie, who it's shared with.
Done


Line 190
> move to -inline.h
Done


Line 198
> emplace_back instead
Done


http://gerrit.cloudera.org:8080/#/c/5720/2/be/src/rpc/thrift-util.h
File be/src/rpc/thrift-util.h:

Line 143:   return serializer.Serialize(thrift, proto->mutable_thrift_struct());
> protos have a standard function called mutable_thrift_struct()?
See the method comment. Type P must have a member called thrift_struct. Although this is expected
only to apply to ThriftWrapperPB objects, keeping the templatization here means I don't have
to declare that class in this header, keeping protobuf out of the way of other consumers of
this file.


http://gerrit.cloudera.org:8080/#/c/5720/2/be/src/statestore/statestore-subscriber.cc
File be/src/statestore/statestore-subscriber.cc:

Line 36: #include "statestore/statestore.proxy.h"
> are these generated?
yes


Line 76: 
> where does this come from?
rpc::StatestoreSubscriberIf comes from statestore.service.h, which is auto-generated.


Line 79:   StatestoreSubscriberImpl(RpcMgr* rpc_mgr, StatestoreSubscriber* subscriber)
> why not const&?
Done


Line 89:       status = subscriber_->UpdateState(thrift_request.topic_deltas,
> if this is !ok(), wouldn't thrift_response.status be OK?
thrift_response.status gets set on line 100.


Line 91:           &thrift_response.skipped);
> why isn't !__isset an error?
Good point! I made it required in the thrift struct.


PS2, Line 101:       const ThriftWrapperPB* request, ThriftWrapperPB* respons
> Check the return value?
That's a fair question, but what should we do with it? If we can't write the response object,
how can we signal this to caller? (One option is to add an explicit status field to the protobuf,
but I'd rather not confuse matters). 

I think it's ok for now to return the response and have the caller fail to deserialize; the
error will be caught there.


Line 102:     THeartbeatRequest thrift_request;
> what does this mean if there was a deserialization error?
Deserialization errors are application level. This is used to indicate RPC-level failures.

There is facility in RpcContext to return application errors, but that would change quite
a lot of our error handling (e.g. how we assign error codes), and so I consider using that
API out of scope for now.


PS2, Line 114:   StatestoreSubscriber* subscriber_;
> Same here.
Same as above.


Line 179:   if (status.ok()) connected_to_statestore_metric_->set_value(true);
> no auto
See comments elsewhere.


Line 180:   if (response.__isset.registration_id) {
> repeating all these template params feels redundant. what does this look li
You have to pass all the parameters at once, which I think is less readable:

  DoRpc(rpc_mgr_, address, request, &response, 100, 3, 500)

a) Who knows what the numeric constants mean?
b) There's no optionality to the arguments (so no neat way to change defaults across RPCs
without visiting all call sites)
c) The set of parameters will expand over time to include security parameters, deadlines,
feature flags and sidecars. DoRpc() would become really unwieldy.

I eliminated the template parameters (except for the proxy type) by pushing them down to the
Execute() method, where they can be inferred.


Line 210:       return Status("--statestore_subscriber_svc_queue_depth must be a positive
integer");
> can't it deduce the type param?
type param removed. *Impl is the same naming convention as Kudu. I don't think it's so clear
cut that the Impl 'is' the service. 

Fixed the numeric constants.


http://gerrit.cloudera.org:8080/#/c/5720/3/be/src/statestore/statestore-subscriber.cc
File be/src/statestore/statestore-subscriber.cc:

Line 59:     "The number of threads dedicated"
this is clang-format weirdness. I'll fix in the final patch.


http://gerrit.cloudera.org:8080/#/c/5720/2/be/src/statestore/statestore-subscriber.h
File be/src/statestore/statestore-subscriber.h:

Line 266
> what happened here?
No longer [[noreturn]]. Adding the shutdown flag allows this to exit.


Line 73:   StatestoreSubscriber(const std::string& subscriber_id,
> explain param
Done


Line 111: 
> there is no heartbeat service
Done


http://gerrit.cloudera.org:8080/#/c/5720/2/be/src/statestore/statestore.cc
File be/src/statestore/statestore.cc:

Line 132:       const ThriftWrapperPB* request, ThriftWrapperPB* response, RpcContext* context)
{
> why not make this a c'tor param
Ok, changed RegisterService() so it takes a mandatory service implementation, so that now
the responsibility for calling the c'tor is with the caller.


Line 366:     subscribers_.erase(subscriber_it);
> if you care about this, shouldn't it be called before the if?
No, otherwise there's a race:

  - ShouldExit() returns false
  - SetExitFlag() is called
  - Try to offer to threadpool
  - Wrong error message printed out.

This is only to get the error message right when the thread pool rejects an update.


Line 734:   OfferUpdate(make_pair(deadline_ms, subscriber->id()), is_heartbeat ?
> why the logic change there? (and why schedule another update if you just un
There was a deadlock possibility that we never hit (https://issues.cloudera.org/browse/IMPALA-2642)
that became much more likely because the tests now shut down the statestore and make it possible
that OfferUpdate() failed. 

You're right that we should return from line 732.


PS2, Line 772: 
> Should we make these const uints for clarity?
Made them flags.


Line 779: }
> move the body into Start() and remove this function
Start() should not block. Otherwise tests become very awkward to write. Renamed to Join()
for clarity.


http://gerrit.cloudera.org:8080/#/c/5720/2/be/src/statestore/statestore.h
File be/src/statestore/statestore.h:

Line 109:   /// Blocks until the exit flag is set. Does not call Start().
> who calls this and does it need to be public?
Yes - anything that starts a statestore and wants to block until it finishes execution.


Line 403:   RpcMgr rpc_mgr_;
> why not inline?
Done


http://gerrit.cloudera.org:8080/#/c/5720/2/be/src/statestore/statestore.proto
File be/src/statestore/statestore.proto:

Line 19: // common/thrift/StatestoreService.thrift for complementary Thrift data structures.
> why a different namespace?
I have a slight preference for this organisationally (keep generated classes separate), but
don't feel strongly so have removed.


Line 21: package impala;
> what for?
Done


Line 22: 
> leave note explaining what .thrift file this is paired up with
Done


Line 23: import "rpc/common.proto";
> these typically end in Msg
KRPC seems to have standardized on *PB, which is more concise. The protobuf style guide is
silent on the subject: https://developers.google.com/protocol-buffers/docs/style


http://gerrit.cloudera.org:8080/#/c/5720/2/be/src/testutil/in-process-servers.cc
File be/src/testutil/in-process-servers.cc:

Line 163
> remove commented out code
Leaving in as a reference for what this class used to do. I have a patch to remove all of
this, however, which will get squashed into this patch.


http://gerrit.cloudera.org:8080/#/c/5720/2/be/src/util/collection-metrics.h
File be/src/util/collection-metrics.h:

Line 234:   int64_t count() const { return boost::accumulators::count(acc_); }
> this isn't truly a getter
how not? I don't mind changing it, but I think count(acc_) acts effectively as a getter on
acc_.


http://gerrit.cloudera.org:8080/#/c/5720/2/common/thrift/StatestoreService.thrift
File common/thrift/StatestoreService.thrift:

Line 147
> leave a note explaining where the service definition that uses the structs 
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/5720
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I8dbf12b9ecd71d26d239d31c19b487175194c766
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Henry Robinson <henry@cloudera.com>
Gerrit-Reviewer: David Knupp <dknupp@cloudera.com>
Gerrit-Reviewer: Henry Robinson <henry@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <marcel@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sailesh@cloudera.com>
Gerrit-Reviewer: Taras Bobrovytsky <tbobrovytsky@cloudera.com>
Gerrit-HasComments: Yes

Mime
View raw message