impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Marcel Kornacker (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] IMPALA-4856: Port ImpalaInternalService to KRPC
Date Tue, 18 Apr 2017 04:44:17 GMT
Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-4856: Port ImpalaInternalService to KRPC
......................................................................


Patch Set 3:

(17 comments)

first look

http://gerrit.cloudera.org:8080/#/c/5888/3/be/src/runtime/data-stream-mgr.h
File be/src/runtime/data-stream-mgr.h:

Line 60: /// first batch. Since the sender may start sending before the receiver is ready,
the data
we could relatively easily remove that restriction, the execrpc changes are going to reduce
the startup time.


Line 77: /// fixed-size buffer for later processing, or discard the batch if the buffer is
full. In
how can the is-full case come up? shouldn't flow control keep the sender from getting to that
point?


Line 84: /// queue. An error code is returned which causes the sender to retry the previous
batch.
sounds complicated. is that really necessary?


Line 113: /// time-out and cancel itself. However, it is usual that the coordinator will initiate
i guess that extra complication is necessary because the non-existence of an in-flight query
id doesn't mean it won't show up later.


Line 124: /// The sender has a default timeout of 2 minutes for TransmitData() calls. If the
timeout
why so long?

can we not distinguish "the rpc failed because the receiver went away" and "the data stream
is paused because the client hasn't called fetch in a while"?


Line 131: /// immediately. Both of these cases are designed so that the sender can prepare
a new
in the queued case, it seems like the response should go out when the queue length drops below
a limit, no?


Line 137: /// notification should not exceed 60s and so the sender will not time out.
that also sounds complicated.


Line 228:   /// Adds a row batch to the recvr identified by fragment_instance_id/dest_node_id
if the
there is no dest_node_id, sender_id.


Line 245:   void AddData(const TUniqueId& fragment_instance_id, TransmitDataCtx&&
payload);
don't use an rvalue param here. rather, make ownership (and change of ownership) explicit.


http://gerrit.cloudera.org:8080/#/c/5888/3/be/src/runtime/descriptors.h
File be/src/runtime/descriptors.h:

Line 546:   /// Serialize to the row_tuples field of a RowBatchPb.
something called ToProto should materialize a message that corresponds to this class. that's
not the case here. i don't see why RowBatch::ToProto wouldn't do this itself.


http://gerrit.cloudera.org:8080/#/c/5888/3/be/src/service/impala_internal_service.proto
File be/src/service/impala_internal_service.proto:

Line 1: // Licensed to the Apache Software Foundation (ASF) under one
let's move all .proto files into common/proto, that'll make it a lot easier to find them.


Line 32:   // Tuples for all rows
incomprehensible comment.

since protos don't allow typedefs, we should probably define messages for all the ids, that
makes it clear in the code what it's meant to represent.


Line 46:   // Of type CatalogObjects.THdfsCompression (TODO(KRPC): native enum)
why not use the enum?


Line 50: message TransmitDataRequestPb {
you left out the service version (example: ImpalaInternalServiceVersion in ImpalaInternalService.thrift).

for all params proto:
- must include the service version
- all fields are optional
- the comment needs to indicate whether it's optional or required in v1

for all result protos: the latter two points apply as well


Line 115: service ExecControlService {
why not put the services in separate .proto files


Line 131: service DataStreamService {
would it make sense to have separate patches for the two services? it feels like the data
stream service is pretty much separate from the control service, and in particular it doesn't
conflict with the ongoing coordinator/"control" changes.


Line 140:   // Called by the coordinator to deliver global runtime filters to fragment
i consider the filters to be control structures. why wouldn't they be in execcontrolservice?


-- 
To view, visit http://gerrit.cloudera.org:8080/5888
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I95229290566a8ccffd80ed2d74c1c57cf1479238
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Henry Robinson <henry@cloudera.com>
Gerrit-Reviewer: Anonymous Coward #168
Gerrit-Reviewer: Henry Robinson <henry@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <marcel@cloudera.com>
Gerrit-HasComments: Yes

Mime
View raw message