impala-reviews mailing list archives

From "Henry Robinson (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] IMPALA-3977: TransmitData() should not block
Date Thu, 15 Dec 2016 22:35:05 GMT
Henry Robinson has posted comments on this change.

Change subject: IMPALA-3977: TransmitData() should not block
......................................................................


Patch Set 1:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/5491/1/be/src/runtime/data-stream-mgr.cc
File be/src/runtime/data-stream-mgr.cc:

PS1, Line 53: STREAM_EXPIRATION_TIME_MS
> I've thought about this for a while and it makes sense that the cache is no
I think we can get the same effect as the stream cache without the complexity. The key is
distinguishing between all the different states: 

If a receiver is not present, return EAGAIN. Either a) this is the first batch, so the sender
should just try again, or b) we've already successfully sent > 0 batches, so the sender knows
the receiver has gone away. The case you describe, where there's an upstream limit, could lead
to a sender retrying if it never contributed a batch to that limit, but that's not very common.
If we're concerned about the network usage of retrying, we can always fall back to probing
without a payload.

If the receiver is present but the queue is full, return a different status code to distinguish
this from the case where the receiver is not present. That lets us fail fast in the latter
case, where the receiver has been torn down but we've already sent a batch, so we know it was
once there.
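
Roughly what I mean, as a self-contained sketch (the status names, types and helpers below are
made up for illustration, not the actual DataStreamMgr interface):

    // Sketch only: made-up names modelling the outcomes the sender must be
    // able to tell apart when TransmitData() returns.
    enum class TransmitStatus {
      OK,               // batch accepted by the receiver
      RECVR_NOT_FOUND,  // no receiver registered ('EAGAIN' on the first batch,
                        // fatal once > 0 batches have been sent successfully)
      QUEUE_FULL        // receiver exists but is applying back-pressure; retry
    };

    struct Receiver { bool queue_full = false; };

    // Hypothetical stand-in for the receiver-side AddData() path.
    TransmitStatus AddData(Receiver* recvr /* may be null */) {
      if (recvr == nullptr) return TransmitStatus::RECVR_NOT_FOUND;
      if (recvr->queue_full) return TransmitStatus::QUEUE_FULL;
      // ... enqueue the batch ...
      return TransmitStatus::OK;
    }

    // Sender-side decision: only the sender knows whether a batch has already
    // gone through, so only it can disambiguate RECVR_NOT_FOUND.
    bool ShouldRetry(TransmitStatus s, int batches_sent_ok) {
      switch (s) {
        case TransmitStatus::QUEUE_FULL: return true;
        case TransmitStatus::RECVR_NOT_FOUND: return batches_sent_ok == 0;
        default: return false;
      }
    }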


http://gerrit.cloudera.org:8080/#/c/5491/1/be/src/runtime/data-stream-sender.cc
File be/src/runtime/data-stream-sender.cc:

PS1, Line 209: TTransmitDataResult res;
> TTransmitDataResult has only one field, i.e. Status.
Don't rely on TTransmitDataResult not changing.

There's no reason the code at L224+ has to be outside the loop.
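
I.e., something shaped like this, with the result declared per attempt so nothing carries over
between iterations even if the struct grows more fields later (sketch only; the names below are
placeholders, not the actual sender code):

    // Placeholder for the generated Thrift struct; it may gain fields later.
    struct TTransmitDataResult { int status = 0; };

    void SendWithRetry(int max_attempts) {
      for (int attempt = 0; attempt < max_attempts; ++attempt) {
        TTransmitDataResult res;  // fresh, default-initialized each attempt
        // DoTransmitDataRpc(&res);  // hypothetical RPC call
        if (res.status == 0) return;  // OK: nothing left to handle after the loop
        // otherwise fall through and retry
      }
    }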


PS1, Line 217: DoTransmitDataRpc
> I like the idea, but isn't that optimizing for a worse-case scenario? Keepi
Yes, I think the sender needs to time out of AddBatch(). Otherwise it could get blocked in
the queue and hold a service pool thread.
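
For the receiver side, that would look roughly like a bounded wait on the queue (self-contained
sketch with made-up names, not the patch's code): if the queue doesn't drain within the
deadline, hand back the 'queue full, try again' status instead of parking the service thread.

    #include <chrono>
    #include <condition_variable>
    #include <mutex>

    // Sketch only: a bounded wait so the RPC handler never holds a
    // service-pool thread indefinitely when the receiver's queue is full.
    class BatchQueue {
     public:
      // Returns true if the batch was accepted; false means the caller should
      // reply with a 'queue full, retry later' status to the sender.
      bool AddBatchWithTimeout(std::chrono::milliseconds timeout) {
        std::unique_lock<std::mutex> l(lock_);
        if (!space_available_.wait_for(l, timeout,
                                       [this] { return size_ < capacity_; })) {
          return false;  // timed out; give the thread back to the service pool
        }
        ++size_;  // stand-in for actually enqueuing the row batch
        return true;
      }

     private:
      std::mutex lock_;
      std::condition_variable space_available_;
      int size_ = 0;
      int capacity_ = 8;
    };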

I'm ok with waiting to see if the probe request has a benefit, but I still think we'll need
to distinguish the first RPC from all subsequent ones, so we can tell the "receiver is not
there yet" condition apart from the "receiver was here and has disappeared" condition (see
other comments for details).

I'd expect the probe to take a few ms at most, and the network capacity for those should be
effectively infinite given our requirements. The capacity for full batch sends might be smaller,
which affects the latency - I'm not sure we can say that throughput is not an issue. Anyhow,
you can defer that change for now.


-- 
To view, visit http://gerrit.cloudera.org:8080/5491
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I47f9d6c1de95a72ab73471dcd2a3118ef13e7db0
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Sailesh Mukil <sailesh@cloudera.com>
Gerrit-Reviewer: Henry Robinson <henry@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sailesh@cloudera.com>
Gerrit-HasComments: Yes
