impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Henry Robinson (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] IMPALA-5388: Only retry RPC on lost connection in send call
Date Mon, 05 Jun 2017 19:23:56 GMT
Henry Robinson has posted comments on this change.

Change subject: IMPALA-5388: Only retry RPC on lost connection in send call
......................................................................


Patch Set 2:

(4 comments)

Were you able to test this at all?

The idea behind retrying in DoRpc() was to insulate callers from having to detect and implement
re-open / retry loops themselves. Because the client-cache keeps connections around for a
long time, they would sometimes get closed by the remote end. That expanded to include the
logic around retrying the recv part of an RPC if the socket read timeout had expired, since
TransmitData() can block for a long time before responding. Arguably it might be better to
set the timeouts for the individual RPC (that's what happens in KRPC). 

Although I think this is brittle, I feel like it's relatively correct _if_ our assumption
that the socket-closed exceptions we detect only get thrown on the write path is accurate.
It seems to be the case for non-secure clusters; for Kerberos or TLS it's less clear (but
it seems that receiving the response only involves read operations). 

Can you see a situation with this patch where an RPC will be spuriously retried?

http://gerrit.cloudera.org:8080/#/c/7063/2/be/src/runtime/client-cache.h
File be/src/runtime/client-cache.h:

PS2, Line 252: else if 
nit: avoid 'else' since previous block will return. Here and below.


PS2, Line 253: <F, Request, Response>
can the compiler not figure out the specialization from the arguments?


PS2, Line 262: Status(TErrorCode::RPC_GENERAL_ERROR, e.what());
> Please note that this will start returning "RPC: SSL resource temporarily u
I think it's ok - it's actually hard to think of many cases where retrying an RPC was a good
idea (cancellation, maybe profile reporting...). I'd rather get error messages than incorrect
behaviour.


PS2, Line 285: // If it's not timeout exception, then the connection is broken, stop retrying.
Remove?


-- 
To view, visit http://gerrit.cloudera.org:8080/7063
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I176975f2aa521d5be8a40de51067b1497923d09b
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho <kwho@cloudera.com>
Gerrit-Reviewer: Alan Choi <alan@cloudera.com>
Gerrit-Reviewer: Henry Robinson <henry@cloudera.com>
Gerrit-Reviewer: Juan Yu <jyu@cloudera.com>
Gerrit-Reviewer: Michael Ho <kwho@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sailesh@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstrong@cloudera.com>
Gerrit-HasComments: Yes

Mime
View raw message