impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Michael Ho (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] IMPALA-5558/IMPALA-5576: Reopen stale client connection
Date Sun, 25 Jun 2017 18:02:52 GMT
Michael Ho has posted comments on this change.

Change subject: IMPALA-5558/IMPALA-5576: Reopen stale client connection
......................................................................


Patch Set 8:

(3 comments)

Answering some of the questions first. Will update PS to address other comments.

http://gerrit.cloudera.org:8080/#/c/7284/8/be/src/rpc/thrift-util.cc
File be/src/rpc/thrift-util.cc:

PS8, Line 198: TTransportException::END_OF_FILE &&
             :              strstr(e.what(), "No more data to read.
> how do we know that this means the connection was reset? is it because thri
It comes from the following line in the thrift library:

http://github.mtv.cloudera.com/CDH/Impala/blob/cdh5-trunk/thirdparty/thrift-0.9.0/lib/cpp/src/thrift/transport/TTransport.h#L41

It can happen only if read() returns 0. As you can see in the code posted above, it's using
uint32_t for the variable 'get' so it can never be < 0. So, that means it's triggered if
read() returns 0. Looking at TSocket.cpp, it corresponds to the following line:

http://github.mtv.cloudera.com/CDH/Impala/blob/cdh5-trunk/thirdparty/thrift-0.9.0/lib/cpp/src/thrift/transport/TSocket.cpp#L531

which mean recv() returns 0.

The way readAll() is structured suggests that we should have reading non-zero length payload.
So one can only conclude based on the following line in the man page of recv():

When a stream socket peer has performed an orderly shutdown, the return value will be 0 (the
traditional "end-of-file" return).


http://gerrit.cloudera.org:8080/#/c/7284/8/be/src/runtime/query-state.cc
File be/src/runtime/query-state.cc:

PS8, Line 236: res.status
> now that we try o get the connection inside the loop, this could be uniniti
Yes, the default constructor for TStatus sets the error code to 0 (TErroCode::OK). May be
it's safer to explicitly initialize it.


Line 244:     Cancel();
> this Cancel() still results in IMPALA-5576, no? I'm not saying we should do
This is the same behavior as in previous releases. We used to cancel the fragments if we failed
the RPC:

https://github.com/apache/incubator-impala/blob/branch-2.8.0/be/src/service/fragment-exec-state.cc#L125


-- 
To view, visit http://gerrit.cloudera.org:8080/7284
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I4d722c8ad3bf0e78e89887b6cb286c69ca61b8f5
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho <kwho@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dhecht@cloudera.com>
Gerrit-Reviewer: Juan Yu <jyu@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj@cloudera.com>
Gerrit-Reviewer: Michael Ho <kwho@cloudera.com>
Gerrit-Reviewer: Mostafa Mokhtar <mmokhtar@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sailesh@cloudera.com>
Gerrit-HasComments: Yes

Mime
View raw message