drill-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Parth Chandra <par...@apache.org>
Subject Re: Native C++ Drill client handshake recovery
Date Fri, 09 Jun 2017 16:30:03 GMT
see inline responses

On Thu, Jun 8, 2017 at 4:47 PM, Ralph Little <rlittle@inetco.com> wrote:

> Hi,
> I have been looking into an issue where the native client hangs after
> failing to handshake during connection.
> We have some boundary conditions where a connection is attempted while the
> drill service is starting resulting in handshake failures.
> --
> DrillClient::connect() ends up ultimately in DrillClientImpl::recvHandshake
> ().
> This function calls async_read() with a callback to
> DrillClientImpl::handleHandshake to handle the results of handshaking.
> However, on error, DrillClientImpl::handleHandshake ends up calling
> handleConnError() which merely calls shutdownSocket() to kill the socket
> and set m_bIsConnected to false.
> When all that unfolds,  DrillClientImpl::validateHandshake() ends up
> returning CONN_SUCCESS which is clearly wrong because the handshake failed.
> The original caller to DrillClient::connect() thinks everything is
> hunky-dorey.

Yes, that would be a problem. From what I remember, the recvHandshake call
blocks in m_ioservice.run. On return from run, the recvHandshake checks if
the error object m_pError is not null. m_pError is not null iff there has
been an error. Do you see this not working correctly?

> The following added to DrillClientImpl::recvHandshake() after the
> m_io_service.run() line to check the result seems to fix the problem:
>     if (!m_bIsConnected)
>     {
>         return CONN_HANDSHAKE_FAILED;
>     }
Certainly if it addresses the problem, then this is a perfectly good fix.

> ...but I wonder if there is a better solution.
Also, although there is the IsActive() member that applications can call to
> determine if the connection is up, I wonder if the front-line drill calls
> (such as submitQuery()) could check the connection status a matter of
> course.

AFAIK, the IsActive method is an artifact of some old initial
implementation and I'm not sure if it is used much.

> Currently, if you attempt a submitQuery() call when the connection is
> down, it just hangs because m_io_service is not running and m_deadlineTimer
> never triggers as a fall back.
> Opinions?

It is a good idea to check connection status before sending any message to
the server. LMK if you want to submit a patch :), I can review and merge it

> Cheers,
> Ralph Little

  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message