zookeeper-issues mailing list archives

From "ASF GitHub Bot (Jira)" <j...@apache.org>
Subject [jira] [Updated] (ZOOKEEPER-3384) Avoid long quorum unavailable time due to TLS connection close stalled with full send buffer
Date Wed, 11 Sep 2019 05:40:00 GMT

     [ https://issues.apache.org/jira/browse/ZOOKEEPER-3384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated ZOOKEEPER-3384:
    Labels: pull-request-available  (was: )

> Avoid long quorum unavailable time due to TLS connection close stalled with full send buffer
> --------------------------------------------------------------------------------------------
>                 Key: ZOOKEEPER-3384
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3384
>             Project: ZooKeeper
>          Issue Type: Sub-task
>          Components: server
>            Reporter: Fangmin Lv
>            Assignee: Fangmin Lv
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 3.6.0
> *Problem*
> For an SSL socket, close() must send a close_notify alert before closing the write side of the connection. If the leader is partitioned away and the send buffer is full, learner shutdown can take a long time, because it blocks while sending the close_notify packet.
> The SSLSocketImpl implementation still honors the SO_LINGER socket option. The difference is that even with SO_LINGER set to 0, it still tries to send the close_notify packet; however, if it cannot acquire the write lock immediately, it gives up and closes the socket right away.
> Setting SO_LINGER to a small value avoids a long stall during shutdown, and that is what this change does.
> *Any cons of doing this?*
> According to the TLS RFC, the close handshake exists to prevent a truncation attack, in which an attacker inserts into a message a TCP code indicating that the message has finished, preventing the recipient from picking up the rest of the message. However, it is acceptable for a peer not to send close_notify in some cases, for example when the client crashes or is killed. In our case, close_notify is usually not sent during a rolling restart anyway, and often has no chance to be sent.
> The RFC also notes that failing to send close_notify prevents the SSL session from being resumed. Since reusable session IDs do not benefit the ZooKeeper quorum anyway, this is not a problem for us.
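The close behavior described in the issue (SO_LINGER bounding how long close waits for the write lock before sending close_notify) can be pictured with the simplified model below. This is a hypothetical sketch, not the JDK's SSLSocketImpl source: `CloseNotifyModel` and `tryCloseNotify` are illustrative names, and a `ReentrantLock` stands in for the socket's internal write lock, which another thread may be holding while blocked on a full send buffer. With a linger of 0, `tryLock(0, SECONDS)` does not wait at all, so the close gives up on close_notify instead of stalling.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical, simplified model of a TLS close whose wait for the write lock
// is bounded by SO_LINGER. Not the JDK implementation.
final class CloseNotifyModel {
    // Stands in for the socket's internal write lock; a writer blocked on a
    // full send buffer would be holding it.
    private final ReentrantLock writeLock = new ReentrantLock();

    ReentrantLock writeLock() {
        return writeLock;
    }

    /**
     * Returns true if the (simulated) close_notify alert would be sent, or
     * false if it was skipped because the write lock was not acquired in time.
     * A negative lingerSeconds models SO_LINGER disabled: wait indefinitely.
     */
    boolean tryCloseNotify(int lingerSeconds) {
        if (lingerSeconds < 0) {
            writeLock.lock(); // may stall for as long as the writer is blocked
        } else {
            boolean acquired;
            try {
                // With lingerSeconds == 0 this does not wait at all.
                acquired = writeLock.tryLock(lingerSeconds, TimeUnit.SECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                acquired = false;
            }
            if (!acquired) {
                return false; // skip close_notify and just close the socket
            }
        }
        try {
            // A real implementation would write the close_notify alert here.
            return true;
        } finally {
            writeLock.unlock();
        }
    }
}
```

In this model, an uncontended close still sends close_notify even with a linger of 0; only a close that races with a stuck writer skips it, which matches the trade-off discussed under "Any cons of doing this?".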
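A minimal sketch of the proposed fix, assuming it is applied to a learner's socket right before shutdown. The class `LingerClose` and the helper `closeWithShortLinger` are hypothetical names for illustration; the only real APIs used are `java.net.Socket.setSoLinger(boolean, int)` and `Socket.close()`. Because `javax.net.ssl.SSLSocket` extends `Socket`, the same call applies to a TLS quorum socket. The linger value is left as a parameter, since the issue only asks for "a small number".

```java
import java.io.IOException;
import java.net.Socket;
import java.net.SocketException;

// Hypothetical helper illustrating the fix: enable SO_LINGER with a short
// timeout right before closing, so a TLS close cannot stall for long.
final class LingerClose {
    private LingerClose() {}

    static void closeWithShortLinger(Socket sock, int lingerSeconds) {
        if (sock == null || sock.isClosed()) {
            return; // nothing to close
        }
        try {
            // Bounds how long close() may wait, e.g. for the write lock
            // needed to send close_notify on an SSLSocket.
            sock.setSoLinger(true, lingerSeconds);
        } catch (SocketException e) {
            // Best effort: still close the socket below.
        }
        try {
            sock.close();
        } catch (IOException e) {
            // Ignore: the connection is being torn down anyway.
        }
    }
}
```

For example, `LingerClose.closeWithShortLinger(sock, 1)` would cap the wait at roughly one second. One design note: on a plain TCP socket, a linger time of 0 makes close() reset the connection (RST) instead of performing an orderly FIN shutdown, which is a reason to prefer a small positive value over 0.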

This message was sent by Atlassian Jira
