cassandra-commits mailing list archives

From "Paulo Motta (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-8621) For streaming operations, when a socket is closed/reset, we should retry/reinitiate that stream
Date Tue, 21 Jul 2015 16:57:05 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-8621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14635393#comment-14635393
] 

Paulo Motta commented on CASSANDRA-8621:
----------------------------------------

I'd like to discuss/validate a possible solution before diving into implementation.

Upon receiving a SocketException during an established StreamSession, the reconnection initiator
will:
# Mark its view of the StreamSession as "isReconnecting";
# Stop/close both incoming and outgoing message handlers and respective sockets;
#* Since the closing of sockets might generate additional SocketExceptions, we may ignore/log
them while "isReconnecting" is set to true.
# Create new incoming and outgoing message handlers and sockets.
# Send a StreamInitMessage to the session peer with "isReconnecting" flag set to true.
# After the initialization is complete, the "StreamSession.isReconnecting" flag is set to
false and the onInitializationComplete() is called to resume the streaming protocol.
# In case of failure during the process, the initiator will retry establishing the connection
up to the limit set by the max_streaming_retries property, and fail the stream session if it is not able to reconnect.
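
A minimal Java sketch of the initiator-side flow above, for discussion only. The class ReconnectingSession, the Connector interface and all method names are hypothetical and are not part of Cassandra's streaming code. It also assumes that max_streaming_retries counts the total number of reconnection attempts, which the proposal does not spell out.

```java
import java.io.IOException;

// Hypothetical sketch of the initiator side; none of these names exist in Cassandra.
final class ReconnectingSession {

    /** Stand-in for steps 2-4: close old handlers/sockets, open new ones, send StreamInitMessage. */
    interface Connector {
        void reconnect() throws IOException;
    }

    // Models the proposed max_streaming_retries property (assumed: total attempts allowed).
    private final int maxStreamingRetries;
    private volatile boolean isReconnecting;
    private boolean failed;
    private int attempts;

    ReconnectingSession(int maxStreamingRetries) {
        this.maxStreamingRetries = maxStreamingRetries;
    }

    boolean isReconnecting() { return isReconnecting; }
    boolean isFailed() { return failed; }
    int attempts() { return attempts; }

    /**
     * Called when a SocketException is seen on an established session.
     * Returns true if the session was re-established, false otherwise.
     */
    boolean onSocketException(Connector connector) {
        if (isReconnecting) {
            // Exceptions caused by closing the old sockets are ignored (or logged).
            return false;
        }
        isReconnecting = true; // step 1
        for (int i = 0; i < maxStreamingRetries; i++) {
            attempts++;
            try {
                connector.reconnect(); // steps 2-4
                isReconnecting = false; // step 5: initialization complete
                return true;
            } catch (IOException e) {
                // Step 6: a real implementation would log the session ID and peer address here.
            }
        }
        // All attempts exhausted: fail the stream session.
        failed = true;
        isReconnecting = false;
        return false;
    }
}
```

A caller would pass the real reconnection work as the Connector and call onInitializationComplete() when onSocketException returns true.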

Upon receiving a StreamInitMessage with "isReconnecting=true" the reconnection follower will:
# Fetch the StreamSession object for that session: 
#* If StreamSession.isReconnecting is already set to true on the reconnection follower, it means the
peer is also trying to act as a reconnection initiator, so we have a conflict. We can use
the node identifier or IP as a universal tie-breaker. Only the peer with the lowest IP/ID
will have its StreamInitMessage accepted by the other peer in case of a conflict. The other
peer will have its init socket closed.
#* Otherwise, it will set its StreamSession.isReconnecting flag to true.
# Stop/close both incoming and outgoing message handlers and respective sockets;
#* Since the closing of sockets might generate additional SocketExceptions, we may ignore
them while "isReconnecting" is set to true.
# Create new incoming and outgoing message handlers and sockets.
# Attach the outgoing socket to the new outgoing message handler.
# After the incoming socket is attached to the incoming message handler, the "StreamSession.isReconnecting"
flag is set to false.
# The session is re-established and everybody is happy.
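
The tie-breaker in the first follower step could look roughly like the Java sketch below. ReconnectTieBreaker, compareAddresses, acceptPeerInit and ipv4 are hypothetical names, not existing Cassandra APIs. The sketch assumes the IP address is used as the identifier and that addresses are compared byte by byte as unsigned values.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical sketch of the follower-side conflict check; not Cassandra code.
final class ReconnectTieBreaker {

    /** Orders addresses by length, then by unsigned byte value (assumed ordering). */
    static int compareAddresses(InetAddress a, InetAddress b) {
        byte[] x = a.getAddress();
        byte[] y = b.getAddress();
        if (x.length != y.length) {
            return Integer.compare(x.length, y.length);
        }
        for (int i = 0; i < x.length; i++) {
            int c = Integer.compare(x[i] & 0xff, y[i] & 0xff);
            if (c != 0) {
                return c;
            }
        }
        return 0;
    }

    /**
     * Decides whether to accept a StreamInitMessage with isReconnecting=true.
     * With no local reconnection in progress there is no conflict, so accept.
     * On a conflict, only the peer with the lowest address is accepted.
     */
    static boolean acceptPeerInit(boolean localIsReconnecting, InetAddress local, InetAddress peer) {
        if (!localIsReconnecting) {
            return true;
        }
        return compareAddresses(peer, local) < 0;
    }

    /** Convenience helper that builds an IPv4 address without a DNS lookup. */
    static InetAddress ipv4(int a, int b, int c, int d) {
        try {
            return InetAddress.getByAddress(new byte[] {(byte) a, (byte) b, (byte) c, (byte) d});
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException(e);
        }
    }
}
```

When acceptPeerInit returns false, the follower would close the peer's init socket and carry on with its own reconnection attempt.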

What do you think of this approach [~yukim]?

> For streaming operations, when a socket is closed/reset, we should retry/reinitiate that
stream
> -----------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-8621
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8621
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Jeremy Hanna
>            Assignee: Paulo Motta
>
> Currently we have a setting (streaming_socket_timeout_in_ms) that will time out and retry
the stream operation in the case where TCP is idle for a period of time.  However, in the case
where the socket is closed or reset, we do not retry the operation.  This can happen for a
number of reasons, including when a firewall sends a reset message on a socket during a streaming
operation, such as nodetool rebuild (which necessarily runs across DCs) or repairs.
> Doing a retry would make the streaming operations more resilient.  It would be good to
log the retry clearly as well (with the stream session ID and node address).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
