cassandra-commits mailing list archives

From "Jason Brown (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-5393) Add an Ack/Retry for merkle tree sending
Date Thu, 18 Apr 2013 00:03:16 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-5393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13634649#comment-13634649
] 

Jason Brown commented on CASSANDRA-5393:
----------------------------------------

At the end of the day, this is what I see happening:

{code}INFO [AntiEntropyStage:1] 2013-03-27 22:48:55,390 AntiEntropyService.java (line 239) repair #80fe25a0-9730-11e2-0000-ebe7011631ff Sending completed merkle tree to /54.246.XXX.YYY for (Geo,GeoCountryMetadata)
DEBUG [WRITE-/54.246.XXX.YYY] 2013-03-27 22:48:55,392 OutboundTcpConnection.java (line 165) error writing to ec2-54-246-XXX.YYY.eu-west-1.compute.amazonaws.com/54.246.XXX.YYY
java.net.SocketException: Connection timed out
at java.net.SocketOutputStream.socketWrite0(Native Method)
at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
at com.sun.net.ssl.internal.ssl.OutputRecord.writeBuffer(OutputRecord.java:358)
at com.sun.net.ssl.internal.ssl.OutputRecord.write(OutputRecord.java:346)
at com.sun.net.ssl.internal.ssl.SSLSocketImpl.writeRecordInternal(SSLSocketImpl.java:781)
at com.sun.net.ssl.internal.ssl.SSLSocketImpl.writeRecord(SSLSocketImpl.java:753)
at com.sun.net.ssl.internal.ssl.AppOutputStream.write(AppOutputStream.java:100)
at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
at java.io.BufferedOutputStream.write(BufferedOutputStream.java:104)
at java.io.DataOutputStream.write(DataOutputStream.java:90)
at java.io.FilterOutputStream.write(FilterOutputStream.java:80)
at org.apache.cassandra.net.OutboundTcpConnection.write(OutboundTcpConnection.java:200)
at org.apache.cassandra.net.OutboundTcpConnection.writeConnected(OutboundTcpConnection.java:152)
at org.apache.cassandra.net.OutboundTcpConnection.run(OutboundTcpConnection.java:126)
{code}

The interesting thing is the "Connection timed out" exception message, rather than a socket
reset (or something similar). So, I'm thinking this might be due to keepalive timing out after
the connection is broken. I was able to reproduce this exception several times by setting up my
test cluster in three ec2 regions (us-west-2, us-east-1, eu-west-1 - three nodes in
each) and not sending any traffic for multiple hours. Basically, I'm waiting for the connection
to get dropped. Then, when I triggered repair on one of the nodes (usually starting with
us-west-2), I could see the eu-west-1 nodes get the request to build the merkle
tree, but then fail to send the tree response with the above exception. I was able to
reproduce similar problems when trying a schema update after many hours of cluster idleness.

The attached patch catches the exception when the socket is dead (for whatever reason), and
attempts a simple retry by requeueing the message at the end of the backlog queue, with the
hope that the next pass will successfully recreate the socket. Note that I'm excluding MessagingService.DROPPABLE_VERBS
from retries as it's OK to drop reads/mutates, but it's really those AES and other schema-related
messages that I think we'd want to retry.
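
To make the requeue idea concrete, here is a minimal, self-contained sketch of the behaviour described above. It is not the code from 5393.patch: the Verb values, the Message, FlakyTransport and Connection classes, and the "retry at most once" rule are all assumptions made for illustration; only the notion of a droppable-verb set and of requeueing at the end of the backlog comes from the comment.

```java
import java.io.IOException;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

class RetrySketch {
    // Hypothetical verbs; stand-ins for the real messaging verbs.
    enum Verb { READ, MUTATION, TREE_RESPONSE, SCHEMA_UPDATE }

    // Assumed stand-in for MessagingService.DROPPABLE_VERBS.
    static final Set<Verb> DROPPABLE_VERBS = EnumSet.of(Verb.READ, Verb.MUTATION);

    static final class Message {
        final Verb verb;
        final String payload;
        boolean retried; // assumption: each message is requeued at most once
        Message(Verb verb, String payload) { this.verb = verb; this.payload = payload; }
    }

    // Fake socket: the first `failures` writes throw, later writes succeed.
    static final class FlakyTransport {
        private int failuresLeft;
        final List<String> delivered = new ArrayList<>();
        FlakyTransport(int failures) { this.failuresLeft = failures; }
        void write(Message m) throws IOException {
            if (failuresLeft > 0) {
                failuresLeft--;
                throw new IOException("Connection timed out");
            }
            delivered.add(m.payload);
        }
    }

    static final class Connection {
        private final Deque<Message> backlog = new ArrayDeque<>();
        private final FlakyTransport transport;
        final List<String> dropped = new ArrayList<>();
        Connection(FlakyTransport transport) { this.transport = transport; }

        void enqueue(Message m) { backlog.addLast(m); }

        // Drain the backlog; on a write failure, requeue non-droppable
        // messages at the end of the queue, and drop everything else.
        void drain() {
            Message m;
            while ((m = backlog.pollFirst()) != null) {
                try {
                    transport.write(m);
                } catch (IOException e) {
                    if (!DROPPABLE_VERBS.contains(m.verb) && !m.retried) {
                        m.retried = true;
                        backlog.addLast(m); // next pass may get a fresh socket
                    } else {
                        dropped.add(m.payload); // visible, not silently lost
                    }
                }
            }
        }
    }

    public static void main(String[] args) {
        FlakyTransport t = new FlakyTransport(1);
        Connection c = new Connection(t);
        c.enqueue(new Message(Verb.TREE_RESPONSE, "tree"));
        c.enqueue(new Message(Verb.READ, "read"));
        c.drain();
        System.out.println("delivered=" + t.delivered + " dropped=" + c.dropped);
    }
}
```

In this sketch the tree response fails on the dead socket, goes to the back of the queue, and is delivered after the read. Had the read been the message that hit the dead socket, it would simply have been dropped.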

Admittedly, this is a simple mechanism that doesn't try to do anything fancy like exponential
backoff, n levels of configurable retries, and so on. I'm open to discussion on that, but I'm
not sure how much complexity we'd want to build in for that at this point. I think an incremental
improvement would go a long way here, as we're currently obscuring when messages can't be sent
(which is OK for DROPPABLE_VERBS, but the other ones are really important), so
added visibility and a retry mechanism will help.


 
                
> Add an Ack/Retry for merkle tree sending
> ----------------------------------------
>
>                 Key: CASSANDRA-5393
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-5393
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Jeremiah Jordan
>            Assignee: Jason Brown
>         Attachments: 5393.patch
>
>
> Can we add an Ack/Retry around passing merkle trees around in repair?  If the following
fails, the repair hangs forever on the coordinating node.
> https://github.com/apache/cassandra/blob/cassandra-1.1.10/src/java/org/apache/cassandra/service/AntiEntropyService.java#L242
> {noformat}
>             Message message = TreeResponseVerbHandler.makeVerb(local, validator);
>             if (!validator.request.endpoint.equals(FBUtilities.getBroadcastAddress()))
>                 logger.info(String.format("[repair #%s] Sending completed merkle tree to %s for %s", validator.request.sessionid, validator.request.endpoint, validator.request.cf));
>             ms.sendOneWay(message, validator.request.endpoint);
> {noformat}
> If the message asking for merkle trees gets lost, the coordinating node hangs forever as
well.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
