cassandra-commits mailing list archives

From "Sergio Bossa (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-5692) Race condition in detecting version on a mixed 1.1/1.2 cluster
Date Thu, 27 Jun 2013 20:07:20 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-5692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13695016#comment-13695016 ]

Sergio Bossa commented on CASSANDRA-5692:
-----------------------------------------

bq. From my reading of the CDL javadoc, the IE is thrown "if the current thread is interrupted while waiting". This indicates that the waiting thread, the one calling versionLatch.await in your code, is being interrupted, not the Handshake thread.

Correct. That was to account for spurious interrupts, but I'm fine with adhering to conventions
and throwing AE.
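To illustrate where the InterruptedException surfaces, here is a minimal sketch of the waiting side. It assumes a hypothetical VersionAwait holder, not Cassandra's actual code. The IE can only be raised in the thread that calls versionLatch.await(), so the sketch rethrows it as an AssertionError, in line with the convention agreed above. A timeout is reported separately as -1.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical holder: the handshake thread publishes the version,
// then counts the latch down; another thread waits for it.
public class VersionAwait {
    private final CountDownLatch versionLatch = new CountDownLatch(1);
    private volatile int version = -1;

    void publish(int v) {
        version = v;
        versionLatch.countDown();
    }

    // Waits up to timeoutMs for a published version; returns -1 on timeout.
    // An InterruptedException here means the *waiting* thread was interrupted,
    // so it is converted to an AssertionError (the throwable becomes its cause).
    int awaitVersion(long timeoutMs) {
        try {
            if (!versionLatch.await(timeoutMs, TimeUnit.MILLISECONDS))
                return -1;
            return version;
        } catch (InterruptedException e) {
            throw new AssertionError(e);
        }
    }

    public static void main(String[] args) throws Exception {
        VersionAwait va = new VersionAwait();
        Thread handshake = new Thread(() -> va.publish(5));
        handshake.start();
        System.out.println(va.awaitVersion(1000)); // prints 5
        handshake.join();
    }
}
```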

bq. Also, as a minor nit, I moved the versionLatch.countDown() into a finally block, as we
want to unblock the waiting thread regardless of success or failure to read the version from
the socket.

It would have got unblocked after the timeout anyway, but it's actually better to fail/unblock as fast as possible :)
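The next example is a minimal sketch of the handshake side with countDown() in a finally block. It assumes a hypothetical HandshakeReader that reads a single int version from the peer's stream; the real handshake format is not shown in this thread. When a read fails or is truncated, the latch is still released, so the waiter returns immediately instead of sitting out its timeout.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical handshake reader, for illustration only.
public class HandshakeReader {
    final CountDownLatch versionLatch = new CountDownLatch(1);
    volatile int version = -1; // -1 = not known (read failed or not yet done)

    // Reads a single int version from the peer's stream.
    void readVersion(InputStream in) {
        try {
            version = new DataInputStream(in).readInt();
        } catch (IOException e) {
            // Leave version at -1; the waiter will see the failure right away.
        } finally {
            // Always release the waiter, whether or not the read succeeded.
            versionLatch.countDown();
        }
    }

    public static void main(String[] args) throws Exception {
        HandshakeReader ok = new HandshakeReader();
        ok.readVersion(new ByteArrayInputStream(new byte[] {0, 0, 0, 5}));
        ok.versionLatch.await(1, TimeUnit.SECONDS);
        System.out.println(ok.version); // prints 5

        HandshakeReader broken = new HandshakeReader();
        // Truncated input: readInt() throws EOFException, an IOException.
        broken.readVersion(new ByteArrayInputStream(new byte[] {0, 0}));
        boolean released = broken.versionLatch.await(0, TimeUnit.MILLISECONDS);
        System.out.println(released + " " + broken.version); // prints true -1
    }
}
```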
                
> Race condition in detecting version on a mixed 1.1/1.2 cluster
> --------------------------------------------------------------
>
>                 Key: CASSANDRA-5692
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-5692
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 1.1.9, 1.2.5
>            Reporter: Sergio Bossa
>            Priority: Minor
>         Attachments: 5692-0005.patch, 5692-0006.patch
>
>
> On a mixed 1.1 / 1.2 cluster, starting 1.2 nodes sometimes triggers a race condition in version detection, where the 1.2 node wrongly detects version 6 for a 1.1 node.
> It works as follows:
> 1) The just-started 1.2 node quickly opens an OutboundTcpConnection toward a 1.1 node before receiving any messages from the latter.
> 2) Since the version is correctly detected only when the first message is received, the version is momentarily set to 6.
> 3) This opens an OutboundTcpConnection from 1.2 to 1.1 at version 6, which gets stuck in the connect() method.
> Later the version is corrected, but by that point all outbound connections from 1.2 to 1.1 are already stuck.
> Evidence from 1.2 logs:
> TRACE 13:48:31,133 Assuming current protocol version for /127.0.0.2
> DEBUG 13:48:37,837 Setting version 5 for /127.0.0.2
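The following is a simplified, single-threaded replay of steps 1-3. It assumes a hypothetical VersionRace class, not Cassandra's MessagingService, in which an unknown peer defaults to the current version (6), as the TRACE line above suggests. It does not reproduce the hang inside connect(). It only shows that a connection which captures the version at connect time keeps the stale value after setVersion() later records 5.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical model of per-endpoint version tracking, for illustration only.
public class VersionRace {
    static final int CURRENT_VERSION = 6; // the 1.2 node's own version, per the logs

    private final Map<String, Integer> versions = new ConcurrentHashMap<>();

    // Falls back to the current version when nothing has been received yet;
    // this fallback is what opens the race window.
    int getVersion(String endpoint) {
        return versions.getOrDefault(endpoint, CURRENT_VERSION);
    }

    void setVersion(String endpoint, int v) {
        versions.put(endpoint, v);
    }

    // Hypothetical outbound connection: the version is captured once, at connect time.
    static final class Outbound {
        final int version;
        Outbound(int version) { this.version = version; }
    }

    public static void main(String[] args) {
        VersionRace ms = new VersionRace();
        String peer = "/127.0.0.2";

        // Connect before any message from the 1.1 peer has arrived.
        Outbound conn = new Outbound(ms.getVersion(peer));
        // The first inbound message later reveals the real version.
        ms.setVersion(peer, 5);

        System.out.println("connection opened at " + conn.version
                + ", peer actually speaks " + ms.getVersion(peer));
        // prints: connection opened at 6, peer actually speaks 5
    }
}
```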

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
