cassandra-commits mailing list archives

From "Jason Brown (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-5669) Connection thrashing in multi-region ec2 during upgrade, due to messaging version
Date Tue, 09 Jul 2013 12:27:48 GMT


Jason Brown commented on CASSANDRA-5669:

I spent a lot of time thinking about this :), and I think the situation in this ticket is
subtly different from what happened in CASSANDRA-5171. I commented on that ticket as to why
I think it had a problem (short answer: connecting to publicIP on non-SSL port). This ticket
does not get us into that situation as we will continue to connect to the publicIP/(SSL) port
- we simply bypass reconnecting on the local port if we see the other node has a lower messaging
version.
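
The reconnect decision described above can be sketched roughly as follows. This is
only an illustration, not the attached patch: the class name, the method names and
the integer version numbers are all hypothetical, and treating a peer with an
unknown version as "do not reconnect" is an assumption made for this sketch.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch; none of these names come from the Cassandra code base.
final class ReconnectPolicySketch {
    // Messaging version spoken by the local (upgraded) node.
    private final int localVersion;
    // Last messaging version observed for each peer, keyed by peer address.
    private final Map<String, Integer> peerVersions = new ConcurrentHashMap<>();

    ReconnectPolicySketch(int localVersion) {
        this.localVersion = localVersion;
    }

    void recordPeerVersion(String peer, int version) {
        peerVersions.put(peer, version);
    }

    // Reconnect on the private IP only when the peer is known to speak at
    // least our messaging version. A peer with a lower version stays on the
    // existing publicIP/(SSL) connection. Assumption: unknown peers are
    // treated like lower-version peers.
    boolean shouldReconnectOnPrivateIp(String peer) {
        Integer known = peerVersions.get(peer);
        return known != null && known >= localVersion;
    }
}
```

With a local version of 6, a peer recorded at version 5 would keep its public
connection, while a peer recorded at version 6 would be eligible for the
private-IP reconnect.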

I did test out this upgrade scenario a few weeks ago when we concocted it (and it worked),
and will be happy to try it out again. It'll take a few hours (including time for dropping
kids off at camp), so I'll update this ticket later in the morning.
> Connection thrashing in multi-region ec2 during upgrade, due to messaging version
> ---------------------------------------------------------------------------------
>                 Key: CASSANDRA-5669
>                 URL:
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 1.2.5
>            Reporter: Jason Brown
>            Assignee: Jason Brown
>            Priority: Minor
>              Labels: ec2, ec2multiregionsnitch, gossip
>             Fix For: 1.2.6, 2.0 beta 1
>         Attachments: 5669-v1.diff, 5669-v2.diff
> While debugging the upgrading scenario described in CASSANDRA-5660, I discovered that
ITC.close() will reset the message protocol version of a peer node that disconnects. CASSANDRA-5660
has a full description of the upgrade path, but basically the Ec2MultiRegionSnitch will close
connections on the publicIP addr to reconnect on the privateIp, and this causes ITC to drop
the message protocol version of previously known nodes. I think we want to hang onto that
version so that when the newer node (re-)connects to the lower node version, it passes the
correct protocol version rather than the current version (too high for the older node), which
gets the connection attempt dropped and starts the dance again.
> To clarify, the 'thrashing' is at a rather low volume, from what I observed. Anecdotally,
perhaps one connection per second gets turned over.
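
To illustrate the difference between resetting and retaining a peer's protocol
version on close, here is a minimal sketch. It does not reproduce ITC or any real
Cassandra class: the class, the method names and the version numbers 5 and 6 are
hypothetical and chosen only to show the effect on the next connection attempt.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch; not the actual Cassandra implementation.
final class PeerVersionsSketch {
    private final int currentVersion;
    private final Map<String, Integer> versions = new ConcurrentHashMap<>();

    PeerVersionsSketch(int currentVersion) {
        this.currentVersion = currentVersion;
    }

    // Called when a peer announces its messaging version on connect.
    void onHandshake(String peer, int version) {
        versions.put(peer, version);
    }

    // Behaviour described as the problem: closing forgets the peer's version.
    void onCloseResetting(String peer) {
        versions.remove(peer);
    }

    // Proposed behaviour: closing leaves the recorded version untouched.
    void onCloseRetaining(String peer) {
        // intentionally a no-op
    }

    // Version to use for the next outbound connection; falls back to the
    // current (highest) version when nothing is known about the peer.
    int versionFor(String peer) {
        return versions.getOrDefault(peer, currentVersion);
    }
}
```

After a handshake at version 5, a resetting close makes the next attempt use
version 6, which the older node would reject; a retaining close keeps the next
attempt at version 5.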

