incubator-cassandra-user mailing list archives

From Paulo Motta <pauloricard...@gmail.com>
Subject Re: Increased read timeouts during rolling upgrade to C* 1.2
Date Fri, 04 Oct 2013 18:25:27 GMT
One more piece of information to help troubleshoot the issue:

During the "nodetool drain" operation just before the upgrade, the node
actually shut itself down instead of merely ceasing to accept new writes.
This bug was also reported in this other thread:
http://mail-archives.apache.org/mod_mbox/cassandra-user/201303.mbox/%3CCAFDWQMTrYm7hBxXKoW8+eVKfNE6zvjW2h8_BSVGmOL7=gRDtLw@mail.gmail.com%3E

Since I started Cassandra 1.2 only a few seconds before Cassandra 1.1 died
(after the nodetool drain), I'm afraid there wasn't sufficient time for the
remaining nodes to update their metadata about the "downed" node. When the
upgraded node was restarted, the metadata on the other nodes still referred
to the previous version of the same node, which may have caused the
handshake problem and, consequently, the read timeouts. Does that theory
make sense?
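
If the theory holds, one way to test it would be to wait until another live
node reports the drained node as down before starting the upgraded version.
Below is a minimal sketch of that wait, not a tested procedure. The helper
names (node_marked_down, wait_for_down) are hypothetical, and the sketch
assumes that the ring listing obtained from a live node (for example, the
text printed by "nodetool ring") has one line per node that starts with the
node address and contains the word "Down" once the node is considered dead.

```python
import time
from typing import Callable


def node_marked_down(ring_output: str, address: str) -> bool:
    """Return True if the ring line for `address` contains the word "Down".

    Assumption: each node occupies one whitespace-separated line whose first
    field is its address. Adjust the parsing to the actual output format.
    """
    for line in ring_output.splitlines():
        fields = line.split()
        if fields and fields[0] == address:
            return "Down" in fields
    return False


def wait_for_down(
    get_ring_output: Callable[[], str],
    address: str,
    timeout_s: float = 120.0,
    poll_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll until `address` is reported down, or give up after `timeout_s`.

    `get_ring_output` is a caller-supplied function returning the ring
    listing as text, as seen from a node that is still running.
    """
    deadline = clock() + timeout_s
    while True:
        if node_marked_down(get_ring_output(), address):
            return True
        if clock() >= deadline:
            return False
        sleep(poll_s)
```

Here get_ring_output could wrap a subprocess call that runs nodetool against
one of the nodes that has not been upgraded yet; the upgraded node would be
started only after wait_for_down returns True.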


2013/10/4 Robert Coli <rcoli@eventbrite.com>

> On Fri, Oct 4, 2013 at 9:09 AM, Paulo Motta <pauloricardomg@gmail.com> wrote:
>
>> I manually tried to insert and retrieve some data into both the newly
>> upgraded nodes and the old nodes, and the behavior was very unstable:
>> sometimes it worked, sometimes it didn't (TimedOutException), so I don't
>> think it was a network problem.
>>
>> The number of read timeouts diminished as the number of upgraded nodes
>> increased, until it reached stability. The logs were showing the following
>> messages periodically:
>>
>> ...
>
>> Two similar issues were reported, but without satisfactory responses:
>>
>> -
>> http://stackoverflow.com/questions/15355115/rolling-upgrade-for-cassandra-1-0-9-cluster-to-1-2-1
>> - https://issues.apache.org/jira/browse/CASSANDRA-5740
>>
>
> Both of these issues relate to upgrading from 1._0_.x to 1.2.x, which is
> not supported.
>
> Were I you, I would summarize the above experience in a JIRA ticket, as
> 1.1.x to 1.2.x should be a supported operation and should not unexpectedly
> result in decreased availability during the upgrade.
>
> =Rob
>



-- 
Paulo Ricardo

-- 
European Master in Distributed Computing
Royal Institute of Technology - KTH
Instituto Superior Técnico - IST
http://paulormg.com
