cassandra-commits mailing list archives

From "Peter Schuller (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-3166) Rolling upgrades from 0.7 to 0.8 not possible
Date Fri, 09 Sep 2011 16:53:09 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-3166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13101327#comment-13101327 ]

Peter Schuller commented on CASSANDRA-3166:
-------------------------------------------

I'm having difficulty coming up with a clean yet simple fix here. Reverting CASSANDRA-2860
certainly fixes this problem, but re-introduces the problem that CASSANDRA-2860 fixed.

I could imagine an environment variable/config option to disable the support for pretending
you are older than you are, which could be used in a second round of rolling restarts after
upgrading all nodes of a cluster to 0.8. A JMX-tweakable setting would be nice, but after changing
it you'd want to tear down all the TCP connections to re-initiate version negotiation, so
maybe it's okay to simply require an extra round of restarts.
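
To make the idea concrete, here is a toy model of such a switch. It is only a sketch under stated assumptions, not Cassandra's code: the `Node` class, the `allow_downgrade` flag, the version constants, and the rule "speak the lower of my version and the peer's last known version" are all hypothetical names and simplifications chosen for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical protocol version numbers, for illustration only.
VERSION_07 = 1
VERSION_08 = 2


@dataclass
class Node:
    name: str
    own_version: int
    # Hypothetical switch: when False, the node never pretends to be older.
    allow_downgrade: bool = True
    # Last version learned for each peer (peer name -> version).
    known_peer_versions: dict = field(default_factory=dict)

    def outbound_version(self, peer_name: str) -> int:
        """Return the version this node would speak to the given peer."""
        if not self.allow_downgrade:
            return self.own_version
        # Assumption: an unknown peer is treated as running our own version.
        peer_version = self.known_peer_versions.get(peer_name, self.own_version)
        return min(self.own_version, peer_version)

    def reset_connections(self) -> None:
        """Forget learned peer versions, standing in for tearing down TCP connections."""
        self.known_peer_versions.clear()


if __name__ == "__main__":
    node = Node("a", VERSION_08)
    node.known_peer_versions["b"] = VERSION_07  # stale entry from before b was upgraded
    print(node.outbound_version("b"))           # still speaks the old version
    node.allow_downgrade = False                # the "second round of restarts" setting
    print(node.outbound_version("b"))           # now speaks its own version
```

In this model, flipping the flag at runtime is enough because the version is recomputed per call. In a real system the version would have been negotiated per connection, which is why a runtime (JMX) change would also need something like `reset_connections`.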

Alternatively, I think (not tested) things will tend to sort themselves out incrementally every
time you restart a 0.8 node since it will tend to initiate connections to other nodes immediately,
but documenting for users that they need to restart nodes all over the place until everyone
seems to have gotten it seems like a poor solution.

Adding some new kind of message that says "I really am this other version" or similar isn't
clean.

Am I missing a much simpler and cleaner fix here?


> Rolling upgrades from 0.7 to 0.8 not possible
> ---------------------------------------------
>
>                 Key: CASSANDRA-3166
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3166
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 0.7.5, 0.7.9, 0.8.4
>            Reporter: Marcus Eriksson
>
> We are in the process of upgrading to 0.8 and we need to do a rolling upgrade. This fails miserably, and it is reproducible:
> 1. set up a 3 node cluster with 0.7.9 and rf=3, read and write, QUORUM
> 2. upgrade one of the nodes (I upgraded a seed node, not sure if that is important)
> 3. continue reading/writing
> 4. see logs on the 0.7 node fill up with: INFO 12:36:08,240 Received connection from newer protocol version. Ignorning message.
> It does work if I start the 0.7.9 nodes *after* the 0.8.4 node, which makes me think that it matters whether it is the 0.8 node connecting to the 0.7 nodes or the other way round.
> Debug logging on the 0.8 node shows:
> /var/log/cassandra/system.log.9:DEBUG [pool-2-thread-82] 2011-09-09 11:55:06,067 StorageProxy.java (line 178) Write timeout java.util.concurrent.TimeoutException for one (or more) of: 
> /var/log/cassandra/system.log.9:DEBUG [pool-2-thread-76] 2011-09-09 11:55:06,067 StorageProxy.java (line 584) Read timeout: java.util.concurrent.TimeoutException: Operation timed out - received only 1 responses from /193.182.3.92,  .
> There is nothing except for the "newer protocol version..." message in the 0.7 logs.
> I will continue to look at this issue, but if anyone has a quick patch, let me know.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
