incubator-cassandra-user mailing list archives

From Paulo Motta <pauloricard...@gmail.com>
Subject Re: Best version to upgrade from 1.1.10 to 1.2.X
Date Thu, 03 Oct 2013 18:03:26 GMT
This is the log after enabling TRACE on
org.apache.cassandra.net.OutboundTcpConnection:

DEBUG [WRITE-/54.215.70.YY] 2013-10-03 18:01:50,237 OutboundTcpConnection.java (line 338) Target max version is -2147483648; no version information yet, will retry
TRACE [HANDSHAKE-/10.177.14.XX] 2013-10-03 18:01:50,237 OutboundTcpConnection.java (line 406) Cannot handshake version with /10.177.14.XX
java.nio.channels.AsynchronousCloseException
        at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:185)
        at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:272)
        at sun.nio.ch.SocketAdaptor$SocketInputStream.read(SocketAdaptor.java:176)
        at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:86)
        at java.io.InputStream.read(InputStream.java:82)
        at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:64)
        at java.io.DataInputStream.readInt(DataInputStream.java:370)
        at org.apache.cassandra.net.OutboundTcpConnection$1.run(OutboundTcpConnection.java:400)
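
A side note on the DEBUG line above: -2147483648 is the smallest 32-bit signed integer (the same value as Java's Integer.MIN_VALUE), so it reads like a "no version known yet" placeholder rather than a real protocol version. That interpretation is an assumption based on the log wording. A minimal Python sketch that pulls the number out of such a line and compares it (the function name is hypothetical):

```python
import re

INT32_MIN = -(2 ** 31)  # same value as Java's Integer.MIN_VALUE

# Matches the DEBUG message text shown in the log above.
TARGET_VERSION = re.compile(r"Target max version is (-?\d+)")


def logged_target_version(line):
    """Return the integer from a 'Target max version' line, or None."""
    match = TARGET_VERSION.search(line)
    return int(match.group(1)) if match else None


line = ("DEBUG [WRITE-/54.215.70.YY] 2013-10-03 18:01:50,237 "
        "OutboundTcpConnection.java (line 338) Target max version is "
        "-2147483648; no version information yet, will retry")

# True when the logged value is the 32-bit minimum, i.e. probably "unset".
print(logged_target_version(line) == INT32_MIN)
```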


2013/10/3 Paulo Motta <pauloricardomg@gmail.com>

> Hello,
>
> During a rolling upgrade from 1.1.10 to 1.2.10, the newly upgraded
> nodes keep showing the following log messages:
>
>  INFO [HANDSHAKE-/10.176.249.XX] 2013-10-03 17:36:16,948
> OutboundTcpConnection.java (line 399) Handshaking version with
> /10.176.249.XX
>  INFO [HANDSHAKE-/10.176.182.YY] 2013-10-03 17:36:17,280
> OutboundTcpConnection.java (line 408) Cannot handshake version with
> /10.176.182.YY
>  INFO [HANDSHAKE-/10.176.182.YY] 2013-10-03 17:36:17,280
> OutboundTcpConnection.java (line 399) Handshaking version with
> /10.176.182.YY
>  INFO [HANDSHAKE-/10.188.13.ZZ] 2013-10-03 17:36:17,510
> OutboundTcpConnection.java (line 408) Cannot handshake version with
> /10.188.13.ZZ
>  INFO [HANDSHAKE-/10.188.13.ZZ] 2013-10-03 17:36:17,511
> OutboundTcpConnection.java (line 399) Handshaking version with /10.188.13.ZZ
>
> Nodes XX, YY and ZZ are still on the previous version (1.1.10). Is it
> expected that they can't handshake, or is this a potential problem?
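
To see which peers fail the handshake and how often, the INFO lines above can be tallied per peer. This is a minimal Python sketch that assumes each log entry sits on a single line (unwrapped) and uses the message wording shown above; the function name and regular expression are illustrative only.

```python
import re
from collections import Counter

# Captures the peer from the thread name and the kind of handshake event.
HANDSHAKE = re.compile(
    r"\[HANDSHAKE-/([^\]]+)\].*\b(Handshaking|Cannot handshake) version with"
)


def tally_handshakes(log_text):
    """Return (attempts, failures) Counters keyed by peer address."""
    attempts, failures = Counter(), Counter()
    for line in log_text.splitlines():
        match = HANDSHAKE.search(line)
        if not match:
            continue
        peer, event = match.groups()
        if event == "Handshaking":
            attempts[peer] += 1
        else:
            failures[peer] += 1
    return attempts, failures


sample = """\
INFO [HANDSHAKE-/10.176.249.XX] 2013-10-03 17:36:16,948 OutboundTcpConnection.java (line 399) Handshaking version with /10.176.249.XX
INFO [HANDSHAKE-/10.176.182.YY] 2013-10-03 17:36:17,280 OutboundTcpConnection.java (line 408) Cannot handshake version with /10.176.182.YY
INFO [HANDSHAKE-/10.176.182.YY] 2013-10-03 17:36:17,280 OutboundTcpConnection.java (line 399) Handshaking version with /10.176.182.YY
"""

attempts, failures = tally_handshakes(sample)
for peer in sorted(attempts | failures):
    print(peer, "attempts:", attempts[peer], "failures:", failures[peer])
```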
>
> Reads to any cluster node normally succeed, but sometimes I get read
> timeout errors. Has anyone had a similar issue?
>
> Cheers,
>
> Paulo
>
>
>
> 2013/10/2 Paulo Motta <pauloricardomg@gmail.com>
>
>> Never mind the question. It was a firewall problem. Now the nodes on
>> different versions are able to see each other! =)
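
For a firewall problem like this one, a quick TCP reachability check from each node to its peers can rule the network in or out. A minimal Python sketch, assuming the cluster uses Cassandra's default storage_port of 7000 (adjust if cassandra.yaml says otherwise); the function name is hypothetical.

```python
import socket


def can_connect(host, port=7000, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False
```

Running can_connect(peer) from an upgraded node against each old node, and again in the other direction, shows whether inter-node traffic is blocked one way only. A successful connect only proves the port is open; it says nothing about the version handshake itself.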
>>
>> Cheers,
>>
>> Paulo
>>
>>
>> 2013/10/2 Paulo Motta <pauloricardomg@gmail.com>
>>
>>> Hello,
>>>
>>> I just started the rolling upgrade procedure from 1.1.10 to 1.2.10. Our
>>> strategy is to simultaneously upgrade one server from each replication
>>> group. So, if we have 6 nodes with RF=2, we will upgrade 3 nodes at a
>>> time (from distinct replication groups).
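
The batching idea above can be sketched as follows. This is a minimal Python illustration, not a description of how the cluster was actually laid out: it assumes one token per node, replicas placed on the next RF nodes clockwise (SimpleStrategy-like), and a ring size divisible by RF. Node names and function names are hypothetical.

```python
def replica_sets(ring, rf):
    """Replica set of each range: the owning node plus the next rf-1 nodes."""
    n = len(ring)
    return [tuple(ring[(i + k) % n] for k in range(rf)) for i in range(n)]


def rolling_batches(ring, rf):
    """Split the ring into rf batches of nodes spaced rf positions apart."""
    if rf < 2 or len(ring) % rf != 0:
        raise ValueError("sketch needs rf >= 2 and a ring size divisible by rf")
    return [ring[offset::rf] for offset in range(rf)]


ring = ["n1", "n2", "n3", "n4", "n5", "n6"]  # nodes in token order
batches = rolling_batches(ring, rf=2)

# Taking one batch down must leave at least one replica of every range up.
for batch in batches:
    assert all(set(rs) - set(batch) for rs in replica_sets(ring, 2))

print(batches)
```

With 6 nodes and RF=2 this yields two batches of 3 alternating nodes, which matches the "3 nodes at a time" plan. With a different replication strategy or uneven token layout the grouping would need to be derived from the real replica placement instead.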
>>>
>>> My question is: is it expected that the newly upgraded nodes show as
>>> "Down" in "nodetool ring" on the old nodes (1.1.10)? I thought that network
>>> compatibility meant nodes on a newer version would receive traffic (writes
>>> + reads) from the previous version without problems.
>>>
>>> Cheers,
>>>
>>> Paulo
>>>
>>>
>>> 2013/9/26 Paulo Motta <pauloricardomg@gmail.com>
>>>
>>>> Hello Charles,
>>>>
>>>> Thank you very much for your detailed upgrade report. It'll be very
>>>> helpful during our upgrade operation (even though we'll do a rolling
>>>> production upgrade).
>>>>
>>>> I'll also share our findings during the upgrade here.
>>>>
>>>> Cheers,
>>>>
>>>> Paulo
>>>>
>>>>
>>>> 2013/9/24 Charles Brophy <cbrophy@zulily.com>
>>>>
>>>>> Hi Paulo,
>>>>>
>>>>> I just completed a migration from 1.1.10 to 1.2.10 and it was
>>>>> surprisingly painless.
>>>>>
>>>>> The course of action that I took:
>>>>> 1) describe cluster - make sure all nodes are on the same schema
>>>>> 2) shut off all maintenance tasks; i.e. make sure no scheduled repair
>>>>> is going to kick off in the middle of what you're doing
>>>>> 3) snapshot - maybe not necessary but it's so quick it makes no sense
>>>>> to skip this step
>>>>> 4) drain the nodes - I shut down the entire cluster rather than chance
>>>>> any incompatible gossip concerns that might come from a rolling upgrade. I
>>>>> have the luxury of controlling both the providers and consumers of our
>>>>> data, so this wasn't so disruptive for us.
>>>>> 5) Upgrade the nodes, turn them on one-by-one, monitor the logs for
>>>>> funny business.
>>>>> 6) nodetool upgradesstables
>>>>> 7) Turn various maintenance tasks back on, etc.
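
The steps above can be laid out as an ordered, dry-run plan. This minimal Python sketch only builds a list of steps and executes nothing; the host names are placeholders, and the manual steps are written as free-text reminders rather than real commands. The only commands spelled out are nodetool's snapshot, drain and upgradesstables.

```python
def upgrade_plan(hosts):
    """Return the full-cluster-restart steps as ordered (target, action) pairs."""
    plan = [
        ("all", "check schema agreement (e.g. 'describe cluster;' in cassandra-cli)"),
        ("all", "pause scheduled repairs and other maintenance tasks"),
    ]
    plan += [(h, f"nodetool -h {h} snapshot") for h in hosts]
    plan += [(h, f"nodetool -h {h} drain") for h in hosts]
    plan.append(("all", "stop Cassandra, install the new version, merge cassandra.yaml"))
    plan += [(h, "start Cassandra and watch the logs before starting the next node")
             for h in hosts]
    plan += [(h, f"nodetool -h {h} upgradesstables") for h in hosts]
    plan.append(("all", "re-enable maintenance tasks"))
    return plan


for target, action in upgrade_plan(["node-a", "node-b"]):
    print(f"{target:>7}: {action}")
```

Keeping the plan as data makes it easy to review the order before touching a node, which is the main point of the checklist.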
>>>>>
>>>>> The worst part was managing the yaml/config changes between the
>>>>> versions. It wasn't horrible, but the diff was "noisier" than a more
>>>>> incremental upgrade typically is. A few things I recall that were special:
>>>>> 1) Since you have an existing cluster, you'll probably need to set the
>>>>> default partitioner back to RandomPartitioner in cassandra.yaml. I believe
>>>>> that is outlined in NEWS.
>>>>> 2) I set the initial tokens to be the same as what the nodes held
>>>>> previously.
>>>>> 3) The timeout is now divided into more atomic settings and you get to
>>>>> decide how (or whether) to change each one from its default.
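
The three config points above can be turned into a small review check. This is a minimal Python sketch operating on two cassandra.yaml files that have already been parsed into dicts (parsing is left out). The timeout key names are recalled from a 1.2-era cassandra.yaml and are an assumption; verify them against the file shipped with your release. The function name is hypothetical.

```python
RANDOM_PARTITIONER = "org.apache.cassandra.dht.RandomPartitioner"

# Assumed names of the finer-grained timeout settings (verify locally).
SPLIT_TIMEOUT_KEYS = (
    "read_request_timeout_in_ms",
    "range_request_timeout_in_ms",
    "write_request_timeout_in_ms",
    "truncate_request_timeout_in_ms",
    "request_timeout_in_ms",
)


def review_upgrade_config(old, new):
    """Return a list of notes about the new config relative to the old one."""
    notes = []
    old_partitioner = old.get("partitioner", RANDOM_PARTITIONER)
    if new.get("partitioner") != old_partitioner:
        notes.append(f"partitioner: set back to {old_partitioner}")
    if new.get("initial_token") != old.get("initial_token"):
        notes.append("initial_token: differs from the token the node held before")
    unset = [key for key in SPLIT_TIMEOUT_KEYS if key not in new]
    if unset:
        notes.append("timeouts left at defaults: " + ", ".join(unset))
    return notes


old = {"partitioner": RANDOM_PARTITIONER, "initial_token": "0",
       "rpc_timeout_in_ms": 10000}
new = {"partitioner": "org.apache.cassandra.dht.Murmur3Partitioner",
       "initial_token": "0", "read_request_timeout_in_ms": 10000}

for note in review_upgrade_config(old, new):
    print(note)
```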
>>>>>
>>>>> tl;dr: I did a standard upgrade and paid careful attention to the
>>>>> NEWS.txt upgrade notices. I did a full cluster restart and NOT a rolling
>>>>> upgrade. It went without a hitch.
>>>>>
>>>>> Charles
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Tue, Sep 24, 2013 at 2:33 PM, Paulo Motta <pauloricardomg@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Cool, sounds fair enough. Thanks for the help, Rob!
>>>>>>
>>>>>> If anyone has upgraded from 1.1.X to 1.2.X, please feel free to
>>>>>> share any tips on issues you've encountered that are not yet documented.
>>>>>>
>>>>>> Cheers,
>>>>>>
>>>>>> Paulo
>>>>>>
>>>>>>
>>>>>> 2013/9/24 Robert Coli <rcoli@eventbrite.com>
>>>>>>
>>>>>>> On Tue, Sep 24, 2013 at 1:41 PM, Paulo Motta <
>>>>>>> pauloricardomg@gmail.com> wrote:
>>>>>>>
>>>>>>>> Doesn't the probability of something going wrong increase as the
>>>>>>>> gap between the versions increases? So, by this reasoning, upgrading
>>>>>>>> from 1.1.10 to 1.2.6 would have less chance of something going wrong
>>>>>>>> than upgrading from 1.1.10 to 1.2.9 or 1.2.10.
>>>>>>>>
>>>>>>>
>>>>>>> Sorta, but sorta not.
>>>>>>>
>>>>>>> https://github.com/apache/cassandra/blob/trunk/NEWS.txt
>>>>>>>
>>>>>>> is the canonical source of concerns on upgrade. There are a few
>>>>>>> cases where upgrading to the "root" of X.Y.Z creates issues that do not
>>>>>>> exist if you upgrade to the "head" of that line. AFAIK there have been no
>>>>>>> cases where upgrading to the "head" of a line (where that line is mature,
>>>>>>> like 1.2.10) has created problems which would have been avoided by
>>>>>>> upgrading to the "root" first.
>>>>>>>
>>>>>>>
>>>>>>>> I'm hoping this reasoning is wrong and I can update directly from
>>>>>>>> 1.1.10 to 1.2.10. :-)
>>>>>>>>
>>>>>>>
>>>>>>> That's what I plan to do when we move to 1.2.X, FWIW.
>>>>>>>
>>>>>>> =Rob
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Paulo Ricardo
>>>>>>
>>>>>> --
>>>>>> European Master in Distributed Computing
>>>>>> Royal Institute of Technology - KTH
>>>>>> Instituto Superior Técnico - IST
>>>>>> http://paulormg.com
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>>
>>>
>>
>>
>>
>>
>
>
>
>



-- 
Paulo Ricardo

-- 
European Master in Distributed Computing
Royal Institute of Technology - KTH
Instituto Superior Técnico - IST
http://paulormg.com
