incubator-cassandra-user mailing list archives

From Arya Goudarzi <gouda...@gmail.com>
Subject Re: Incompatible Gossip 1.1.6 to 1.2.1 Upgrade?
Date Thu, 28 Mar 2013 20:02:07 GMT
There has been a little misunderstanding. When all nodes are on 1.2.2, they
are fine. During the rolling upgrade, however, the 1.2.2 nodes see the 1.1.10
nodes as Down in nodetool output, despite gossip reporting them as NORMAL. I
will give your suggestion a try and will report back.
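[Editor's note: to make the mismatch concrete, here is a small, hypothetical sketch. The `normal_endpoints` helper and the sample addresses are made up, not from a real cluster; it extracts the endpoints whose gossip STATUS is NORMAL from `nodetool gossipinfo`-style text, the set one would cross-check against the addresses `nodetool ring` marks Down.]

```shell
# Hypothetical helper: given `nodetool gossipinfo`-style text on stdin,
# print the endpoints whose gossip STATUS is NORMAL. The format mirrors
# the gossipinfo output pasted later in this thread.
normal_endpoints() {
  # A "/x.x.x.x" line starts a new endpoint; a STATUS:NORMAL line
  # under it means gossip considers that endpoint up and normal.
  awk '/^\//{ep=substr($1,2)} index($0,"STATUS:NORMAL"){print ep}'
}

printf '/10.0.0.1\n  STATUS:NORMAL,1808575600\n/10.0.0.2\n  STATUS:NORMAL,42\n' \
  | normal_endpoints
# Prints 10.0.0.1 and 10.0.0.2, one per line. Any of these addresses
# that nodetool ring reports as Down exhibits the discrepancy described.
```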

On Sat, Mar 23, 2013 at 10:37 AM, aaron morton <aaron@thelastpickle.com> wrote:

> So all nodes are 1.2 and some are still being marked as down?
>
> I would try a rolling restart with -Dcassandra.load_ring_state=false added
> as a JVM_OPTS entry in cassandra-env.sh. There is no guarantee it will fix it,
> but it's a simple thing to try.
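[Editor's note: a minimal sketch of that change. The conf path is an assumption; packaged installs typically keep cassandra-env.sh under /etc/cassandra/, so adjust CONF for your layout.]

```shell
# Append the flag to the JVM options picked up by cassandra-env.sh.
# CONF is a hypothetical path relative to the install directory.
CONF="${CONF:-conf/cassandra-env.sh}"
mkdir -p "$(dirname "$CONF")"
echo 'JVM_OPTS="$JVM_OPTS -Dcassandra.load_ring_state=false"' >> "$CONF"
# Then restart the node; the flag can be removed again once the ring
# view is healthy, since it only affects what is loaded at startup.
```

After each node restarts, comparing `nodetool ring` against `nodetool gossipinfo` on that node would confirm whether the two views agree.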
>
> Cheers
>
> -----------------
> Aaron Morton
> Freelance Cassandra Consultant
> New Zealand
>
> @aaronmorton
> http://www.thelastpickle.com
>
> On 22/03/2013, at 10:30 AM, Arya Goudarzi <goudarzi@gmail.com> wrote:
>
> I took Brandon's suggestion in CASSANDRA-5332 and upgraded to 1.1.10
> before upgrading to 1.2.2, but the issue with nodetool ring reporting
> machines as down was not resolved.
>
> On Fri, Mar 15, 2013 at 6:35 PM, Arya Goudarzi <goudarzi@gmail.com> wrote:
>
>> Thank you very much, Aaron. I recall that the logs of the node upgraded
>> to 1.2.2 reported seeing the others as dead. Brandon suggested in
>> https://issues.apache.org/jira/browse/CASSANDRA-5332 that I should at
>> least upgrade from 1.1.7. So, I decided to try upgrading to 1.1.10 first
>> before upgrading to 1.2.2. I am in the middle of troubleshooting some other
>> issues I had with that upgrade (posted separately); once I am done, I will
>> give your suggestion a try.
>>
>>
>> On Mon, Mar 11, 2013 at 10:34 PM, aaron morton <aaron@thelastpickle.com> wrote:
>>
>>> > Is this just a display bug in nodetool, or does this upgraded node
>>> really see the other ones as dead?
>>> Is the 1.2.2 node which sees all the others as down still processing
>>> requests?
>>> Is it showing the others as down in the log?
>>>
>>> I'm not really sure what's happening. But you can try starting the 1.2.2
>>> node with the
>>>
>>> -Dcassandra.load_ring_state=false
>>>
>>> parameter; append it at the bottom of the cassandra-env.sh file. It will
>>> force the node to fetch the ring state from the other nodes.
>>>
>>> Cheers
>>>
>>> -----------------
>>> Aaron Morton
>>> Freelance Cassandra Consultant
>>> New Zealand
>>>
>>> @aaronmorton
>>> http://www.thelastpickle.com
>>>
>>> On 8/03/2013, at 10:24 PM, Arya Goudarzi <goudarzi@gmail.com> wrote:
>>>
>>> > OK. I upgraded one node from 1.1.6 to 1.2.2 today. Aside from some new
>>> problems that I posted about in a separate email, this issue still exists,
>>> but now only on the 1.2.2 node. That is, the nodes running 1.1.6 see all
>>> other nodes, including the 1.2.2 one, as Up. Here are the ring and gossip
>>> outputs from a 1.1.6 node, for example. Bold denotes the upgraded node:
>>> >
>>> > Address      DC       Rack  Status  State   Load      Effective-Ownership  Token
>>> >                                                                            141784319550391026443072753098378663700
>>> > XX.180.36    us-east  1b    Up      Normal  49.47 GB  25.00%               1808575600
>>> > XX.231.121   us-east  1c    Up      Normal  47.08 GB  25.00%               7089215977519551322153637656637080005
>>> > XX.177.177   us-east  1d    Up      Normal  33.64 GB  25.00%               14178431955039102644307275311465584410
>>> > XX.7.148     us-east  1b    Up      Normal  41.27 GB  25.00%               42535295865117307932921825930779602030
>>> > XX.20.9      us-east  1c    Up      Normal  38.51 GB  25.00%               49624511842636859255075463585608106435
>>> > XX.86.255    us-east  1d    Up      Normal  34.78 GB  25.00%               56713727820156410577229101240436610840
>>> > XX.63.230    us-east  1b    Up      Normal  38.11 GB  25.00%               85070591730234615865843651859750628460
>>> > XX.163.36    us-east  1c    Up      Normal  44.25 GB  25.00%               92159807707754167187997289514579132865
>>> > XX.31.234    us-east  1d    Up      Normal  44.66 GB  25.00%               99249023685273718510150927169407637270
>>> > XX.132.169   us-east  1b    Up      Normal  44.2 GB   25.00%               127605887595351923798765477788721654890
>>> > XX.71.63     us-east  1c    Up      Normal  38.74 GB  25.00%               134695103572871475120919115443550159295
>>> > XX.197.209   us-east  1d    Up      Normal  41.5 GB   25.00%               141784319550391026443072753098378663700
>>> >
>>> > /XX.71.63
>>> >   RACK:1c
>>> >   SCHEMA:99dce53b-487e-3e7b-a958-a1cc48d9f575
>>> >   LOAD:4.1598705272E10
>>> >   DC:us-east
>>> >   INTERNAL_IP:XX.194.92
>>> >   STATUS:NORMAL,134695103572871475120919115443550159295
>>> >   RPC_ADDRESS:XX.194.92
>>> >   RELEASE_VERSION:1.1.6
>>> > /XX.86.255
>>> >   RACK:1d
>>> >   SCHEMA:99dce53b-487e-3e7b-a958-a1cc48d9f575
>>> >   LOAD:3.734334162E10
>>> >   DC:us-east
>>> >   INTERNAL_IP:XX.6.195
>>> >   STATUS:NORMAL,56713727820156410577229101240436610840
>>> >   RPC_ADDRESS:XX.6.195
>>> >   RELEASE_VERSION:1.1.6
>>> > /XX.7.148
>>> >   RACK:1b
>>> >   SCHEMA:99dce53b-487e-3e7b-a958-a1cc48d9f575
>>> >   LOAD:4.4316975808E10
>>> >   DC:us-east
>>> >   INTERNAL_IP:XX.47.250
>>> >   STATUS:NORMAL,42535295865117307932921825930779602030
>>> >   RPC_ADDRESS:XX.47.250
>>> >   RELEASE_VERSION:1.1.6
>>> > /XX.63.230
>>> >   RACK:1b
>>> >   SCHEMA:99dce53b-487e-3e7b-a958-a1cc48d9f575
>>> >   LOAD:4.0918593305E10
>>> >   DC:us-east
>>> >   INTERNAL_IP:XX.89.127
>>> >   STATUS:NORMAL,85070591730234615865843651859750628460
>>> >   RPC_ADDRESS:XX.89.127
>>> >   RELEASE_VERSION:1.1.6
>>> > /XX.132.169
>>> >   RACK:1b
>>> >   SCHEMA:99dce53b-487e-3e7b-a958-a1cc48d9f575
>>> >   LOAD:4.745883458E10
>>> >   DC:us-east
>>> >   INTERNAL_IP:XX.94.161
>>> >   STATUS:NORMAL,127605887595351923798765477788721654890
>>> >   RPC_ADDRESS:XX.94.161
>>> >   RELEASE_VERSION:1.1.6
>>> > /XX.180.36
>>> >   RACK:1b
>>> >   SCHEMA:99dce53b-487e-3e7b-a958-a1cc48d9f575
>>> >   LOAD:5.311963027E10
>>> >   DC:us-east
>>> >   INTERNAL_IP:XX.123.112
>>> >   STATUS:NORMAL,1808575600
>>> >   RPC_ADDRESS:XX.123.112
>>> >   RELEASE_VERSION:1.1.6
>>> > /XX.163.36
>>> >   RACK:1c
>>> >   SCHEMA:99dce53b-487e-3e7b-a958-a1cc48d9f575
>>> >   LOAD:4.7516755022E10
>>> >   DC:us-east
>>> >   INTERNAL_IP:XX.163.180
>>> >   STATUS:NORMAL,92159807707754167187997289514579132865
>>> >   RPC_ADDRESS:XX.163.180
>>> >   RELEASE_VERSION:1.1.6
>>> > /XX.31.234
>>> >   RACK:1d
>>> >   SCHEMA:99dce53b-487e-3e7b-a958-a1cc48d9f575
>>> >   LOAD:4.7954372912E10
>>> >   DC:us-east
>>> >   INTERNAL_IP:XX.192.159
>>> >   STATUS:NORMAL,99249023685273718510150927169407637270
>>> >   RPC_ADDRESS:XX.192.159
>>> >   RELEASE_VERSION:1.1.6
>>> > /XX.197.209
>>> >   RACK:1d
>>> >   SCHEMA:99dce53b-487e-3e7b-a958-a1cc48d9f575
>>> >   LOAD:4.4558968005E10
>>> >   DC:us-east
>>> >   INTERNAL_IP:XX.66.205
>>> >   STATUS:NORMAL,141784319550391026443072753098378663700
>>> >   RPC_ADDRESS:XX.66.205
>>> >   RELEASE_VERSION:1.1.6
>>> > /XX.177.177
>>> >   RACK:1d
>>> >   SCHEMA:99dce53b-487e-3e7b-a958-a1cc48d9f575
>>> >   LOAD:3.6115572697E10
>>> >   DC:us-east
>>> >   INTERNAL_IP:XX.65.57
>>> >   STATUS:NORMAL,14178431955039102644307275311465584410
>>> >   RPC_ADDRESS:XX.65.57
>>> >   RELEASE_VERSION:1.1.6
>>> > /XX.20.9
>>> >   RACK:1c
>>> >   SCHEMA:99dce53b-487e-3e7b-a958-a1cc48d9f575
>>> >   LOAD:4.1352503882E10
>>> >   DC:us-east
>>> >   INTERNAL_IP:XX.33.229
>>> >   STATUS:NORMAL,49624511842636859255075463585608106435
>>> >   RPC_ADDRESS:XX.33.229
>>> >   RELEASE_VERSION:1.1.6
>>> > /XX.231.121
>>> >   RACK:1c
>>> >   SCHEMA:09487aa5-3380-33ab-b9a5-bcc8476066b0
>>> >   X4:9c765678-d058-4d85-a588-638ce10ff984
>>> >   X3:7
>>> >   DC:us-east
>>> >   INTERNAL_IP:XX.223.241
>>> >   RPC_ADDRESS:XX.223.241
>>> >   RELEASE_VERSION:1.2.2
>>> >
>>> > Now nodetool on the 1.2.2 node shows all nodes as Down except itself.
>>> Gossipinfo looks good, though:
>>> >
>>> > Datacenter: us-east
>>> > ==========
>>> > Replicas: 3
>>> >
>>> > Address      Rack  Status  State   Load      Owns     Token
>>> >                                                       56713727820156410577229101240436610840
>>> > XX.132.169   1b    Down    Normal  44.2 GB   25.00%   127605887595351923798765477788721654890
>>> > XX.7.148     1b    Down    Normal  41.27 GB  25.00%   42535295865117307932921825930779602030
>>> > XX.180.36    1b    Down    Normal  49.47 GB  25.00%   1808575600
>>> > XX.63.230    1b    Down    Normal  38.11 GB  25.00%   85070591730234615865843651859750628460
>>> > XX.231.121   1c    Up      Normal  47.25 GB  25.00%   7089215977519551322153637656637080005
>>> > XX.71.63     1c    Down    Normal  38.74 GB  25.00%   134695103572871475120919115443550159295
>>> > XX.177.177   1d    Down    Normal  33.64 GB  25.00%   14178431955039102644307275311465584410
>>> > XX.31.234    1d    Down    Normal  44.66 GB  25.00%   99249023685273718510150927169407637270
>>> > XX.20.9      1c    Down    Normal  38.51 GB  25.00%   49624511842636859255075463585608106435
>>> > XX.163.36    1c    Down    Normal  44.25 GB  25.00%   92159807707754167187997289514579132865
>>> > XX.197.209   1d    Down    Normal  41.5 GB   25.00%   141784319550391026443072753098378663700
>>> > XX.86.255    1d    Down    Normal  34.78 GB  25.00%   56713727820156410577229101240436610840
>>> >
>>> > /XX.71.63
>>> >   RACK:1c
>>> >   RPC_ADDRESS:XX.194.92
>>> >   RELEASE_VERSION:1.1.6
>>> >   INTERNAL_IP:XX.194.92
>>> >   STATUS:NORMAL,134695103572871475120919115443550159295
>>> >   SCHEMA:99dce53b-487e-3e7b-a958-a1cc48d9f575
>>> >   DC:us-east
>>> >   LOAD:4.1598705272E10
>>> > /XX.86.255
>>> >   RACK:1d
>>> >   RPC_ADDRESS:XX.6.195
>>> >   RELEASE_VERSION:1.1.6
>>> >   INTERNAL_IP:XX.6.195
>>> >   STATUS:NORMAL,56713727820156410577229101240436610840
>>> >   SCHEMA:99dce53b-487e-3e7b-a958-a1cc48d9f575
>>> >   DC:us-east
>>> >   LOAD:3.7343205002E10
>>> > /XX.7.148
>>> >   RACK:1b
>>> >   RPC_ADDRESS:XX.47.250
>>> >   RELEASE_VERSION:1.1.6
>>> >   INTERNAL_IP:XX.47.250
>>> >   STATUS:NORMAL,42535295865117307932921825930779602030
>>> >   SCHEMA:99dce53b-487e-3e7b-a958-a1cc48d9f575
>>> >   DC:us-east
>>> >   LOAD:4.4316975808E10
>>> > /XX.63.230
>>> >   RACK:1b
>>> >   RPC_ADDRESS:XX.89.127
>>> >   RELEASE_VERSION:1.1.6
>>> >   INTERNAL_IP:XX.89.127
>>> >   STATUS:NORMAL,85070591730234615865843651859750628460
>>> >   SCHEMA:99dce53b-487e-3e7b-a958-a1cc48d9f575
>>> >   DC:us-east
>>> >   LOAD:4.0918456687E10
>>> > /XX.132.169
>>> >   RACK:1b
>>> >   RPC_ADDRESS:XX.94.161
>>> >   RELEASE_VERSION:1.1.6
>>> >   INTERNAL_IP:XX.94.161
>>> >   STATUS:NORMAL,127605887595351923798765477788721654890
>>> >   SCHEMA:99dce53b-487e-3e7b-a958-a1cc48d9f575
>>> >   DC:us-east
>>> >   LOAD:4.745883458E10
>>> > /XX.180.36
>>> >   RACK:1b
>>> >   RPC_ADDRESS:XX.123.112
>>> >   RELEASE_VERSION:1.1.6
>>> >   INTERNAL_IP:XX.123.112
>>> >   STATUS:NORMAL,1808575600
>>> >   SCHEMA:99dce53b-487e-3e7b-a958-a1cc48d9f575
>>> >   DC:us-east
>>> >   LOAD:5.311963027E10
>>> > /XX.163.36
>>> >   RACK:1c
>>> >   RPC_ADDRESS:XX.163.180
>>> >   RELEASE_VERSION:1.1.6
>>> >   INTERNAL_IP:XX.163.180
>>> >   STATUS:NORMAL,92159807707754167187997289514579132865
>>> >   SCHEMA:99dce53b-487e-3e7b-a958-a1cc48d9f575
>>> >   DC:us-east
>>> >   LOAD:4.7516755022E10
>>> > /XX.31.234
>>> >   RACK:1d
>>> >   RPC_ADDRESS:XX.192.159
>>> >   RELEASE_VERSION:1.1.6
>>> >   INTERNAL_IP:XX.192.159
>>> >   STATUS:NORMAL,99249023685273718510150927169407637270
>>> >   SCHEMA:99dce53b-487e-3e7b-a958-a1cc48d9f575
>>> >   DC:us-east
>>> >   LOAD:4.7954372912E10
>>> > /XX.197.209
>>> >   RACK:1d
>>> >   RPC_ADDRESS:XX.66.205
>>> >   RELEASE_VERSION:1.1.6
>>> >   INTERNAL_IP:XX.66.205
>>> >   STATUS:NORMAL,141784319550391026443072753098378663700
>>> >   SCHEMA:99dce53b-487e-3e7b-a958-a1cc48d9f575
>>> >   DC:us-east
>>> >   LOAD:4.4559013211E10
>>> > /XX.177.177
>>> >   RACK:1d
>>> >   RPC_ADDRESS:XX.65.57
>>> >   RELEASE_VERSION:1.1.6
>>> >   INTERNAL_IP:XX.65.57
>>> >   STATUS:NORMAL,14178431955039102644307275311465584410
>>> >   SCHEMA:99dce53b-487e-3e7b-a958-a1cc48d9f575
>>> >   DC:us-east
>>> >   LOAD:3.6115572697E10
>>> > /XX.20.9
>>> >   RACK:1c
>>> >   RPC_ADDRESS:XX.33.229
>>> >   RELEASE_VERSION:1.1.6
>>> >   INTERNAL_IP:XX.33.229
>>> >   STATUS:NORMAL,49624511842636859255075463585608106435
>>> >   SCHEMA:99dce53b-487e-3e7b-a958-a1cc48d9f575
>>> >   DC:us-east
>>> >   LOAD:4.1352367264E10
>>> > /XX.231.121
>>> >   HOST_ID:9c765678-d058-4d85-a588-638ce10ff984
>>> >   RACK:1c
>>> >   RPC_ADDRESS:XX.223.241
>>> >   RELEASE_VERSION:1.2.2
>>> >   INTERNAL_IP:XX.223.241
>>> >   STATUS:NORMAL,7089215977519551322153637656637080005
>>> >   NET_VERSION:7
>>> >   SCHEMA:8b8948f5-d56f-3a96-8005-b9452e42cd67
>>> >   SEVERITY:0.0
>>> >   DC:us-east
>>> >   LOAD:5.0710624207E10
>>> >
>>> > Is this just a display bug in nodetool, or does this upgraded node
>>> really see the other ones as dead?
>>> >
>>> > -Arya
>>> >
>>> >
>>> > On Mon, Feb 25, 2013 at 8:10 PM, Arya Goudarzi <goudarzi@gmail.com>
>>> wrote:
>>> > No, I did not look at nodetool gossipinfo, but from nodetool ring on
>>> both the pre-upgrade nodes and the nodes upgraded to 1.2.1, what I
>>> observed was the behavior described.
>>> >
>>> >
>>> > On Sat, Feb 23, 2013 at 1:26 AM, Michael Kjellman <
>>> mkjellman@barracuda.com> wrote:
>>> > This was a bug with 1.2.0 but resolved in 1.2.1. Did you take a
>>> capture of nodetool gossipinfo and nodetool ring by chance?
>>> >
>>> > On Feb 23, 2013, at 12:26 AM, "Arya Goudarzi" <goudarzi@gmail.com>
>>> wrote:
>>> >
>>> > > Hi C* users,
>>> > >
>>> > > I just upgraded a 12-node test cluster from 1.1.6 to 1.2.1. What I
>>> noticed from nodetool ring was that the newly upgraded nodes only saw each
>>> other as Normal, and saw the rest of the cluster, still on 1.1.6, as Down.
>>> Vice versa was true for the nodes running 1.1.6: they saw each other as
>>> Normal but the 1.2.1 nodes as Down. I don't see a note in the upgrade docs
>>> that this would be an issue. Has anyone else observed this problem?
>>> > >
>>> > > In the debug logs I could see messages about attempting to connect
>>> to a node's IP, followed by messages saying it is down.
>>> > >
>>> > > Cheers,
>>> > > -Arya
>>> >
>>> >
>>>
>>>
>>
>
>
