Aaron,

I added -Dcassandra.load_ring_state=false to cassandra-env.sh and did a rolling restart. With one node on 1.2.3 and the 11 other nodes on 1.1.10, the 1.1.10 nodes saw the 1.2.3 node, but gossip on the 1.2.3 node now only sees itself.

Cheers,
-Arya


On Thu, Mar 28, 2013 at 1:02 PM, Arya Goudarzi <goudarzi@gmail.com> wrote:
There has been a little misunderstanding. When all nodes are on 1.2.2, they are fine. But during the rolling upgrade, the 1.2.2 nodes see the 1.1.10 nodes as down in nodetool, despite gossip reporting NORMAL. I will give your suggestion a try and will report back.
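
For reference, I am comparing these two views on each node, roughly like this (with each node's IP substituted):

    nodetool -h <node_ip> ring
    nodetool -h <node_ip> gossipinfo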


On Sat, Mar 23, 2013 at 10:37 AM, aaron morton <aaron@thelastpickle.com> wrote:
So all nodes are on 1.2 and some are still being marked as down?

I would try a rolling restart with -Dcassandra.load_ring_state=false added to JVM_OPTS in cassandra-env.sh. There is no guarantee it will fix it, but it's a simple thing to try.
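
Roughly, for each node in turn (a sketch; the exact restart command depends on your install):

    nodetool drain
    sudo service cassandra restart    # or however you normally restart Cassandra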

Cheers

-----------------
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton

On 22/03/2013, at 10:30 AM, Arya Goudarzi <goudarzi@gmail.com> wrote:

I took Brandon's suggestion in CASSANDRA-5332 and upgraded to 1.1.10 before upgrading to 1.2.2, but the issue with nodetool ring reporting machines as down was not resolved.

On Fri, Mar 15, 2013 at 6:35 PM, Arya Goudarzi <goudarzi@gmail.com> wrote:
Thank you very much, Aaron. I recall from the logs that this node, upgraded to 1.2.2, reported seeing the others as dead. Brandon suggested in https://issues.apache.org/jira/browse/CASSANDRA-5332 that I should upgrade from at least 1.1.7. So, I decided to upgrade to 1.1.10 first before upgrading to 1.2.2. I am in the middle of troubleshooting some other issues with that upgrade (posted separately); once I am done, I will give your suggestion a try.


On Mon, Mar 11, 2013 at 10:34 PM, aaron morton <aaron@thelastpickle.com> wrote:
> Is this just a display bug in nodetool or this upgraded node really sees the other ones as dead?
Is the 1.2.2 node which sees all the others as down processing requests?
Is it showing the others as down in the log?

I'm not really sure what's happening. But you can try starting the 1.2.2 node with the

-Dcassandra.load_ring_state=false

parameter; append it at the bottom of the cassandra-env.sh file. It will force the node to get the ring state from the others.
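
Something like this at the bottom of the file (a sketch; JVM_OPTS is the variable the stock cassandra-env.sh builds up, so double-check yours):

    JVM_OPTS="$JVM_OPTS -Dcassandra.load_ring_state=false"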

Cheers

-----------------
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 8/03/2013, at 10:24 PM, Arya Goudarzi <goudarzi@gmail.com> wrote:

> OK. I upgraded one node from 1.1.6 to 1.2.2 today. Aside from some new problems, which I posted about in a separate email, this issue still exists, but now only on the 1.2.2 node. That is, the nodes running 1.1.6 see all other nodes, including the 1.2.2 one, as Up. Here are the ring and gossipinfo outputs from a 1.1.6 node, for example; the upgraded node is XX.231.121:
>
> Address         DC          Rack        Status State   Load            Effective-Ownership Token
>                                                                                            141784319550391026443072753098378663700
> XX.180.36    us-east     1b          Up     Normal  49.47 GB        25.00%              1808575600
> XX.231.121  us-east     1c          Up     Normal  47.08 GB        25.00%              7089215977519551322153637656637080005
> XX.177.177  us-east     1d          Up     Normal  33.64 GB        25.00%              14178431955039102644307275311465584410
> XX.7.148    us-east     1b          Up     Normal  41.27 GB        25.00%              42535295865117307932921825930779602030
> XX.20.9     us-east     1c          Up     Normal  38.51 GB        25.00%              49624511842636859255075463585608106435
> XX.86.255    us-east     1d          Up     Normal  34.78 GB        25.00%              56713727820156410577229101240436610840
> XX.63.230    us-east     1b          Up     Normal  38.11 GB        25.00%              85070591730234615865843651859750628460
> XX.163.36   us-east     1c          Up     Normal  44.25 GB        25.00%              92159807707754167187997289514579132865
> XX.31.234    us-east     1d          Up     Normal  44.66 GB        25.00%              99249023685273718510150927169407637270
> XX.132.169   us-east     1b          Up     Normal  44.2 GB         25.00%              127605887595351923798765477788721654890
> XX.71.63     us-east     1c          Up     Normal  38.74 GB        25.00%              134695103572871475120919115443550159295
> XX.197.209  us-east     1d          Up     Normal  41.5 GB         25.00%              141784319550391026443072753098378663700
>
> /XX.71.63
>   RACK:1c
>   SCHEMA:99dce53b-487e-3e7b-a958-a1cc48d9f575
>   LOAD:4.1598705272E10
>   DC:us-east
>   INTERNAL_IP:XX.194.92
>   STATUS:NORMAL,134695103572871475120919115443550159295
>   RPC_ADDRESS:XX.194.92
>   RELEASE_VERSION:1.1.6
> /XX.86.255
>   RACK:1d
>   SCHEMA:99dce53b-487e-3e7b-a958-a1cc48d9f575
>   LOAD:3.734334162E10
>   DC:us-east
>   INTERNAL_IP:XX.6.195
>   STATUS:NORMAL,56713727820156410577229101240436610840
>   RPC_ADDRESS:XX.6.195
>   RELEASE_VERSION:1.1.6
> /XX.7.148
>   RACK:1b
>   SCHEMA:99dce53b-487e-3e7b-a958-a1cc48d9f575
>   LOAD:4.4316975808E10
>   DC:us-east
>   INTERNAL_IP:XX.47.250
>   STATUS:NORMAL,42535295865117307932921825930779602030
>   RPC_ADDRESS:XX.47.250
>   RELEASE_VERSION:1.1.6
> /XX.63.230
>   RACK:1b
>   SCHEMA:99dce53b-487e-3e7b-a958-a1cc48d9f575
>   LOAD:4.0918593305E10
>   DC:us-east
>   INTERNAL_IP:XX.89.127
>   STATUS:NORMAL,85070591730234615865843651859750628460
>   RPC_ADDRESS:XX.89.127
>   RELEASE_VERSION:1.1.6
> /XX.132.169
>   RACK:1b
>   SCHEMA:99dce53b-487e-3e7b-a958-a1cc48d9f575
>   LOAD:4.745883458E10
>   DC:us-east
>   INTERNAL_IP:XX.94.161
>   STATUS:NORMAL,127605887595351923798765477788721654890
>   RPC_ADDRESS:XX.94.161
>   RELEASE_VERSION:1.1.6
> /XX.180.36
>   RACK:1b
>   SCHEMA:99dce53b-487e-3e7b-a958-a1cc48d9f575
>   LOAD:5.311963027E10
>   DC:us-east
>   INTERNAL_IP:XX.123.112
>   STATUS:NORMAL,1808575600
>   RPC_ADDRESS:XX.123.112
>   RELEASE_VERSION:1.1.6
> /XX.163.36
>   RACK:1c
>   SCHEMA:99dce53b-487e-3e7b-a958-a1cc48d9f575
>   LOAD:4.7516755022E10
>   DC:us-east
>   INTERNAL_IP:XX.163.180
>   STATUS:NORMAL,92159807707754167187997289514579132865
>   RPC_ADDRESS:XX.163.180
>   RELEASE_VERSION:1.1.6
> /XX.31.234
>   RACK:1d
>   SCHEMA:99dce53b-487e-3e7b-a958-a1cc48d9f575
>   LOAD:4.7954372912E10
>   DC:us-east
>   INTERNAL_IP:XX.192.159
>   STATUS:NORMAL,99249023685273718510150927169407637270
>   RPC_ADDRESS:XX.192.159
>   RELEASE_VERSION:1.1.6
> /XX.197.209
>   RACK:1d
>   SCHEMA:99dce53b-487e-3e7b-a958-a1cc48d9f575
>   LOAD:4.4558968005E10
>   DC:us-east
>   INTERNAL_IP:XX.66.205
>   STATUS:NORMAL,141784319550391026443072753098378663700
>   RPC_ADDRESS:XX.66.205
>   RELEASE_VERSION:1.1.6
> /XX.177.177
>   RACK:1d
>   SCHEMA:99dce53b-487e-3e7b-a958-a1cc48d9f575
>   LOAD:3.6115572697E10
>   DC:us-east
>   INTERNAL_IP:XX.65.57
>   STATUS:NORMAL,14178431955039102644307275311465584410
>   RPC_ADDRESS:XX.65.57
>   RELEASE_VERSION:1.1.6
> /XX.20.9
>   RACK:1c
>   SCHEMA:99dce53b-487e-3e7b-a958-a1cc48d9f575
>   LOAD:4.1352503882E10
>   DC:us-east
>   INTERNAL_IP:XX.33.229
>   STATUS:NORMAL,49624511842636859255075463585608106435
>   RPC_ADDRESS:XX.33.229
>   RELEASE_VERSION:1.1.6
> /XX.231.121
>   RACK:1c
>   SCHEMA:09487aa5-3380-33ab-b9a5-bcc8476066b0
>   X4:9c765678-d058-4d85-a588-638ce10ff984
>   X3:7
>   DC:us-east
>   INTERNAL_IP:XX.223.241
>   RPC_ADDRESS:XX.223.241
>   RELEASE_VERSION:1.2.2
>
> Now nodetool on the 1.2.2 node shows all nodes as Down except itself. Gossipinfo looks good though:
>
> Datacenter: us-east
> ==========
> Replicas: 3
>
> Address         Rack        Status State   Load            Owns                Token
>                                                                                56713727820156410577229101240436610840
> XX.132.169   1b          Down   Normal  44.2 GB         25.00%              127605887595351923798765477788721654890
> XX.7.148    1b          Down   Normal  41.27 GB        25.00%              42535295865117307932921825930779602030
> XX.180.36    1b          Down   Normal  49.47 GB        25.00%              1808575600
> XX.63.230    1b          Down   Normal  38.11 GB        25.00%              85070591730234615865843651859750628460
> XX.231.121  1c          Up     Normal  47.25 GB        25.00%              7089215977519551322153637656637080005
> XX.71.63     1c          Down   Normal  38.74 GB        25.00%              134695103572871475120919115443550159295
> XX.177.177  1d          Down   Normal  33.64 GB        25.00%              14178431955039102644307275311465584410
> XX.31.234    1d          Down   Normal  44.66 GB        25.00%              99249023685273718510150927169407637270
> XX.20.9     1c          Down   Normal  38.51 GB        25.00%              49624511842636859255075463585608106435
> XX.163.36   1c          Down   Normal  44.25 GB        25.00%              92159807707754167187997289514579132865
> XX.197.209  1d          Down   Normal  41.5 GB         25.00%              141784319550391026443072753098378663700
> XX.86.255    1d          Down   Normal  34.78 GB        25.00%              56713727820156410577229101240436610840
>
> /XX.71.63
>   RACK:1c
>   RPC_ADDRESS:XX.194.92
>   RELEASE_VERSION:1.1.6
>   INTERNAL_IP:XX.194.92
>   STATUS:NORMAL,134695103572871475120919115443550159295
>   SCHEMA:99dce53b-487e-3e7b-a958-a1cc48d9f575
>   DC:us-east
>   LOAD:4.1598705272E10
> /XX.86.255
>   RACK:1d
>   RPC_ADDRESS:XX.6.195
>   RELEASE_VERSION:1.1.6
>   INTERNAL_IP:XX.6.195
>   STATUS:NORMAL,56713727820156410577229101240436610840
>   SCHEMA:99dce53b-487e-3e7b-a958-a1cc48d9f575
>   DC:us-east
>   LOAD:3.7343205002E10
> /XX.7.148
>   RACK:1b
>   RPC_ADDRESS:XX.47.250
>   RELEASE_VERSION:1.1.6
>   INTERNAL_IP:XX.47.250
>   STATUS:NORMAL,42535295865117307932921825930779602030
>   SCHEMA:99dce53b-487e-3e7b-a958-a1cc48d9f575
>   DC:us-east
>   LOAD:4.4316975808E10
> /XX.63.230
>   RACK:1b
>   RPC_ADDRESS:XX.89.127
>   RELEASE_VERSION:1.1.6
>   INTERNAL_IP:XX.89.127
>   STATUS:NORMAL,85070591730234615865843651859750628460
>   SCHEMA:99dce53b-487e-3e7b-a958-a1cc48d9f575
>   DC:us-east
>   LOAD:4.0918456687E10
> /XX.132.169
>   RACK:1b
>   RPC_ADDRESS:XX.94.161
>   RELEASE_VERSION:1.1.6
>   INTERNAL_IP:XX.94.161
>   STATUS:NORMAL,127605887595351923798765477788721654890
>   SCHEMA:99dce53b-487e-3e7b-a958-a1cc48d9f575
>   DC:us-east
>   LOAD:4.745883458E10
> /XX.180.36
>   RACK:1b
>   RPC_ADDRESS:XX.123.112
>   RELEASE_VERSION:1.1.6
>   INTERNAL_IP:XX.123.112
>   STATUS:NORMAL,1808575600
>   SCHEMA:99dce53b-487e-3e7b-a958-a1cc48d9f575
>   DC:us-east
>   LOAD:5.311963027E10
> /XX.163.36
>   RACK:1c
>   RPC_ADDRESS:XX.163.180
>   RELEASE_VERSION:1.1.6
>   INTERNAL_IP:XX.163.180
>   STATUS:NORMAL,92159807707754167187997289514579132865
>   SCHEMA:99dce53b-487e-3e7b-a958-a1cc48d9f575
>   DC:us-east
>   LOAD:4.7516755022E10
> /XX.31.234
>   RACK:1d
>   RPC_ADDRESS:XX.192.159
>   RELEASE_VERSION:1.1.6
>   INTERNAL_IP:XX.192.159
>   STATUS:NORMAL,99249023685273718510150927169407637270
>   SCHEMA:99dce53b-487e-3e7b-a958-a1cc48d9f575
>   DC:us-east
>   LOAD:4.7954372912E10
> /XX.197.209
>   RACK:1d
>   RPC_ADDRESS:XX.66.205
>   RELEASE_VERSION:1.1.6
>   INTERNAL_IP:XX.66.205
>   STATUS:NORMAL,141784319550391026443072753098378663700
>   SCHEMA:99dce53b-487e-3e7b-a958-a1cc48d9f575
>   DC:us-east
>   LOAD:4.4559013211E10
> /XX.177.177
>   RACK:1d
>   RPC_ADDRESS:XX.65.57
>   RELEASE_VERSION:1.1.6
>   INTERNAL_IP:XX.65.57
>   STATUS:NORMAL,14178431955039102644307275311465584410
>   SCHEMA:99dce53b-487e-3e7b-a958-a1cc48d9f575
>   DC:us-east
>   LOAD:3.6115572697E10
> /XX.20.9
>   RACK:1c
>   RPC_ADDRESS:XX.33.229
>   RELEASE_VERSION:1.1.6
>   INTERNAL_IP:XX.33.229
>   STATUS:NORMAL,49624511842636859255075463585608106435
>   SCHEMA:99dce53b-487e-3e7b-a958-a1cc48d9f575
>   DC:us-east
>   LOAD:4.1352367264E10
> /XX.231.121
>   HOST_ID:9c765678-d058-4d85-a588-638ce10ff984
>   RACK:1c
>   RPC_ADDRESS:XX.223.241
>   RELEASE_VERSION:1.2.2
>   INTERNAL_IP:XX.223.241
>   STATUS:NORMAL,7089215977519551322153637656637080005
>   NET_VERSION:7
>   SCHEMA:8b8948f5-d56f-3a96-8005-b9452e42cd67
>   SEVERITY:0.0
>   DC:us-east
>   LOAD:5.0710624207E10
>
> Is this just a display bug in nodetool, or does this upgraded node really see the other ones as dead?
>
> -Arya
>
>
> On Mon, Feb 25, 2013 at 8:10 PM, Arya Goudarzi <goudarzi@gmail.com> wrote:
> No, I did not look at nodetool gossipinfo, but from the ring output on both pre-upgrade and post-upgrade (1.2.1) nodes, what I observed was the behavior described.
>
>
> On Sat, Feb 23, 2013 at 1:26 AM, Michael Kjellman <mkjellman@barracuda.com> wrote:
> This was a bug in 1.2.0 but was resolved in 1.2.1. Did you capture nodetool gossipinfo and nodetool ring output by chance?
>
> On Feb 23, 2013, at 12:26 AM, "Arya Goudarzi" <goudarzi@gmail.com> wrote:
>
> > Hi C* users,
> >
> > I just upgraded a 12-node test cluster from 1.1.6 to 1.2.1. What I noticed from nodetool ring was that the newly upgraded nodes only saw each other as Normal and saw the rest of the cluster, which was on 1.1.6, as Down. Vice versa was true for the nodes running 1.1.6: they saw each other as Normal but the 1.2.1 nodes as Down. I don't see a note in the upgrade docs that this would be an issue. Has anyone else observed this problem?
> >
> > In the debug logs I could see messages about attempting to connect to a node's IP and then marking it as down.
> >
> > Cheers,
> > -Arya
>