incubator-cassandra-user mailing list archives

From Denis Kot <denis....@monterosa.co.uk>
Subject Cassandra 1.2: old node does not want to re-join the ring
Date Mon, 26 Aug 2013 12:39:05 GMT
Hello,

We have a Cassandra cluster of 6 nodes, 3 of them seeds. One day AWS notified
us that one of our instances, seed01, was scheduled to be decommissioned. To
fix this we simply had to stop and start the instance so that it moved to a
new AWS host. Before the stop/start we did the following:
2) Stop gossip
3) Stop thrift
4) Drain
5) Stop Cassandra
6) Move all data to EBS (we use ephemeral volumes for data)
7) Stop / Start instance
8) Move data back
9) Start Cassandra
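
The pre-restart part of the list above (steps 2 to 6) can be sketched as a dry-run script. This is only an illustration, not the commands that were actually run: `nodetool disablegossip`, `nodetool disablethrift` and `nodetool drain` are standard nodetool subcommands, but the service name and both directory paths are assumptions, since the message does not state them.

```python
import shlex
import subprocess

# Hypothetical locations: the original message does not say where the data lives.
DATA_DIR = "/var/lib/cassandra"        # assumed data directory on the ephemeral volume
EBS_DIR = "/mnt/ebs/cassandra-backup"  # assumed EBS mount point


def build_plan(data_dir=DATA_DIR, ebs_dir=EBS_DIR):
    """Return the pre-restart commands in the order given in the message."""
    return [
        "nodetool disablegossip",            # 2) stop gossip
        "nodetool disablethrift",            # 3) stop thrift
        "nodetool drain",                    # 4) flush memtables and stop accepting writes
        "service cassandra stop",            # 5) assumed init-script name
        f"rsync -a {data_dir}/ {ebs_dir}/",  # 6) copy data to EBS (assumed tool and paths)
    ]


def run_plan(plan, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for command in plan:
        print(command)
        if not dry_run:
            subprocess.run(shlex.split(command), check=True)


if __name__ == "__main__":
    run_plan(build_plan())
```

Keeping `dry_run=True` as the default means the script only prints the plan unless it is explicitly told to execute it.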

But after starting Cassandra on seed01, nodetool status shows:

Datacenter: UNKNOWN-DC
======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address        Load        Tokens  Owns (effective)  Host ID                               Rack
DN  10.149.45.115  ?           256     17.3%             ae4166fb-76e1-4900-947c-7e87ca262ea0  UNKNOWN-RACK
DN  10.164.84.171  ?           256     17.5%             638dae19-a6f5-4330-9466-f46ddb3b9d79  UNKNOWN-RACK
DN  10.149.44.215  ?           256     16.2%             987914af-f057-4922-8ee1-2a999108c75d  UNKNOWN-RACK
DN  10.232.20.72   ?           256     14.8%             fb5dfd50-de9e-42ed-b539-bd937a045992  UNKNOWN-RACK
DN  10.166.37.188  ?           256     17.1%             f149c294-ca1d-427c-b510-2f91a0966b5a  UNKNOWN-RACK
Datacenter: us-east
===================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address        Load        Tokens  Owns (effective)  Host ID                               Rack
UN  10.232.17.19   1020.87 MB  256     17.1%             08055af6-5dfa-4d4e-aa72-cf1d2952e23e  1b
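
To make the state of the ring easier to compare between nodes, the status output can be summarized per datacenter. The following is a minimal sketch, assuming the `nodetool status` layout shown in this message; the function name `summarize` and the sample constant are illustrative only. It keys on lines that begin with a two-letter status code, so it tolerates rows whose Host ID and Rack have wrapped onto a following line.

```python
import re
from collections import defaultdict

# Node rows copied from the status output in this message.
SAMPLE = """\
Datacenter: UNKNOWN-DC
======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address        Load        Tokens  Owns (effective)  Host ID                               Rack
DN  10.149.45.115  ?           256     17.3%             ae4166fb-76e1-4900-947c-7e87ca262ea0  UNKNOWN-RACK
DN  10.164.84.171  ?           256     17.5%             638dae19-a6f5-4330-9466-f46ddb3b9d79  UNKNOWN-RACK
DN  10.149.44.215  ?           256     16.2%             987914af-f057-4922-8ee1-2a999108c75d  UNKNOWN-RACK
DN  10.232.20.72   ?           256     14.8%             fb5dfd50-de9e-42ed-b539-bd937a045992  UNKNOWN-RACK
DN  10.166.37.188  ?           256     17.1%             f149c294-ca1d-427c-b510-2f91a0966b5a  UNKNOWN-RACK
Datacenter: us-east
===================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address        Load        Tokens  Owns (effective)  Host ID                               Rack
UN  10.232.17.19   1020.87 MB  256     17.1%             08055af6-5dfa-4d4e-aa72-cf1d2952e23e  1b
"""

# Status letter (U/D), state letter (N/L/J/M), then an IPv4 address.
NODE_LINE = re.compile(r"^([UD])([NLJM])\s+(\d+\.\d+\.\d+\.\d+)\s")


def summarize(status_text):
    """Map each datacenter name to the addresses reported up and down."""
    summary = defaultdict(lambda: {"up": [], "down": []})
    datacenter = None
    for line in status_text.splitlines():
        if line.startswith("Datacenter:"):
            datacenter = line.split(":", 1)[1].strip()
            continue
        match = NODE_LINE.match(line)
        if match and datacenter is not None:
            key = "up" if match.group(1) == "U" else "down"
            summary[datacenter][key].append(match.group(3))
    return dict(summary)


if __name__ == "__main__":
    for name, nodes in summarize(SAMPLE).items():
        print(name, "up:", nodes["up"], "down:", nodes["down"])
```

Running the same summary on each node and comparing the results would show whether the nodes agree on datacenter names and membership.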

We also tried to launch seed04 with seed02 and seed03 as seeds in its
config, but it creates a new ring instead of joining the existing one.

We checked port 7000 on all nodes, and it is reachable from every node.
By default, all ports (TCP/UDP 0-65535) are open within the security groups
that the nodes belong to. In tcpdump I can see the new node trying to
connect to the seed:

08:43:42.056115 IP 10.235.62.198.45163 > 10.164.84.171.7000: Flags [P.], seq 0:8, ack 1, win 46, options [nop,nop,TS val 81748069 ecr 538805526], length 8
08:43:42.056146 IP 10.164.84.171.7000 > 10.235.62.198.45163: Flags [R], seq 110766787, win 0, length 0
08:43:42.157893 IP 10.235.62.198.45165 > 10.164.84.171.7000: Flags [S], seq 452519826, win 5840, options [mss 1460,sackOK,TS val 81748094 ecr 0,nop,wscale 7], length 0
08:43:42.157903 IP 10.164.84.171.7000 > 10.235.62.198.45165: Flags [S.], seq 4035182025, ack 452519827, win 5792, options [mss 1460,sackOK,TS val 538833931 ecr 81748094,nop,wscale 7], length 0
08:43:42.158920 IP 10.235.62.198.45165 > 10.164.84.171.7000: Flags [.], ack 1, win 46, options [nop,nop,TS val 81748094 ecr 538833931], length 0
08:43:42.159053 IP 10.235.62.198.45165 > 10.164.84.171.7000: Flags [P.], seq 1:9, ack 1, win 46, options [nop,nop,TS val 81748094 ecr 538833931], length 8
08:43:42.360086 IP 10.235.62.198.45165 > 10.164.84.171.7000: Flags [P.], seq 1:9, ack 1, win 46, options [nop,nop,TS val 81748145 ecr 538833931], length 8
08:43:42.768080 IP 10.235.62.198.45165 > 10.164.84.171.7000: Flags [P.], seq 1:9, ack 1, win 46, options [nop,nop,TS val 81748247 ecr 538833931], length 8
08:43:43.584072 IP 10.235.62.198.45165 > 10.164.84.171.7000: Flags [P.], seq 1:9, ack 1, win 46, options [nop,nop,TS val 81748451 ecr 538833931], length 8
08:43:45.216087 IP 10.235.62.198.45165 > 10.164.84.171.7000: Flags [P.], seq 1:9, ack 1, win 46, options [nop,nop,TS val 81748859 ecr 538833931], length 8
08:43:45.783333 IP 10.164.84.171.7000 > 10.235.62.198.45165: Flags [S.], seq 4035182025, ack 452519827, win 5792, options [mss 1460,sackOK,TS val 538834838 ecr 81748859,nop,wscale 7], length 0
08:43:45.784337 IP 10.235.62.198.45165 > 10.164.84.171.7000: Flags [.], ack 1, win 46, options [nop,nop,TS val 81749001 ecr 538834838,nop,nop,sack 1 {0:1}], length 0
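
The trace is easier to read once repeated segments are grouped. The sketch below is an illustration only, with hypothetical names (`find_repeats`, `gaps`), and it assumes the tcpdump text format shown in this message with one packet per line. It groups packets by source, destination, flags and sequence range, then reports any group seen more than once. In this capture the 8-byte push `seq 1:9` from the new node appears five times with roughly doubling gaps, and the seed sends its SYN-ACK twice, which is consistent with segments not being acknowledged; it does not by itself identify the cause.

```python
import re
from collections import defaultdict

# Packets copied from the trace in this message, joined to one packet per line.
CAPTURE = """\
08:43:42.056115 IP 10.235.62.198.45163 > 10.164.84.171.7000: Flags [P.], seq 0:8, ack 1, win 46, options [nop,nop,TS val 81748069 ecr 538805526], length 8
08:43:42.056146 IP 10.164.84.171.7000 > 10.235.62.198.45163: Flags [R], seq 110766787, win 0, length 0
08:43:42.157893 IP 10.235.62.198.45165 > 10.164.84.171.7000: Flags [S], seq 452519826, win 5840, options [mss 1460,sackOK,TS val 81748094 ecr 0,nop,wscale 7], length 0
08:43:42.157903 IP 10.164.84.171.7000 > 10.235.62.198.45165: Flags [S.], seq 4035182025, ack 452519827, win 5792, options [mss 1460,sackOK,TS val 538833931 ecr 81748094,nop,wscale 7], length 0
08:43:42.158920 IP 10.235.62.198.45165 > 10.164.84.171.7000: Flags [.], ack 1, win 46, options [nop,nop,TS val 81748094 ecr 538833931], length 0
08:43:42.159053 IP 10.235.62.198.45165 > 10.164.84.171.7000: Flags [P.], seq 1:9, ack 1, win 46, options [nop,nop,TS val 81748094 ecr 538833931], length 8
08:43:42.360086 IP 10.235.62.198.45165 > 10.164.84.171.7000: Flags [P.], seq 1:9, ack 1, win 46, options [nop,nop,TS val 81748145 ecr 538833931], length 8
08:43:42.768080 IP 10.235.62.198.45165 > 10.164.84.171.7000: Flags [P.], seq 1:9, ack 1, win 46, options [nop,nop,TS val 81748247 ecr 538833931], length 8
08:43:43.584072 IP 10.235.62.198.45165 > 10.164.84.171.7000: Flags [P.], seq 1:9, ack 1, win 46, options [nop,nop,TS val 81748451 ecr 538833931], length 8
08:43:45.216087 IP 10.235.62.198.45165 > 10.164.84.171.7000: Flags [P.], seq 1:9, ack 1, win 46, options [nop,nop,TS val 81748859 ecr 538833931], length 8
08:43:45.783333 IP 10.164.84.171.7000 > 10.235.62.198.45165: Flags [S.], seq 4035182025, ack 452519827, win 5792, options [mss 1460,sackOK,TS val 538834838 ecr 81748859,nop,wscale 7], length 0
08:43:45.784337 IP 10.235.62.198.45165 > 10.164.84.171.7000: Flags [.], ack 1, win 46, options [nop,nop,TS val 81749001 ecr 538834838,nop,nop,sack 1 {0:1}], length 0
"""

# Timestamp, source, destination, TCP flags, and the optional seq field.
PACKET = re.compile(
    r"^(\d{2}):(\d{2}):(\d{2}\.\d+) IP (\S+) > (\S+): "
    r"Flags \[([^\]]+)\],(?: seq ([\d:]+),)?"
)


def find_repeats(capture_text):
    """Return {(src, dst, flags, seq): [timestamps]} for segments seen more than once."""
    sightings = defaultdict(list)
    for line in capture_text.splitlines():
        match = PACKET.match(line)
        if match is None:
            continue
        hours, minutes, seconds, src, dst, flags, seq = match.groups()
        if seq is None:  # pure ACKs are printed without a seq field here
            continue
        timestamp = int(hours) * 3600 + int(minutes) * 60 + float(seconds)
        sightings[(src, dst, flags, seq)].append(timestamp)
    return {key: times for key, times in sightings.items() if len(times) > 1}


def gaps(times):
    """Seconds between consecutive sightings, rounded to milliseconds."""
    return [round(later - earlier, 3) for earlier, later in zip(times, times[1:])]


if __name__ == "__main__":
    for (src, dst, flags, seq), times in find_repeats(CAPTURE).items():
        print(f"{src} > {dst} [{flags}] seq {seq}: seen {len(times)} times, gaps {gaps(times)}")
```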

where 10.235.62.198 is the new node and 10.164.84.171 is the seed.

We use Cassandra version 1.2.6 with vnodes.

Please help. We spent almost 3 days trying to fix it with no luck.


-- 
*Denis Kot // DevOps Engineer // Monterosa*
