From: Denis Kot
Date: Mon, 26 Aug 2013 15:39:05 +0300
Subject: Cassandra 1.2: old node does not want to re-join the ring
To: user@cassandra.apache.org

Hello,

We have a Cassandra cluster of 6 nodes, 3 of which are seeds. One day AWS sent us a notice that one of our instances, seed01, was going to be decommissioned. To deal with this, we only needed to stop and start the instance so that it would move to a new AWS host. Before the stop/start we did the following:
2) Stop gossip
3) Stop thrift
4) Drain
5) Stop Cassandra
6) Move all data to EBS (we use ephemeral volumes for data)
7) Stop/start the instance
8) Move the data back
9) Start Cassandra
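Steps 2) to 5) correspond to standard nodetool subcommands (disablegossip, disablethrift and drain) followed by stopping the service. A minimal Python sketch of that sequence follows. It assumes nodetool is on the PATH and that Cassandra is stopped with "sudo service cassandra stop"; that command depends on how Cassandra was installed and is an assumption, as is the helper name pre_stop. By default it only prints the commands.

```python
import shlex
import subprocess

# Pre-stop sequence for steps 2) to 5). The nodetool subcommands exist in
# Cassandra 1.2; the service stop command is an ASSUMPTION that depends on
# how Cassandra was installed (package vs. tarball).
PRE_STOP_STEPS = [
    ("Stop gossip", ["nodetool", "disablegossip"]),
    ("Stop thrift", ["nodetool", "disablethrift"]),
    ("Drain", ["nodetool", "drain"]),
    ("Stop Cassandra", ["sudo", "service", "cassandra", "stop"]),  # assumption
]


def pre_stop(dry_run=True):
    """Print each step and, unless dry_run is set, run it, stopping on failure."""
    executed = []
    for label, argv in PRE_STOP_STEPS:
        print(f"{label}: {shlex.join(argv)}")
        if not dry_run:
            # check=True raises CalledProcessError if a step fails, so a
            # failed drain is never followed by stopping the service.
            subprocess.run(argv, check=True)
        executed.append(argv)
    return executed


if __name__ == "__main__":
    pre_stop(dry_run=True)
```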

But after starting Cassandra on seed01, nodetool status shows:

Datacenter: UNKNOWN-DC
======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address        Load       Tokens  Owns (effective)  Host ID                               Rack
DN  10.149.45.115  ?          256     17.3%             ae4166fb-76e1-4900-947c-7e87ca262ea0  UNKNOWN-RACK
DN  10.164.84.171  ?          256     17.5%             638dae19-a6f5-4330-9466-f46ddb3b9d79  UNKNOWN-RACK
DN  10.149.44.215  ?          256     16.2%             987914af-f057-4922-8ee1-2a999108c75d  UNKNOWN-RACK
DN  10.232.20.72   ?          256     14.8%             fb5dfd50-de9e-42ed-b539-bd937a045992  UNKNOWN-RACK
DN  10.166.37.188  ?          256     17.1%             f149c294-ca1d-427c-b510-2f91a0966b5a  UNKNOWN-RACK
Datacenter: us-east
===================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address        Load        Tokens  Owns (effective)  Host ID                               Rack
UN  10.232.17.19   1020.87 MB  256     17.1%             08055af6-5dfa-4d4e-aa72-cf1d2952e23e  1b
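When reading output like this, it can help to group nodes by datacenter heading and status code. The Python sketch below does that for the layout above. parse_status is a hypothetical helper, and it only assumes that data rows begin with a two-letter code such as UN or DN followed by an IPv4 address. On this output it separates the single UN node in us-east (presumably seed01 itself) from the five DN nodes that seed01 places in UNKNOWN-DC/UNKNOWN-RACK.

```python
import re
from collections import defaultdict

# A data row: two-letter Status/State code (U/D + N/L/J/M), then an IPv4 address.
ROW = re.compile(r"^(?P<code>[UD][NLJM])\s+(?P<addr>\d{1,3}(?:\.\d{1,3}){3})\s")


def parse_status(text):
    """Return {datacenter: {status_code: [addresses]}} from nodetool status output."""
    result = defaultdict(lambda: defaultdict(list))
    dc = None
    for line in text.splitlines():
        if line.startswith("Datacenter:"):
            dc = line.split(":", 1)[1].strip()
            continue
        m = ROW.match(line)  # legend and header lines do not match
        if m and dc is not None:
            result[dc][m.group("code")].append(m.group("addr"))
    return result


SAMPLE = """\
Datacenter: UNKNOWN-DC
======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address        Load       Tokens  Owns (effective)  Host ID                               Rack
DN  10.149.45.115  ?          256     17.3%             ae4166fb-76e1-4900-947c-7e87ca262ea0  UNKNOWN-RACK
DN  10.164.84.171  ?          256     17.5%             638dae19-a6f5-4330-9466-f46ddb3b9d79  UNKNOWN-RACK
DN  10.149.44.215  ?          256     16.2%             987914af-f057-4922-8ee1-2a999108c75d  UNKNOWN-RACK
DN  10.232.20.72   ?          256     14.8%             fb5dfd50-de9e-42ed-b539-bd937a045992  UNKNOWN-RACK
DN  10.166.37.188  ?          256     17.1%             f149c294-ca1d-427c-b510-2f91a0966b5a  UNKNOWN-RACK
Datacenter: us-east
===================
UN  10.232.17.19   1020.87 MB  256     17.1%             08055af6-5dfa-4d4e-aa72-cf1d2952e23e  1b
"""

if __name__ == "__main__":
    for dc, by_code in parse_status(SAMPLE).items():
        for code, addrs in by_code.items():
            print(dc, code, addrs)
```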

We also tried launching seed04 with seed02 and seed03 listed as seeds in its config, but it creates a new ring instead of joining the existing one.

We checked port 7000 on all nodes, and it is reachable from every node. By default we opened all ports (TCP/UDP 0-65535) for traffic within the security groups that all the nodes belong to. In tcpdump I can see the new node trying to connect to the seed:

08:43:42.056115 IP 10.235.62.198.45163 > 10.164.84.171.7000: Flags [P.], seq 0:8, ack 1, win 46, options [nop,nop,TS val 81748069 ecr 538805526], length 8
08:43:42.056146 IP 10.164.84.171.7000 > 10.235.62.198.45163: Flags [R], seq 110766787, win 0, length 0
08:43:42.157893 IP 10.235.62.198.45165 > 10.164.84.171.7000: Flags [S], seq 452519826, win 5840, options [mss 1460,sackOK,TS val 81748094 ecr 0,nop,wscale 7], length 0
08:43:42.157903 IP 10.164.84.171.7000 > 10.235.62.198.45165: Flags [S.], seq 4035182025, ack 452519827, win 5792, options [mss 1460,sackOK,TS val 538833931 ecr 81748094,nop,wscale 7], length 0
08:43:42.158920 IP 10.235.62.198.45165 > 10.164.84.171.7000: Flags [.], ack 1, win 46, options [nop,nop,TS val 81748094 ecr 538833931], length 0
08:43:42.159053 IP 10.235.62.198.45165 > 10.164.84.171.7000: Flags [P.], seq 1:9, ack 1, win 46, options [nop,nop,TS val 81748094 ecr 538833931], length 8
08:43:42.360086 IP 10.235.62.198.45165 > 10.164.84.171.7000: Flags [P.], seq 1:9, ack 1, win 46, options [nop,nop,TS val 81748145 ecr 538833931], length 8
08:43:42.768080 IP 10.235.62.198.45165 > 10.164.84.171.7000: Flags [P.], seq 1:9, ack 1, win 46, options [nop,nop,TS val 81748247 ecr 538833931], length 8
08:43:43.584072 IP 10.235.62.198.45165 > 10.164.84.171.7000: Flags [P.], seq 1:9, ack 1, win 46, options [nop,nop,TS val 81748451 ecr 538833931], length 8
08:43:45.216087 IP 10.235.62.198.45165 > 10.164.84.171.7000: Flags [P.], seq 1:9, ack 1, win 46, options [nop,nop,TS val 81748859 ecr 538833931], length 8
08:43:45.783333 IP 10.164.84.171.7000 > 10.235.62.198.45165: Flags [S.], seq 4035182025, ack 452519827, win 5792, options [mss 1460,sackOK,TS val 538834838 ecr 81748859,nop,wscale 7], length 0
08:43:45.784337 IP 10.235.62.198.45165 > 10.164.84.171.7000: Flags [.], ack 1, win 46, options [nop,nop,TS val 81749001 ecr 538834838,nop,nop,sack 1 {0:1}], length 0

where 10.235.62.198 is the new node and 10.164.84.171 is the seed.
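The capture can also be checked for retransmissions mechanically. In it, the 8-byte segment seq 1:9 leaves port 45165 five times, with gaps of about 0.201, 0.408, 0.816 and 1.632 s (the doubling typical of TCP retransmission backoff), and the seed sends its SYN-ACK for the same connection a second time at 08:43:45.783. The Python sketch below groups packets by (source, destination, flags, seq) and reports the gaps between repeats. find_retransmissions is a hypothetical name, and the sketch assumes tcpdump's default one-line-per-packet format as shown above.

```python
import re
from collections import defaultdict
from datetime import datetime

# One tcpdump line: time, src, dst, flags and (optionally) a seq number or range.
LINE = re.compile(
    r"^(?P<ts>\d{2}:\d{2}:\d{2}\.\d{6}) IP (?P<src>\S+) > (?P<dst>\S+): "
    r"Flags \[(?P<flags>[^\]]+)\](?:, seq (?P<seq>[\d:]+))?"
)


def find_retransmissions(lines):
    """Map (src, dst, flags, seq) -> gaps in seconds (rounded to ms) between repeats."""
    seen = defaultdict(list)
    for line in lines:
        m = LINE.match(line)
        if not m or m.group("seq") is None:
            continue  # pure ACKs print no seq field, so they are skipped
        key = (m.group("src"), m.group("dst"), m.group("flags"), m.group("seq"))
        seen[key].append(datetime.strptime(m.group("ts"), "%H:%M:%S.%f"))
    return {
        key: [round((b - a).total_seconds(), 3) for a, b in zip(times, times[1:])]
        for key, times in seen.items()
        if len(times) > 1
    }


CAPTURE = """\
08:43:42.056115 IP 10.235.62.198.45163 > 10.164.84.171.7000: Flags [P.], seq 0:8, ack 1, win 46, options [nop,nop,TS val 81748069 ecr 538805526], length 8
08:43:42.056146 IP 10.164.84.171.7000 > 10.235.62.198.45163: Flags [R], seq 110766787, win 0, length 0
08:43:42.157893 IP 10.235.62.198.45165 > 10.164.84.171.7000: Flags [S], seq 452519826, win 5840, options [mss 1460,sackOK,TS val 81748094 ecr 0,nop,wscale 7], length 0
08:43:42.157903 IP 10.164.84.171.7000 > 10.235.62.198.45165: Flags [S.], seq 4035182025, ack 452519827, win 5792, options [mss 1460,sackOK,TS val 538833931 ecr 81748094,nop,wscale 7], length 0
08:43:42.158920 IP 10.235.62.198.45165 > 10.164.84.171.7000: Flags [.], ack 1, win 46, options [nop,nop,TS val 81748094 ecr 538833931], length 0
08:43:42.159053 IP 10.235.62.198.45165 > 10.164.84.171.7000: Flags [P.], seq 1:9, ack 1, win 46, options [nop,nop,TS val 81748094 ecr 538833931], length 8
08:43:42.360086 IP 10.235.62.198.45165 > 10.164.84.171.7000: Flags [P.], seq 1:9, ack 1, win 46, options [nop,nop,TS val 81748145 ecr 538833931], length 8
08:43:42.768080 IP 10.235.62.198.45165 > 10.164.84.171.7000: Flags [P.], seq 1:9, ack 1, win 46, options [nop,nop,TS val 81748247 ecr 538833931], length 8
08:43:43.584072 IP 10.235.62.198.45165 > 10.164.84.171.7000: Flags [P.], seq 1:9, ack 1, win 46, options [nop,nop,TS val 81748451 ecr 538833931], length 8
08:43:45.216087 IP 10.235.62.198.45165 > 10.164.84.171.7000: Flags [P.], seq 1:9, ack 1, win 46, options [nop,nop,TS val 81748859 ecr 538833931], length 8
08:43:45.783333 IP 10.164.84.171.7000 > 10.235.62.198.45165: Flags [S.], seq 4035182025, ack 452519827, win 5792, options [mss 1460,sackOK,TS val 538834838 ecr 81748859,nop,wscale 7], length 0
08:43:45.784337 IP 10.235.62.198.45165 > 10.164.84.171.7000: Flags [.], ack 1, win 46, options [nop,nop,TS val 81749001 ecr 538834838,nop,nop,sack 1 {0:1}], length 0
"""

if __name__ == "__main__":
    for key, gaps in find_retransmissions(CAPTURE.splitlines()).items():
        print(key, gaps)
```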

We use Cassandra 1.2.6 with vnodes.

Please help. We have spent almost 3 days trying to fix this, with no luck.


--

Denis Kot // DevOps Engineer // Monterosa
