Mailing-List: contact user-help@cassandra.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@cassandra.apache.org
MIME-Version: 1.0
References: <CALE39-eAOWdwCgyE2H5dB19uet41wKsDAaVykvQiDHzP_WEmag@mail.gmail.com>
In-Reply-To: <CALE39-eAOWdwCgyE2H5dB19uet41wKsDAaVykvQiDHzP_WEmag@mail.gmail.com>
From: Ben Slater <ben.slater@instaclustr.com>
Date: Mon, 17 Oct 2016 05:48:47 +0000
Message-ID: <CAKgYGaqxmR3+3ER20hS47Mj2a2xw5iLjqLJNE1gGWf_soLE+SA@mail.gmail.com>
Subject: Re: failure node rejoin
To: user@cassandra.apache.org
Content-Type: multipart/alternative; boundary=94eb2c08bdfad3d8a3053f092184
archived-at: Mon, 17 Oct 2016 05:49:05 -0000

--94eb2c08bdfad3d8a3053f092184
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

To Cassandra, the node where you deleted the files looks like a brand-new
machine. Cassandra doesn’t automatically rebuild a node in that state, to
prevent accidental replacement. You need to tell it to build the “new” machine
as a replacement for the “old” machine with that IP by setting
-Dcassandra.replace_address_first_boot=<dead_node_ip>.
See http://cassandra.apache.org/doc/latest/operating/topo_changes.html.
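
As a rough sketch of what that looks like in practice, the option can be appended to JVM_OPTS before starting Cassandra on the wiped node. The IP address below is a made-up placeholder, and where JVM_OPTS gets set (commonly cassandra-env.sh) depends on how Cassandra was installed, so treat both as assumptions:

```shell
# Placeholder (assumption): the IP of the node whose data directory was wiped.
# Replace it with the real address of the "dead" node (node2 in this thread).
DEAD_NODE_IP="192.0.2.12"

# Append the replace flag to whatever JVM options are already set.
JVM_OPTS="${JVM_OPTS:-} -Dcassandra.replace_address_first_boot=${DEAD_NODE_IP}"
export JVM_OPTS

# Show the resulting options so the flag can be checked before starting the node.
echo "$JVM_OPTS"
```

Once the node has finished joining, nodetool status on another node should show it as UN with its data streamed back.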

Cheers
Ben

On Mon, 17 Oct 2016 at 16:41 Yuji Ito <yuji@imagine-orb.com> wrote:

> Hi all,
>
> A failed node can rejoin a cluster even though all data in
> /var/lib/cassandra on that node was deleted.
> Is this normal?
>
> I can reproduce it as below.
>
> cluster:
> - C* 2.2.7
> - a cluster has node1, 2, 3
> - node1 is a seed
> - replication_factor: 3
>
> how to:
> 1) stop C* process and delete all data in /var/lib/cassandra on node2
> ($sudo rm -rf /var/lib/cassandra/*)
> 2) stop C* process on node1 and node3
> 3) restart C* on node1
> 4) restart C* on node2
>
> nodetool status after 4):
> Datacenter: datacenter1
> =======================
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address     Load       Tokens  Owns (effective)  Host ID                               Rack
> DN  [node3 IP]  ?          256     100.0%            325553c6-3e05-41f6-a1f7-47436743816f  rack1
> UN  [node2 IP]  7.76 MB    256     100.0%            05bdb1d4-c39b-48f1-8248-911d61935925  rack1
> UN  [node1 IP]  416.13 MB  256     100.0%            a8ec0a31-cb92-44b0-b156-5bcd4f6f2c7b  rack1
>
> If I restart C* on node 2 when C* on node1 and node3 are running (without
> 2), 3)), a runtime exception happens.
> RuntimeException: "A node with address [node2 IP] already exists,
> cancelling join..."
>
> I'm not sure whether this causes data loss. All data can be read properly
> just after this rejoin.
> But some rows are lost when I kill and restart C* for destructive tests
> after this rejoin.
>
> Thanks.
>
> --
--------
Ben Slater
Chief Product Officer
Instaclustr: Cassandra + Spark - Managed | Consulting | Support
+61 437 929 798

--94eb2c08bdfad3d8a3053f092184--