Mailing-List: contact user-help@cassandra.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@cassandra.apache.org
MIME-Version: 1.0
References: 
 <CAAZU44kWEuuQuAc+DPY+FZswrCvr43JS=TXZ=tc4v3oF6D_8UQ@mail.gmail.com>
 <CAAZU44kLhuTMeLz6jWs_1TK7GhVWgbxVHerGj1L=8RnaCoVV-A@mail.gmail.com>
 <B793F1CD-8A2E-4DEB-A0F3-D9C8F600A2C0@crowdstrike.com>
 <CAAZU44=MageE_h7KrHpWgaN=Qg018Bo=mMt1gXE3Pc==3EYf3Q@mail.gmail.com>
 <D7729978-093A-444C-AB24-9D8EC2F9F534@crowdstrike.com>
 <C38394D7-F5AD-4C1D-8978-F7D4D505F912@gmail.com>
In-Reply-To: <C38394D7-F5AD-4C1D-8978-F7D4D505F912@gmail.com>
From: Eric Stevens <mightye@gmail.com>
Date: Mon, 19 Oct 2015 15:48:49 +0000
Message-ID: 
 <CAORswtxd52qPa3JbppK+UUTG1_CAgFKFxpfAG5UosOs44NBynw@mail.gmail.com>
Subject: Re: Would we have data corruption if we bootstrapped 10 nodes at
 once?
To: user@cassandra.apache.org
Content-Type: multipart/alternative; boundary=001a113ec8946e2b8705227715cc

--001a113ec8946e2b8705227715cc
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

It seems to me that as long as cleanup hasn't happened, if you
*decommission* the newly joined nodes, they'll stream whatever writes they
took back to the original replicas.  Presumably that should be pretty quick
as they won't have nearly as much data as the original nodes (as they only
hold data written while they were online).  Then as long as cleanup hasn't
happened, your cluster should have returned to a consistent view of the
data.  You can now bootstrap the new nodes again.
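
To make the ordering concrete, here is a rough dry-run sketch in Python. It
only builds the list of commands and runs nothing. The hostnames and the
snapshot tag are made up for illustration; the snapshot step follows Jeff's
advice quoted below. Treat it as an outline under those assumptions, not a
tested procedure.

```python
from typing import List


def recovery_plan(new_nodes: List[str],
                  snapshot_tag: str = "pre-decommission") -> List[str]:
    """Return the ordered nodetool commands for the recovery, as a dry run.

    Nothing is executed. Hostnames and the snapshot tag are hypothetical.
    """
    steps: List[str] = []
    # Snapshot every accidentally joined node before touching any data.
    for host in new_nodes:
        steps.append(f"nodetool -h {host} snapshot -t {snapshot_tag}")
    # Decommission the nodes one after another; each one streams the
    # writes it took back to the remaining replicas.
    for host in new_nodes:
        steps.append(f"nodetool -h {host} decommission")
    return steps


if __name__ == "__main__":
    for command in recovery_plan(["new-node-1", "new-node-2"]):
        print(command)
```

Note that `nodetool cleanup` appears nowhere in the plan, on purpose. Once
the decommissions finish, the nodes can be bootstrapped again one at a time.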

If you have done a cleanup, then the data is probably irreversibly
corrupted, and you will have to figure out how to restore the missing data
incrementally from backups, if they are available.
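
For anyone wondering why cleanup is the point of no return, here is a toy
model of Jeff's point (quoted below) about the new nodes representing all 3
replicas. It assumes a single-token ring with simple clockwise replica
placement; the tokens and node names are invented, and real clusters
(vnodes, rack-aware placement) are more involved.

```python
from bisect import bisect_left
from typing import Dict, Iterable, List, Set


def replicas_for(token: int, ring: Dict[int, str], rf: int) -> List[str]:
    """Walk clockwise from `token` and return the first `rf` nodes.

    Assumes rf is not larger than the number of nodes in the ring.
    """
    tokens = sorted(ring)
    # First node whose token is >= the key token, wrapping around the ring.
    start = bisect_left(tokens, token) % len(tokens)
    return [ring[tokens[(start + i) % len(tokens)]] for i in range(rf)]


def unreadable_tokens(key_tokens: Iterable[int], ring: Dict[int, str],
                      empty_nodes: Set[str], rf: int = 3) -> List[int]:
    """Key tokens whose entire replica set consists of nodes with no data."""
    return [t for t in key_tokens
            if all(node in empty_nodes for node in replicas_for(t, ring, rf))]


if __name__ == "__main__":
    # Three original nodes plus three nodes that joined without streaming.
    ring = {0: "old-a", 40: "old-b", 80: "old-c",
            10: "new-1", 20: "new-2", 30: "new-3"}
    empty = {"new-1", "new-2", "new-3"}
    print(unreadable_tokens([5, 15, 35, 90], ring, empty))  # prints [5]
```

In this model the row at token 5 still sits on the old nodes' disks, but no
read is routed to them any more. Cleanup deletes exactly that kind of data
(ranges a node no longer owns), which is why it must not be run first.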

On Sun, Oct 18, 2015 at 10:37 PM Raj Chudasama <raj.chudasama@gmail.com>
wrote:

> In this case, does it make sense to remove the newly added nodes, correct
> the configuration, and have them rejoin one at a time?
>
> Thx
>
> On Oct 18, 2015, at 11:19 PM, Jeff Jirsa <jeff.jirsa@crowdstrike.com>
> wrote:
>
> Take a snapshot now, before you get rid of any data (whatever you do,
> don’t run cleanup).
>
> If you identify missing data, you can go back to those snapshots, find the
> nodes that had the data previously (sstable2json, for example), and either
> re-stream that data into the cluster with sstableloader or copy it to a new
> host and `nodetool refresh` it into the new system.
>
>
>
> From: <burtonator2011@gmail.com> on behalf of Kevin Burton
> Reply-To: "user@cassandra.apache.org"
> Date: Sunday, October 18, 2015 at 8:10 PM
> To: "user@cassandra.apache.org"
> Subject: Re: Would we have data corruption if we bootstrapped 10 nodes at
> once?
>
> ouch.. OK.. I think I really shot myself in the foot here then.  This
> might be bad.
>
> I'm not sure if I would have missing data.  I mean basically the data is
> on the other nodes.. but the cluster has been running with 10 nodes
> accidentally bootstrapped with auto_bootstrap=false.
>
> So they have new data and seem to be missing values.
>
> This is somewhat misleading... Initially, if you start it up and run
> `nodetool status`, it only returns one node.
>
> So I assumed auto_bootstrap=false meant that it just doesn't join the
> cluster.
>
> I'm running a nodetool repair now to hopefully fix this.
>
>
>
> On Sun, Oct 18, 2015 at 7:25 PM, Jeff Jirsa <jeff.jirsa@crowdstrike.com>
> wrote:
>
>> auto_bootstrap=false tells it to join the cluster without running
>> bootstrap: the node assumes it has all of the necessary data, and won’t
>> stream any missing data.
>>
>> This generally violates consistency guarantees, but if done on a single
>> node, is typically correctable with `nodetool repair`.
>>
>> If you do it on many nodes at once, it’s possible that the new nodes
>> could represent all 3 replicas of the data, but don’t physically have any
>> of that data, leading to missing records.
>>
>>
>>
>> From: <burtonator2011@gmail.com> on behalf of Kevin Burton
>> Reply-To: "user@cassandra.apache.org"
>> Date: Sunday, October 18, 2015 at 3:44 PM
>> To: "user@cassandra.apache.org"
>> Subject: Re: Would we have data corruption if we bootstrapped 10 nodes
>> at once?
>>
>> Ah shit.. I think we're seeing corruption.. missing records :-/
>>
>> On Sat, Oct 17, 2015 at 10:45 AM, Kevin Burton <burton@spinn3r.com>
>> wrote:
>>
>>> We just migrated from a 30 node cluster to a 45 node cluster (so 15 new
>>> nodes).
>>>
>>> By default we have auto_bootstrap = false
>>>
>>> so we just push our config to the cluster, the cassandra daemons
>>> restart, and the new nodes are not cluster members; each one is the only
>>> node in its own cluster.
>>>
>>> Anyway.  While I was about 1/2 way done adding the 15 nodes,  I had
>>> about 7 members of the cluster and 8 not yet joined.
>>>
>>> We are only doing 1 at a time because apparently bootstrapping more than
>>> 1 is unsafe.
>>>
>>> I did a rolling restart whereby I went through and restarted all the
>>> cassandra boxes.
>>>
>>> Somehow the new nodes auto bootstrapped themselves EVEN though
>>> auto_bootstrap=false.
>>>
>>> We don't have any errors.  Everything seems functional.  I'm just
>>> worried about data loss.
>>>
>>> Thoughts?
>>>
>>> Kevin
>>>
>>> --
>>>
>>> We’re hiring if you know of any awesome Java Devops or Linux Operations
>>> Engineers!
>>>
>>> Founder/CEO Spinn3r.com
>>> Location: *San Francisco, CA*
>>> blog: http://burtonator.wordpress.com
>>> … or check out my Google+ profile
>>> <https://plus.google.com/102718274791889610666/posts>
>>>
>>>
>>
>>
>
>

--001a113ec8946e2b8705227715cc--