Mailing-List: contact user-help@cassandra.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@cassandra.apache.org
Received-SPF: unknown (nike.apache.org: error in processing during lookup of
 kais@neteck-fr.com)
MIME-Version: 1.0
In-Reply-To: <FF2A5D64-D380-46E4-9FCB-854CB29C7BFA@thelastpickle.com>
References: 
 <CAF0CjOoZs6i55CquOhL4QsYe7NaPtj6hCYMkCMBACp2=aJ2rqw@mail.gmail.com>
	<831ABE98-8FC9-4F7A-9E4F-6B24222C90B0@thelastpickle.com>
	<CAF0CjOq6VUD-pcv2_HTuk6oDbdoOty0OM__84aT4BOQhbygZrw@mail.gmail.com>
	<FF2A5D64-D380-46E4-9FCB-854CB29C7BFA@thelastpickle.com>
Date: Mon, 1 Apr 2013 17:56:55 +0200
Message-ID: 
 <CAF0CjOrFcwiexcBR6z7OcHhppN7XT+sLk6jFU2gFxLGHsRW7yg@mail.gmail.com>
Subject: Re: Lost data after expanding cluster c* 1.2.3-1
From: Kais Ahmed <kais@neteck-fr.com>
To: user@cassandra.apache.org
Content-Type: multipart/alternative; boundary=089e011849988f2d3c04d94eab7d

--089e011849988f2d3c04d94eab7d
Content-Type: text/plain; charset=ISO-8859-1

> > At this moment the errors started: we saw that members and other data
> > were gone. At that moment nodetool status returned (with the 3 new nodes
> > in red)
> What errors?
The errors were on my side, in the application; they were not Cassandra errors.
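
As an illustration only: the nodetool status output quoted further down already hints at the problem, since three nodes report UN while holding only a few MB. A rough sketch of how that could be flagged automatically follows. The function name, the 1% threshold and the assumed column layout (state, address, load, unit) are my own assumptions, not part of nodetool.

```python
import re

# Size units as printed in the Load column (assumed set of units).
UNITS = {"bytes": 1, "KB": 1024, "MB": 1024**2, "GB": 1024**3, "TB": 1024**4}

# One data row: state (e.g. UN), address, load value, load unit.
ROW = re.compile(r"^([UD][NLJM])\s+(\S+)\s+([\d.]+)\s+(bytes|KB|MB|GB|TB)\s+")


def underfilled_nodes(status_text, ratio=0.01):
    """Return addresses whose load is below `ratio` times the largest load."""
    rows = []
    for line in status_text.splitlines():
        m = ROW.match(line.strip())
        if m:
            _state, address, size, unit = m.groups()
            rows.append((address, float(size) * UNITS[unit]))
    if not rows:
        return []
    largest = max(load for _, load in rows)
    return [addr for addr, load in rows if load < ratio * largest]


# Rows copied from the status output in this thread.
SAMPLE = """\
UN  10.34.142.xxx  10.79 GB  256  15.4%  4e2e26b8-aa38-428c-a8f5-e86c13eb4442  1b
UN  10.32.49.xxx   1.48 MB   256  13.7%  e86f67b6-d7cb-4b47-b090-3824a5887145  1b
UN  10.33.206.xxx  2.19 MB   256  11.9%  92af17c3-954a-4511-bc90-29a9657623e4  1b
UN  10.32.27.xxx   1.95 MB   256  14.9%  862e6b39-b380-40b4-9d61-d83cb8dacf9e  1b
UN  10.34.139.xxx  11.67 GB  256  15.5%  0324e394-b65f-46c8-acb4-1e1f87600a2c  1b
"""
print(underfilled_nodes(SAMPLE))
```

A node that owns about as much of the ring as its peers but holds a tiny fraction of their data has probably joined without streaming.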

> > For each of them I set seeds = A's IP, and started them at two-minute
> > intervals.
> When I'm making changes I tend to change a single node first, confirm
> everything is OK and then do a bulk change.
Thank you for that advice.

> I'm not sure what or why it went wrong, but that should get you to a
> stable place. If you have any problems keep an eye on the logs for errors
> or warnings.
The problem came from my not setting auto_bootstrap to true on the new
nodes; that step is not in this documentation (
http://www.datastax.com/docs/1.2/install/expand_ami)
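
A minimal sketch of a pre-start check for this, assuming a stock cassandra.yaml layout with a top-level auto_bootstrap key and a seeds entry under seed_provider. The function name, the sample file and the IP addresses are hypothetical. It relies on two facts: auto_bootstrap defaults to true when the key is absent, and a node listed in its own seed list does not bootstrap.

```python
import re


def bootstrap_problems(yaml_text, node_ip):
    """Return reasons why this node would join the ring without streaming data."""
    problems = []
    # An absent key means the default (true), so only an explicit false is a problem.
    match = re.search(r"^auto_bootstrap:\s*(\S+)", yaml_text, re.MULTILINE)
    if match and match.group(1).strip("'\"").lower() == "false":
        problems.append("auto_bootstrap is false")
    # The seed list is a single comma-separated string, usually quoted.
    seeds = re.search(r"seeds:\s*[\"']?([^\"'\n]*)", yaml_text)
    if seeds:
        seed_ips = [s.strip() for s in seeds.group(1).split(",")]
        if node_ip in seed_ips:
            problems.append("node is in its own seed list")
    return problems


# Hypothetical config for a new node that uses an existing node as its seed.
SAMPLE_YAML = """\
cluster_name: 'myDSCcluster'
auto_bootstrap: false
seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          - seeds: "10.0.0.1"
"""
print(bootstrap_problems(SAMPLE_YAML, "10.0.0.5"))
```

The regex-based parsing keeps the sketch free of third-party dependencies; a real check would be better served by a proper YAML parser.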

> if you are using secondary indexes use nodetool rebuild_index to rebuild
> those.
Can I do that at any time, or only when the cluster is not under load?
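
For the repair step suggested earlier in the thread (nodetool repair -pr on each node, letting it finish before starting the next one), a minimal sketch could look like the following. The function name and the host list are hypothetical, and it assumes nodetool is on the PATH and can reach each host with -h.

```python
import subprocess


def repair_sequentially(hosts, run=subprocess.run):
    """Run `nodetool repair -pr` on each host in turn; stop at the first failure."""
    done = []
    for host in hosts:
        # The call blocks until the repair on this host has finished, and
        # check=True raises CalledProcessError if nodetool exits non-zero,
        # so the next host is never started after a failed repair.
        run(["nodetool", "-h", host, "repair", "-pr"], check=True)
        done.append(host)
    return done


# Example call (hypothetical addresses), not executed here:
# repair_sequentially(["10.0.0.1", "10.0.0.2", "10.0.0.3"])
```

The `run` parameter exists only so the loop can be exercised without a live cluster.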

Thanks Aaron,

2013/4/1 aaron morton <aaron@thelastpickle.com>

> Please do not rely on colour in your emails, the best way to get your
> emails accepted by the Apache mail servers is to use plain text.
>
> > At this moment the errors started: we saw that members and other data
> > were gone. At that moment nodetool status returned (with the 3 new nodes
> > in red)
> What errors?
>
> > For each of them I set seeds = A's IP, and started them at two-minute
> > intervals.
> When I'm making changes I tend to change a single node first, confirm
> everything is OK and then do a bulk change.
>
> > Now the cluster seems to work normally, but I can't use the secondary
> > indexes for the moment; the query answers are random
> run nodetool repair -pr on each node, let it finish before starting the
> next one.
> if you are using secondary indexes use nodetool rebuild_index to rebuild
> those.
> Add one new node to the cluster and confirm everything is ok, then
> add the remaining ones.
>
> I'm not sure what or why it went wrong, but that should get you to a
> stable place. If you have any problems keep an eye on the logs for errors
> or warnings.
>
> Cheers
>
> -----------------
> Aaron Morton
> Freelance Cassandra Consultant
> New Zealand
>
> @aaronmorton
> http://www.thelastpickle.com
>
> On 31/03/2013, at 10:01 PM, Kais Ahmed <kais@neteck-fr.com> wrote:
>
> > Hi aaron,
> >
> > Thanks for the reply, I will try to explain exactly what happened.
> >
> > I had a cluster of 4 C* nodes called [A,B,C,D] (version 1.2.3-1), started
> > with the EC2 AMI
> > (https://aws.amazon.com/amis/datastax-auto-clustering-ami-2-2) with
> > this config: --clustername myDSCcluster --totalnodes 4 --version community
> >
> > Two days after this cluster went into production, I saw that it was
> > overloaded, and I wanted to extend it by adding 3 more nodes.
> >
> > I created a new cluster with 3 C* nodes [D,E,F] (
> > https://aws.amazon.com/amis/datastax-auto-clustering-ami-2-2)
> >
> > I then followed the documentation (
> > http://www.datastax.com/docs/1.2/install/expand_ami) to add them to
> > the ring.
> > For each of them I set seeds = A's IP, and started them at two-minute
> > intervals.
> >
> > At this moment the errors started: we saw that members and other data
> > were gone. At that moment nodetool status returned (with the 3 new nodes
> > in red):
> >
> > Datacenter: eu-west
> > ===================
> > Status=Up/Down
> > |/ State=Normal/Leaving/Joining/Moving
> > --  Address        Load      Tokens  Owns   Host ID                               Rack
> > UN  10.34.142.xxx  10.79 GB  256     15.4%  4e2e26b8-aa38-428c-a8f5-e86c13eb4442  1b
> > UN  10.32.49.xxx   1.48 MB   256     13.7%  e86f67b6-d7cb-4b47-b090-3824a5887145  1b
> > UN  10.33.206.xxx  2.19 MB   256     11.9%  92af17c3-954a-4511-bc90-29a9657623e4  1b
> > UN  10.32.27.xxx   1.95 MB   256     14.9%  862e6b39-b380-40b4-9d61-d83cb8dacf9e  1b
> > UN  10.34.139.xxx  11.67 GB  256     15.5%  0324e394-b65f-46c8-acb4-1e1f87600a2c  1b
> > UN  10.34.147.xxx  11.18 GB  256     13.9%  cfc09822-5446-4565-a5f0-d25c917e2ce8  1b
> > UN  10.33.193.xxx  10.83 GB  256     14.7%  59f440db-cd2d-4041-aab4-fc8e9518c954  1b
> >
> > I saw that the 3 nodes had joined the ring but had no data. I put
> > the website in maintenance and launched a nodetool repair on
> > the 3 new nodes; for 5 hours I saw in OpsCenter the data streaming to
> > the new nodes (very nice :))
> >
> > During this time, I wrote a script to check whether all members are
> > present (relative to a copy of the members in MySQL).
> >
> > Afterwards the data streaming seemed to be finished, but I'm not sure,
> > because nodetool compactionstats showed pending tasks while nodetool
> > netstats seemed to be OK.
> >
> > I ran my script to check the data, but members were still missing.
> >
> > I decided to roll back by running nodetool decommission on nodes D, E, F.
> >
> > I re-ran my script and all seemed to be OK, but the secondary index has
> > strange behavior:
> > sometimes the row is returned, sometimes there is no result.
> >
> > The user kais can be retrieved using his key with cassandra-cli, but if I
> > use cqlsh:
> >
> > cqlsh:database> SELECT login FROM userdata where login='kais' ;
> >
> >  login
> > ----------------
> >  kais
> >
> > cqlsh:database> SELECT login FROM userdata where login='kais' ; //empty
> > cqlsh:database> SELECT login FROM userdata where login='kais' ;
> >
> >  login
> > ----------------
> >  kais
> >
> > cqlsh:database> SELECT login FROM userdata where login='kais' ;
> >
> >  login
> > ----------------
> >  kais
> >
> > cqlsh:database> SELECT login FROM userdata where login='kais' ; //empty
> > cqlsh:database> SELECT login FROM userdata where login='kais' ;
> >
> >  login
> > ----------------
> >  kais
> >
> > cqlsh:mydatabase>Tracing on;
> > When tracing is activated I get this error, but not every time:
> > cqlsh:mydatabase> SELECT * FROM userdata where login='kais' ;
> > unsupported operand type(s) for /: 'NoneType' and 'float'
> >
> >
> > NOTE: When the cluster contained 7 nodes, I saw that my table userdata
> > (RF 3) on node D was replicated on E and F, which seems strange because
> > those 3 nodes were not correctly filled.
> >
> > Now the cluster seems to work normally, but I can't use the secondary
> > indexes for the moment; the query answers are random
> >
> > Thanks a lot for any help,
> > Kais
> >
> >
> >
> >
> >
> > 2013/3/31 aaron morton <aaron@thelastpickle.com>
> > First thought is the new nodes were marked as seeds.
> > Next thought is check the logs for errors.
> >
> > You can always run a nodetool repair if you are concerned data is not
> > where you think it should be.
> >
> > Cheers
> >
> >
> > -----------------
> > Aaron Morton
> > Freelance Cassandra Consultant
> > New Zealand
> >
> > @aaronmorton
> > http://www.thelastpickle.com
> >
> > On 29/03/2013, at 8:01 PM, Kais Ahmed <kais@neteck-fr.com> wrote:
> >
> >> Hi all,
> >>
> >> I followed this tutorial to expand a 4-node C* cluster (production) by
> >> adding 3 new nodes.
> >>
> >> Datacenter: eu-west
> >> ===================
> >> Status=Up/Down
> >> |/ State=Normal/Leaving/Joining/Moving
> >> --  Address        Load      Tokens  Owns   Host ID                               Rack
> >> UN  10.34.142.xxx  10.79 GB  256     15.4%  4e2e26b8-aa38-428c-a8f5-e86c13eb4442  1b
> >> UN  10.32.49.xxx   1.48 MB   256     13.7%  e86f67b6-d7cb-4b47-b090-3824a5887145  1b
> >> UN  10.33.206.xxx  2.19 MB   256     11.9%  92af17c3-954a-4511-bc90-29a9657623e4  1b
> >> UN  10.32.27.xxx   1.95 MB   256     14.9%  862e6b39-b380-40b4-9d61-d83cb8dacf9e  1b
> >> UN  10.34.139.xxx  11.67 GB  256     15.5%  0324e394-b65f-46c8-acb4-1e1f87600a2c  1b
> >> UN  10.34.147.xxx  11.18 GB  256     13.9%  cfc09822-5446-4565-a5f0-d25c917e2ce8  1b
> >> UN  10.33.193.xxx  10.83 GB  256     14.7%  59f440db-cd2d-4041-aab4-fc8e9518c954  1b
> >>
> >> The data are not streamed.
> >>
> >> Can anyone help me? Our web site is down.
> >>
> >> Thanks a lot,
> >>
> >>
> >
> >
>
>

--089e011849988f2d3c04d94eab7d--