From: Kais Ahmed
To: user@cassandra.apache.org
Date: Mon, 1 Apr 2013 17:56:55 +0200
Subject: Re: Lost data after expanding cluster c* 1.2.3-1

> At this moment the errors started, we see that members and other data are gone, at this moment the nodetool status return (in red color the 3 new nodes)
> What errors?
The errors were on my side, in the application; they were not Cassandra errors.

> I put for each of them seeds = A ip, and start each with two minutes intervals.
> When I'm making changes I tend to change a single node first, confirm everything is OK and then do a bulk change.
Thank you for that advice.

>I'm not sure what or why it went wrong, but that should get you to a stable place. If you have any problems keep an eye on the logs for errors or warnings.
The problem came from the fact that I did not set auto_bootstrap to true for the new nodes; this step is not in the documentation (http://www.datastax.com/docs/1.2/install/expand_ami).
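
A minimal sketch of a pre-start check for this situation, assuming cassandra.yaml is read as plain text and scanned with regular expressions (a real YAML parser would be more robust). The function name bootstrap_problems and the sample addresses are hypothetical. It relies on two facts: auto_bootstrap defaults to true when the key is absent, and a node that appears in its own seed list does not bootstrap.

```python
import re


def bootstrap_problems(yaml_text, own_ip):
    """Return the reasons why a new node with this config would skip bootstrap."""
    problems = []

    # auto_bootstrap defaults to true when absent, so only an explicit
    # "false" is a problem. Commented-out lines do not match.
    match = re.search(r"^\s*auto_bootstrap\s*:\s*(\S+)", yaml_text, re.MULTILINE)
    if match and match.group(1).strip("'\"").lower() == "false":
        problems.append("auto_bootstrap is false")

    # A node that is listed in its own seeds does not stream data on join.
    seeds = re.search(r"^\s*-?\s*seeds\s*:\s*[\"']?([^\"'\n]*)", yaml_text, re.MULTILINE)
    if seeds:
        seed_ips = [s.strip() for s in seeds.group(1).split(",") if s.strip()]
        if own_ip in seed_ips:
            problems.append("node is listed in its own seeds")

    return problems


if __name__ == "__main__":
    sample = 'auto_bootstrap: false\n          - seeds: "10.0.0.1"\n'
    print(bootstrap_problems(sample, "10.0.0.2"))
```

An empty list means neither of these two conditions was found; it does not prove the rest of the configuration is correct.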

>if you are using secondary indexes use nodetool rebuild_index to rebuild those.
Can I do that at any time, or only when the cluster is not under load?

Thanks aaron,

2013/4/1 aaron morton <aaron@thelastpickle.com>
Please do not rely on colour in your emails, the best way to get your emails accepted by the Apache mail servers is to use plain text.

> At this moment the errors started, we see that members and other data are gone, at this moment the nodetool status return (in red color the 3 new nodes)
What errors?

> I put for each of them seeds = A ip, and start each with two minutes intervals.
When I'm making changes I tend to change a single node first, confirm everything is OK and then do a bulk change.

> Now the cluster seem to work normally, but i can use the secondary for the moment, the query answer are random
Run nodetool repair -pr on each node, and let it finish before starting the next one.
If you are using secondary indexes, use nodetool rebuild_index to rebuild those.
Add one new node to the cluster and confirm everything is OK, then add the remaining ones.
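
The repair step above could be scripted roughly as follows. This is a sketch, not a tested operations tool: the host names are placeholders, the function name is hypothetical, and it assumes nodetool is on the PATH and can reach each node with -h. subprocess.run blocks until the command exits, which gives the "let it finish before starting the next one" behaviour.

```python
import subprocess


def repair_each_node(hosts, run=subprocess.run):
    """Run `nodetool repair -pr` on one host at a time, in the given order."""
    done = []
    for host in hosts:
        cmd = ["nodetool", "-h", host, "repair", "-pr"]
        # check=True raises CalledProcessError on a non-zero exit status,
        # so a failed repair stops the loop instead of moving on.
        run(cmd, check=True)
        done.append(host)
    return done


if __name__ == "__main__":
    # Placeholder host names; replace with the real node addresses.
    print(repair_each_node(["node-a", "node-b"], run=lambda cmd, check: print(" ".join(cmd))))
```

The run parameter exists only so the sequence can be dry-run or tested without a cluster; leaving it at its default executes the real commands.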

>I'm not sure what or why it went wrong, but that should get you to a stable place. If you have any problems keep an eye on the logs for errors or warnings.

Cheers

-----------------
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 31/03/2013, at 10:01 PM, Kais Ahmed <kais@neteck-fr.com> wrote:

> Hi aaron,
>
> Thanks for the reply, I will try to explain exactly what happened.
>
> I had a cluster of 4 C* nodes called [A,B,C,D] (version 1.2.3-1), started from the EC2 AMI (https://aws.amazon.com/amis/datastax-auto-clustering-ami-2-2) with
> this config: --clustername myDSCcluster --totalnodes 4 --version community
>
> Two days after this cluster went into production, I saw that it was overloaded, so I wanted to extend it by adding 3 more nodes.
>
> I created a new cluster with 3 C* nodes [D,E,F] (https://aws.amazon.com/amis/datastax-auto-clustering-ami-2-2)
>
> and followed the documentation (http://www.datastax.com/docs/1.2/install/expand_ami) to add them to the ring.
> For each of them I set seeds = A's IP, and started them at two-minute intervals.
>
> At this moment the errors started: we saw that members and other data were gone. At that point nodetool status returned the following (the 3 new nodes were shown in red):
>
> Datacenter: eu-west
> ===================
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address        Load      Tokens  Owns   Host ID                               Rack
> UN  10.34.142.xxx  10.79 GB  256     15.4%  4e2e26b8-aa38-428c-a8f5-e86c13eb4442  1b
> UN  10.32.49.xxx   1.48 MB   256     13.7%  e86f67b6-d7cb-4b47-b090-3824a5887145  1b
> UN  10.33.206.xxx  2.19 MB   256     11.9%  92af17c3-954a-4511-bc90-29a9657623e4  1b
> UN  10.32.27.xxx   1.95 MB   256     14.9%  862e6b39-b380-40b4-9d61-d83cb8dacf9e  1b
> UN  10.34.139.xxx  11.67 GB  256     15.5%  0324e394-b65f-46c8-acb4-1e1f87600a2c  1b
> UN  10.34.147.xxx  11.18 GB  256     13.9%  cfc09822-5446-4565-a5f0-d25c917e2ce8  1b
> UN  10.33.193.xxx  10.83 GB  256     14.7%  59f440db-cd2d-4041-aab4-fc8e9518c954  1b
>
> I saw that the 3 nodes had joined the ring but had no data, so I put the website in maintenance and launched a nodetool repair on
> the 3 new nodes. For 5 hours I could see in OpsCenter the data streaming to the new nodes (very nice :))
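
One way to spot this condition from the nodetool status output is to compare each node's Load with the largest Load in the ring. A rough sketch, assuming the column layout shown above and Load units of KB, MB, GB or TB; the function name and the 1% threshold are arbitrary choices for illustration.

```python
import re

UNITS = {"KB": 1024, "MB": 1024 ** 2, "GB": 1024 ** 3, "TB": 1024 ** 4}

# State (e.g. UN), address, then the Load value and its unit.
ROW = re.compile(r"^([UD][NLJM])\s+(\S+)\s+([\d.]+)\s+(KB|MB|GB|TB)\s")


def nearly_empty_nodes(status_text, ratio=0.01):
    """Return addresses whose Load is below `ratio` times the largest Load."""
    loads = {}
    for line in status_text.splitlines():
        # Drop mail quoting markers ("> ", ">> ") before matching.
        match = ROW.match(line.lstrip("> ").strip())
        if match:
            loads[match.group(2)] = float(match.group(3)) * UNITS[match.group(4)]
    if not loads:
        return []
    biggest = max(loads.values())
    return sorted(addr for addr, size in loads.items() if size < ratio * biggest)


if __name__ == "__main__":
    sample = (
        "UN  10.34.142.xxx  10.79 GB  256  15.4%  4e2e26b8-aa38-428c-a8f5-e86c13eb4442  1b\n"
        "UN  10.32.49.xxx   1.48 MB   256  13.7%  e86f67b6-d7cb-4b47-b090-3824a5887145  1b\n"
    )
    print(nearly_empty_nodes(sample))
```

A node that owns around 14% of the ring but holds only a few MB while its peers hold about 11 GB has almost certainly joined without streaming.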
>
> During this time, I wrote a script to check whether all members are present (relative to a copy of the members in MySQL).
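
The script itself was not posted. A minimal sketch of such a check, assuming the reference keys have already been exported from MySQL and that exists_in_cassandra is a caller-supplied function (hypothetical here) that performs the actual lookup by key:

```python
def missing_members(reference_keys, exists_in_cassandra):
    """Return the reference keys that the Cassandra lookup cannot find."""
    # set() removes duplicate keys so each member is looked up once.
    return sorted(key for key in set(reference_keys) if not exists_in_cassandra(key))


if __name__ == "__main__":
    # Stand-in for a real lookup: pretend only these keys are readable.
    readable = {"kais", "bob"}
    print(missing_members(["kais", "anna", "bob"], readable.__contains__))
```

Looking members up by row key rather than through a secondary index keeps the check independent of the index problems described later in this message.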
>
> After that the data streaming seemed to be finished, but I'm not sure, because nodetool compactionstats showed pending tasks while nodetool netstats seemed to be OK.
>
> I ran my script to check the data, but members were still missing.
>
> I decided to roll back by running nodetool decommission on nodes D, E and F.
>
> I re-ran my script and all seemed to be OK, but the secondary indexes behave strangely:
> sometimes the row is returned, sometimes there is no result.
>
> The user kais can be retrieved by key with cassandra-cli, but if I use cqlsh:
>
> cqlsh:database> SELECT login FROM userdata where login='kais' ;
>
>  login
> ----------------
>  kais
>
> cqlsh:database> SELECT login FROM userdata where login='kais' ; //empty
> cqlsh:database> SELECT login FROM userdata where login='kais' ;
>
>  login
> ----------------
>  kais
>
> cqlsh:database> SELECT login FROM userdata where login='kais' ;
>
>  login
> ----------------
>  kais
>
> cqlsh:database> SELECT login FROM userdata where login='kais' ; //empty
> cqlsh:database> SELECT login FROM userdata where login='kais' ;
>
>  login
> ----------------
>  kais
>
> cqlsh:mydatabase> Tracing on;
> When tracing is activated I get this error, but not every time:
> cqlsh:mydatabase> SELECT * FROM userdata where login='kais' ;
> unsupported operand type(s) for /: 'NoneType' and 'float'
>
>
> NOTE: When the cluster contained 7 nodes, I saw that my table userdata (RF 3) on node D was replicated on E and F. That seems strange, because these 3 nodes were not correctly filled.
>
> Now the cluster seems to work normally, but I can't use the secondary indexes for the moment: the query answers are random.
>
> Thanks a lot for any help,
> Kais
>
>
>
>
>
> 2013/3/31 aaron morton <aaron@thelastpickle.com>
> First thought is the new nodes were marked as seeds.
> Next thought is check the logs for errors.
>
> You can always run a nodetool repair if you are concerned data is not where you think it should be.
>
> Cheers
>
>
> -----------------
> Aaron Morton
> Freelance Cassandra Consultant
> New Zealand
>
> @aaronmorton
> http://www.thelastpickle.com
>
> On 29/03/2013, at 8:01 PM, Kais Ahmed <kais@neteck-fr.com> wrote:
>
>> Hi all,
>>
>> I followed this tutorial to expand a 4-node C* cluster (production) by adding 3 new nodes.
>>
>> Datacenter: eu-west
>> ===================
>> Status=Up/Down
>> |/ State=Normal/Leaving/Joining/Moving
>> --  Address        Load      Tokens  Owns   Host ID                               Rack
>> UN  10.34.142.xxx  10.79 GB  256     15.4%  4e2e26b8-aa38-428c-a8f5-e86c13eb4442  1b
>> UN  10.32.49.xxx   1.48 MB   256     13.7%  e86f67b6-d7cb-4b47-b090-3824a5887145  1b
>> UN  10.33.206.xxx  2.19 MB   256     11.9%  92af17c3-954a-4511-bc90-29a9657623e4  1b
>> UN  10.32.27.xxx   1.95 MB   256     14.9%  862e6b39-b380-40b4-9d61-d83cb8dacf9e  1b
>> UN  10.34.139.xxx  11.67 GB  256     15.5%  0324e394-b65f-46c8-acb4-1e1f87600a2c  1b
>> UN  10.34.147.xxx  11.18 GB  256     13.9%  cfc09822-5446-4565-a5f0-d25c917e2ce8  1b
>> UN  10.33.193.xxx  10.83 GB  256     14.7%  59f440db-cd2d-4041-aab4-fc8e9518c954  1b
>>
>> The data are not streamed.
>>
>> Can anyone help me? Our web site is down.
>>
>> Thanks a lot,
>>
>>
>
>

