incubator-cassandra-user mailing list archives

From aaron morton <aa...@thelastpickle.com>
Subject Re: Lost data after expanding cluster c* 1.2.3-1
Date Sun, 31 Mar 2013 23:58:52 GMT
Please do not rely on colour in your emails; the best way to get your emails accepted by the
Apache mail servers is to use plain text.

> At this point the errors started: we saw that members and other data were gone. At that
> moment nodetool status returned (the 3 new nodes were shown in red):
What errors?

> For each of them I set seeds = A's IP, and started them at two-minute intervals.
When I'm making changes I tend to change a single node first, confirm everything is OK and
then do a bulk change.

> Now the cluster seems to work normally, but I can't use the secondary index for the moment,
> the query answers are random
Run nodetool repair -pr on each node, and let it finish before starting the next one.
If you are using secondary indexes, use nodetool rebuild_index to rebuild those.
Add one new node to the cluster and confirm everything is OK, then add the remaining
ones.
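
If it helps, the "one node at a time" repair can be scripted. This is only a rough sketch,
assuming nodetool is on the PATH of the machine running it and that JMX on each node is
reachable with -h; the helper names are made up for illustration. subprocess.check_call
blocks until nodetool exits and raises on a non-zero exit code, so the next node is not
started until the previous repair has finished, and a failure stops the loop.

```python
import subprocess
from typing import Callable, List, Sequence

Command = List[str]


def repair_commands(hosts: Sequence[str]) -> List[Command]:
    """Build one primary-range repair command per node, in the order given."""
    return [["nodetool", "-h", host, "repair", "-pr"] for host in hosts]


def run_one_at_a_time(
    commands: Sequence[Command],
    runner: Callable[[Command], object] = subprocess.check_call,
) -> int:
    """Run each command to completion before starting the next one.

    The default runner blocks until the process exits and raises
    CalledProcessError on failure, so later nodes are left untouched.
    Returns the number of commands that completed.
    """
    done = 0
    for cmd in commands:
        print("running:", " ".join(cmd))
        runner(cmd)
        done += 1
    return done
```

To use it, call run_one_at_a_time(repair_commands([...])) with your own node addresses. The
same loop could run nodetool rebuild_index afterwards, but check the exact arguments it
expects on your version before scripting it.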

I'm not sure what went wrong or why, but that should get you to a stable place. If you
have any problems keep an eye on the logs for errors or warnings.
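
For watching the logs, something like the following can pull out just the ERROR and WARN
lines. It is a sketch that assumes the default log4j layout, where each entry starts with
the (padded) level name; the function name is made up, and the log path in the note below
is the usual location for packaged installs, so adjust it for your setup.

```python
import re
from typing import Iterable, Iterator

# The level is the first token on the line, possibly preceded by padding.
LEVEL = re.compile(r"^\s*(ERROR|WARN)\b")


def problems(lines: Iterable[str]) -> Iterator[str]:
    """Yield only the log lines whose level is ERROR or WARN."""
    for line in lines:
        if LEVEL.match(line):
            yield line.rstrip("\n")
```

For example, open /var/log/cassandra/system.log (assumed path) and print each line that
problems() yields. Stack trace lines that follow an ERROR are not matched, so look at the
full log around anything it reports.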

Cheers

-----------------
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 31/03/2013, at 10:01 PM, Kais Ahmed <kais@neteck-fr.com> wrote:

> Hi aaron,
> 
> Thanks for the reply, I will try to explain exactly what happened.
> 
> I had a cluster of 4 C* nodes called [A,B,C,D] (version 1.2.3-1), started with the EC2 AMI
> (https://aws.amazon.com/amis/datastax-auto-clustering-ami-2-2) with this config:
> --clustername myDSCcluster --totalnodes 4 --version community
> 
> Two days after this cluster went into production, I saw that it was overloaded, so I wanted
> to extend it by adding another 3 nodes.
> 
> I created a new cluster with 3 C* nodes [D,E,F] (https://aws.amazon.com/amis/datastax-auto-clustering-ami-2-2)
> 
> And followed the documentation (http://www.datastax.com/docs/1.2/install/expand_ami) for
> adding them to the ring.
> For each of them I set seeds = A's IP, and started them at two-minute intervals.
> 
> At this point the errors started: we saw that members and other data were gone. At that
> moment nodetool status returned (the 3 new nodes were shown in red):
> 
> Datacenter: eu-west
> ===================
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address        Load      Tokens  Owns   Host ID                               Rack
> UN  10.34.142.xxx  10.79 GB  256     15.4%  4e2e26b8-aa38-428c-a8f5-e86c13eb4442  1b
> UN  10.32.49.xxx   1.48 MB   256     13.7%  e86f67b6-d7cb-4b47-b090-3824a5887145  1b
> UN  10.33.206.xxx  2.19 MB   256     11.9%  92af17c3-954a-4511-bc90-29a9657623e4  1b
> UN  10.32.27.xxx   1.95 MB   256     14.9%  862e6b39-b380-40b4-9d61-d83cb8dacf9e  1b
> UN  10.34.139.xxx  11.67 GB  256     15.5%  0324e394-b65f-46c8-acb4-1e1f87600a2c  1b
> UN  10.34.147.xxx  11.18 GB  256     13.9%  cfc09822-5446-4565-a5f0-d25c917e2ce8  1b
> UN  10.33.193.xxx  10.83 GB  256     14.7%  59f440db-cd2d-4041-aab4-fc8e9518c954  1b
> 
> I saw that the 3 nodes had joined the ring but had no data, so I put the website in
> maintenance and launched a nodetool repair on the 3 new nodes. For 5 hours I could see in
> OpsCenter the data being streamed to the new nodes (very nice :))
> 
> During this time, I wrote a script to check whether all members are present (compared with
> a copy of the members in MySQL).
> 
> After that the data streaming seemed to be finished, but I'm not sure, because nodetool
> compactionstats showed pending tasks while nodetool netstats seemed to be OK.
> 
> I ran my script to check the data, but members were still missing.
> 
> I decided to roll back by running nodetool decommission on nodes D, E and F.
> 
> I re-ran my script and everything seems to be OK, but the secondary index behaves strangely:
> sometimes the row is returned, sometimes there is no result.
> 
> The user kais can be retrieved by his key with cassandra-cli, but if I use cqlsh:
> 
> cqlsh:database> SELECT login FROM userdata where login='kais' ;
> 
>  login
> ----------------
>  kais
> 
> cqlsh:database> SELECT login FROM userdata where login='kais' ; //empty
> cqlsh:database> SELECT login FROM userdata where login='kais' ;
> 
>  login
> ----------------
>  kais
> 
> cqlsh:database> SELECT login FROM userdata where login='kais' ;
> 
>  login
> ----------------
>  kais
> 
> cqlsh:database> SELECT login FROM userdata where login='kais' ; //empty
> cqlsh:database> SELECT login FROM userdata where login='kais' ;
> 
>  login
> ----------------
>  kais
> 
> cqlsh:mydatabase> TRACING ON;
> When tracing is activated I get this error, but not every time:
> cqlsh:mydatabase> SELECT * FROM userdata where login='kais' ;
> unsupported operand type(s) for /: 'NoneType' and 'float'
> 
> 
> NOTE: When the cluster contained 7 nodes, I saw that my table userdata (RF 3) on node
> D was replicated on E and F, which seems strange because these 3 nodes were not correctly
> filled.
> 
> Now the cluster seems to work normally, but I can't use the secondary index for the moment,
> the query answers are random
> 
> Thanks a lot for any help,
> Kais
> 
> 
> 
> 
> 
> 2013/3/31 aaron morton <aaron@thelastpickle.com>
> First thought is the new nodes were marked as seeds. 
> Next thought is check the logs for errors. 
> 
> You can always run a nodetool repair if you are concerned data is not where you think
> it should be.
> 
> Cheers
> 
> 
> -----------------
> Aaron Morton
> Freelance Cassandra Consultant
> New Zealand
> 
> @aaronmorton
> http://www.thelastpickle.com
> 
> On 29/03/2013, at 8:01 PM, Kais Ahmed <kais@neteck-fr.com> wrote:
> 
>> Hi all,
>> 
>> I followed this tutorial for expanding a 4-node C* cluster (production) and added 3 new nodes.
>> 
>> Datacenter: eu-west
>> ===================
>> Status=Up/Down
>> |/ State=Normal/Leaving/Joining/Moving
>> --  Address        Load      Tokens  Owns   Host ID                               Rack
>> UN  10.34.142.xxx  10.79 GB  256     15.4%  4e2e26b8-aa38-428c-a8f5-e86c13eb4442  1b
>> UN  10.32.49.xxx   1.48 MB   256     13.7%  e86f67b6-d7cb-4b47-b090-3824a5887145  1b
>> UN  10.33.206.xxx  2.19 MB   256     11.9%  92af17c3-954a-4511-bc90-29a9657623e4  1b
>> UN  10.32.27.xxx   1.95 MB   256     14.9%  862e6b39-b380-40b4-9d61-d83cb8dacf9e  1b
>> UN  10.34.139.xxx  11.67 GB  256     15.5%  0324e394-b65f-46c8-acb4-1e1f87600a2c  1b
>> UN  10.34.147.xxx  11.18 GB  256     13.9%  cfc09822-5446-4565-a5f0-d25c917e2ce8  1b
>> UN  10.33.193.xxx  10.83 GB  256     14.7%  59f440db-cd2d-4041-aab4-fc8e9518c954  1b
>> 
>> The data are not streamed.
>> 
>> Can anyone help me? Our website is down.
>> 
>> Thanks a lot,
>> 
>> 
> 
> 

