incubator-cassandra-user mailing list archives

From Kais Ahmed <k...@neteck-fr.com>
Subject Re: Lost data after expanding cluster c* 1.2.3-1
Date Sun, 31 Mar 2013 16:31:13 GMT
Hi aaron,

Thanks for the reply. I will try to explain exactly what happened.

I had a 4-node C* cluster [A,B,C,D] (version 1.2.3-1), started from the
EC2 AMI (https://aws.amazon.com/amis/datastax-auto-clustering-ami-2-2) with
this config: --clustername myDSCcluster --totalnodes 4 --version community

Two days after this cluster went into production, I saw that it was
overloaded, so I wanted to extend it by adding 3 more nodes.

I created a new cluster with 3 C* nodes [D,E,F] (
https://aws.amazon.com/amis/datastax-auto-clustering-ami-2-2)

Then I followed the documentation (
http://www.datastax.com/docs/1.2/install/expand_ami) to add them to the
ring.
For each of them I set seeds = A's IP, and I started them at two-minute
intervals.

At that moment the errors started: we saw that members and other data were
gone. At that point nodetool status returned the following (the 3 new
nodes were shown in red):

Datacenter: eu-west
===================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address        Load      Tokens  Owns   Host ID                               Rack
UN  10.34.142.xxx  10.79 GB  256     15.4%  4e2e26b8-aa38-428c-a8f5-e86c13eb4442  1b
UN  10.32.49.xxx   1.48 MB   256     13.7%  e86f67b6-d7cb-4b47-b090-3824a5887145  1b
UN  10.33.206.xxx  2.19 MB   256     11.9%  92af17c3-954a-4511-bc90-29a9657623e4  1b
UN  10.32.27.xxx   1.95 MB   256     14.9%  862e6b39-b380-40b4-9d61-d83cb8dacf9e  1b
UN  10.34.139.xxx  11.67 GB  256     15.5%  0324e394-b65f-46c8-acb4-1e1f87600a2c  1b
UN  10.34.147.xxx  11.18 GB  256     13.9%  cfc09822-5446-4565-a5f0-d25c917e2ce8  1b
UN  10.33.193.xxx  10.83 GB  256     14.7%  59f440db-cd2d-4041-aab4-fc8e9518c954  1b
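
The gap between the nodes holding gigabytes and those holding megabytes can
be spotted mechanically. A minimal sketch, assuming each node is on a single
line with the columns status, address, load value, load unit, and that loads
are reported in KB/MB/GB/TB; the function names and the 1% threshold are
illustrative choices, not part of nodetool:

```python
from statistics import median
from typing import Dict, List

# Assumed unit suffixes in the Load column.
UNITS = {"KB": 1024, "MB": 1024**2, "GB": 1024**3, "TB": 1024**4}


def parse_loads(status_lines: List[str]) -> Dict[str, float]:
    """Map each node address to its load in bytes."""
    loads = {}
    for line in status_lines:
        fields = line.split()
        # Skip headers and anything not shaped like a node row.
        if len(fields) < 4 or fields[3] not in UNITS:
            continue
        loads[fields[1]] = float(fields[2]) * UNITS[fields[3]]
    return loads


def nearly_empty_nodes(loads: Dict[str, float], fraction: float = 0.01) -> List[str]:
    """Return addresses whose load is below `fraction` of the median load."""
    if not loads:
        return []
    threshold = median(loads.values()) * fraction
    return sorted(addr for addr, load in loads.items() if load < threshold)


if __name__ == "__main__":
    sample = [
        "--  Address  Load  Tokens  Owns  Host ID  Rack",
        "UN  10.34.142.xxx  10.79 GB  256  15.4%  4e2e26b8-aa38-428c-a8f5-e86c13eb4442  1b",
        "UN  10.32.49.xxx   1.48 MB   256  13.7%  e86f67b6-d7cb-4b47-b090-3824a5887145  1b",
        "UN  10.34.139.xxx  11.67 GB  256  15.5%  0324e394-b65f-46c8-acb4-1e1f87600a2c  1b",
    ]
    print(nearly_empty_nodes(parse_loads(sample)))
```

The median is only a sensible baseline while the populated nodes are in the
majority, as they are in the output above.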


I saw that the 3 nodes had joined the ring but had no data. I put the
website in maintenance and launched a nodetool repair on the 3 new nodes.
For 5 hours I could see in OpsCenter the data being streamed to the
new nodes (very nice :))

During this time, I wrote a script to check whether all members are present
(relative to a copy of the members in MySQL).
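
A check of this kind can be sketched as follows. This is not the actual
script: find_missing_members and the lookup callable are hypothetical names,
and lookup is assumed to read one row by key from Cassandra and return None
when the row is absent.

```python
from typing import Callable, Iterable, List, Optional


def find_missing_members(
    expected_logins: Iterable[str],
    lookup: Callable[[str], Optional[dict]],
) -> List[str]:
    """Return the logins from the reference copy that the lookup cannot find."""
    missing = []
    for login in expected_logins:
        # lookup() stands in for a read-by-key against the cluster.
        if lookup(login) is None:
            missing.append(login)
    return missing


if __name__ == "__main__":
    # Stand-in data: in practice the logins would come from the MySQL copy
    # and lookup() would query Cassandra.
    reference = ["kais", "alice", "bob"]
    cluster_rows = {"kais": {"login": "kais"}, "bob": {"login": "bob"}}
    print(find_missing_members(reference, cluster_rows.get))
```

Reading by key rather than through the secondary index keeps the check
independent of the index itself.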

After that the data streaming seemed to be finished, but I'm not sure,
because nodetool compactionstats showed pending tasks while nodetool
netstats seemed to be OK.

I ran my script to check the data, but members were still missing.

I decided to roll back by running nodetool decommission on nodes D, E, F.

I re-ran my script and all seemed to be OK, but the secondary index behaves
strangely: sometimes the row is returned, sometimes there is no result.

The user kais can be retrieved by key with cassandra-cli, but if I use
cqlsh:

cqlsh:database> SELECT login FROM userdata where login='kais' ;

 login
----------------
 kais

cqlsh:database> SELECT login FROM userdata where login='kais' ; //empty
cqlsh:database> SELECT login FROM userdata where login='kais' ;

 login
----------------
 kais

cqlsh:database> SELECT login FROM userdata where login='kais' ;

 login
----------------
 kais

cqlsh:database> SELECT login FROM userdata where login='kais' ; //empty
cqlsh:database> SELECT login FROM userdata where login='kais' ;

 login
----------------
 kais
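
To put a number on how often the index lookup comes back empty, the same
query can be repeated in a loop. A minimal sketch, assuming a hypothetical
run_query callable that executes the SELECT above through whatever client is
in use and returns the rows as a list:

```python
from itertools import cycle
from typing import Callable, Sequence


def count_empty_results(run_query: Callable[[], Sequence], attempts: int = 20) -> int:
    """Run the same query `attempts` times and count how often it returns no rows."""
    empty = 0
    for _ in range(attempts):
        rows = run_query()
        if len(rows) == 0:
            empty += 1
    return empty


if __name__ == "__main__":
    # Stand-in for the real query: every second call of three returns no rows.
    fake_results = cycle([["kais"], [], ["kais"]])
    print(count_empty_results(lambda: next(fake_results), attempts=6))
```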

cqlsh:mydatabase> TRACING ON;
When tracing is activated I get this error, but not every time:
cqlsh:mydatabase> SELECT * FROM userdata where login='kais' ;
unsupported operand type(s) for /: 'NoneType' and 'float'


NOTE: When the cluster contained 7 nodes, I saw that my table userdata (RF
3) on node D was replicated on E and F, which seems strange because these
3 nodes were not correctly filled.

Now the cluster seems to work normally, but I can't use the secondary index
for the moment; the query answers are random.

Thanks a lot for any help,
Kais





2013/3/31 aaron morton <aaron@thelastpickle.com>

> First thought is the new nodes were marked as seeds.
> Next thought is check the logs for errors.
>
> You can always run a nodetool repair if you are concerned data is not
> where you think it should be.
>
> Cheers
>
>
>    -----------------
> Aaron Morton
> Freelance Cassandra Consultant
> New Zealand
>
> @aaronmorton
> http://www.thelastpickle.com
>
> On 29/03/2013, at 8:01 PM, Kais Ahmed <kais@neteck-fr.com> wrote:
>
> Hi all,
>
> I followed this tutorial to expand a 4-node C* cluster (production) and
> add 3 new nodes.
>
> Datacenter: eu-west
> ===================
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address        Load      Tokens  Owns   Host ID                               Rack
> UN  10.34.142.xxx  10.79 GB  256     15.4%  4e2e26b8-aa38-428c-a8f5-e86c13eb4442  1b
> UN  10.32.49.xxx   1.48 MB   256     13.7%  e86f67b6-d7cb-4b47-b090-3824a5887145  1b
> UN  10.33.206.xxx  2.19 MB   256     11.9%  92af17c3-954a-4511-bc90-29a9657623e4  1b
> UN  10.32.27.xxx   1.95 MB   256     14.9%  862e6b39-b380-40b4-9d61-d83cb8dacf9e  1b
> UN  10.34.139.xxx  11.67 GB  256     15.5%  0324e394-b65f-46c8-acb4-1e1f87600a2c  1b
> UN  10.34.147.xxx  11.18 GB  256     13.9%  cfc09822-5446-4565-a5f0-d25c917e2ce8  1b
> UN  10.33.193.xxx  10.83 GB  256     14.7%  59f440db-cd2d-4041-aab4-fc8e9518c954  1b
>
> The data are not streamed.
>
> Can anyone help me? Our website is down.
>
> Thanks a lot,
>
>
>
>
