incubator-cassandra-user mailing list archives

From Kais Ahmed <k...@neteck-fr.com>
Subject Re: Lost data after expanding cluster c* 1.2.3-1
Date Sat, 06 Apr 2013 11:55:29 GMT
hi aaron,

nodetool compactionstats on all nodes returns 1 pending task:

ubuntu@app:~$ nodetool compactionstats host
pending tasks: 1
Active compaction remaining time :        n/a

The command nodetool rebuild_index was launched several days ago.

2013/4/5 aaron morton <aaron@thelastpickle.com>

> but nothing's happening, how can i monitor the progress? and how can i
> know when it's finished?
>
>
> check nodetool compactionstats
>
> Cheers
>
> -----------------
> Aaron Morton
> Freelance Cassandra Consultant
> New Zealand
>
> @aaronmorton
> http://www.thelastpickle.com
>
> On 4/04/2013, at 2:51 PM, Kais Ahmed <kais@neteck-fr.com> wrote:
>
> Hi aaron,
>
> I ran the command "nodetool rebuild_index host keyspace cf" on all the
> nodes, in the log i see :
>
> INFO [RMI TCP Connection(5422)-10.34.139.xxx] 2013-04-04 08:31:53,641
> ColumnFamilyStore.java (line 558) User Requested secondary index re-build
> for ...
>
> but nothing's happening, how can i monitor the progress? and how can i
> know when it's finished?
>
> Thanks,
>
>
> 2013/4/2 aaron morton <aaron@thelastpickle.com>
>
>> The problem came from my not setting auto_bootstrap to true for the new
>> nodes; it is not in this documentation (
>> http://www.datastax.com/docs/1.2/install/expand_ami)
>>
>> auto_bootstrap defaults to True if not specified in the yaml.
>>
>> can i do that at any time, or when the cluster are not loaded
>>
>> Not sure what the question is.
>> Both those operations are online operations you can do while the node is
>> processing requests.
>>
>> Cheers
>>
>>    -----------------
>> Aaron Morton
>> Freelance Cassandra Consultant
>> New Zealand
>>
>> @aaronmorton
>> http://www.thelastpickle.com
>>
>> On 1/04/2013, at 9:26 PM, Kais Ahmed <kais@neteck-fr.com> wrote:
>>
>> > At this moment the errors started, we see that members and other data
>> are gone, at this moment the nodetool status return (in red color the 3 new
>> nodes)
>> > What errors?
>> The errors was in my side in the application, not cassandra errors
>>
>> > I put for each of them seeds = A ip, and start each with two minutes
>> intervals.
>> > When I'm making changes I tend to change a single node first, confirm
>> everything is OK and then do a bulk change.
>> Thank you for that advice.
>>
>> >I'm not sure what or why it went wrong, but that should get you to a
>> stable place. If you have any problems keep an eye on the logs for errors
>> or warnings.
>> The problem came from my not setting auto_bootstrap to true for the new
>> nodes; it is not in this documentation (
>> http://www.datastax.com/docs/1.2/install/expand_ami)
>>
>> >if you are using secondary indexes use nodetool rebuild_index to rebuild
>> those.
>> can i do that at any time, or when the cluster are not loaded
>>
>> Thanks aaron,
>>
>> 2013/4/1 aaron morton <aaron@thelastpickle.com>
>>
>>> Please do not rely on colour in your emails, the best way to get your
>>> emails accepted by the Apache mail servers is to use plain text.
>>>
>>> > At this moment the errors started, we see that members and other data
>>> are gone, at this moment the nodetool status return (in red color the 3 new
>>> nodes)
>>> What errors?
>>>
>>> > I put for each of them seeds = A ip, and start each with two minutes
>>> intervals.
>>> When I'm making changes I tend to change a single node first, confirm
>>> everything is OK and then do a bulk change.
>>>
>>> > Now the cluster seem to work normally, but i can use the secondary for
>>> > the moment, the query answer are random
>>> run nodetool repair -pr on each node, let it finish before starting the
>>> next one.
>>> if you are using secondary indexes use nodetool rebuild_index to rebuild
>>> those.
>>> Add one new node to the cluster and confirm everything is ok, then
>>> add the remaining ones.
>>>
>>> >I'm not sure what or why it went wrong, but that should get you to a
>>> stable place. If you have any problems keep an eye on the logs for errors
>>> or warnings.
>>>
>>> Cheers
>>>
>>> -----------------
>>> Aaron Morton
>>> Freelance Cassandra Consultant
>>> New Zealand
>>>
>>> @aaronmorton
>>> http://www.thelastpickle.com
>>>
>>> On 31/03/2013, at 10:01 PM, Kais Ahmed <kais@neteck-fr.com> wrote:
>>>
>>> > Hi aaron,
>>> >
>>> > Thanks for the reply, I will try to explain what happened exactly
>>> >
>>> > I had a cluster of 4 C* nodes called [A,B,C,D] (version 1.2.3-1), started with the EC2
>>> > AMI (https://aws.amazon.com/amis/datastax-auto-clustering-ami-2-2) with
>>> > this config: --clustername myDSCcluster --totalnodes 4 --version community
>>> >
>>> > Two days after this cluster in production, i saw that the cluster was
>>> overload, I wanted to extend it by adding 3 another nodes.
>>> >
>>> > I create a new cluster with 3 C* [D,E,F]  (
>>> https://aws.amazon.com/amis/datastax-auto-clustering-ami-2-2)
>>> >
>>> > And follow the documentation (
>>> http://www.datastax.com/docs/1.2/install/expand_ami) for adding them in
>>> the ring.
>>> > I put for each of them seeds = A ip, and start each with two minutes
>>> intervals.
>>> >
>>> > At this moment the errors started, we see that members and other data
>>> are gone, at this moment the nodetool status return (in red color the 3 new
>>> nodes)
>>> >
>>> > Datacenter: eu-west
>>> > ===================
>>> > Status=Up/Down
>>> > |/ State=Normal/Leaving/Joining/Moving
>>> > --  Address        Load      Tokens  Owns   Host ID                               Rack
>>> > UN  10.34.142.xxx  10.79 GB  256     15.4%  4e2e26b8-aa38-428c-a8f5-e86c13eb4442  1b
>>> > UN  10.32.49.xxx   1.48 MB   256     13.7%  e86f67b6-d7cb-4b47-b090-3824a5887145  1b
>>> > UN  10.33.206.xxx  2.19 MB   256     11.9%  92af17c3-954a-4511-bc90-29a9657623e4  1b
>>> > UN  10.32.27.xxx   1.95 MB   256     14.9%  862e6b39-b380-40b4-9d61-d83cb8dacf9e  1b
>>> > UN  10.34.139.xxx  11.67 GB  256     15.5%  0324e394-b65f-46c8-acb4-1e1f87600a2c  1b
>>> > UN  10.34.147.xxx  11.18 GB  256     13.9%  cfc09822-5446-4565-a5f0-d25c917e2ce8  1b
>>> > UN  10.33.193.xxx  10.83 GB  256     14.7%  59f440db-cd2d-4041-aab4-fc8e9518c954  1b
>>> >
>>> >
>>> > I saw that the 3 nodes had joined the ring but they had no data, so I put
>>> > the website in maintenance and launched a nodetool repair on
>>> > the 3 new nodes. During 5 hours I saw in OpsCenter the data streamed to
>>> > the new nodes (very nice :))
>>> >
>>> > During this time, i write a script to check if all members are present
>>> (relative to a copy of members in mysql).
>>> >
>>> > After data streamed seems to be finish, but i'm not sure because
>>> nodetool compactionstats show pending task but nodetool netstats seems to
>>> be ok.
>>> >
>>> > I ran my script to check if the data, but members are still missing.
>>> >
>>> > I decided to roll back by running nodetool decommission on nodes D, E, F
>>> >
>>> > I re run my script, all seems to be ok but secondary index have
>>> strange behavior,
>>> > some time the row was returned some times no result.
>>> >
>>> > the user kais can be retrieve using his key with cassandra-cli but if
>>> i use cqlsh :
>>> >
>>> > cqlsh:database> SELECT login FROM userdata where login='kais' ;
>>> >
>>> >  login
>>> > ----------------
>>> >  kais
>>> >
>>> > cqlsh:database> SELECT login FROM userdata where login='kais' ; //empty
>>> > cqlsh:database> SELECT login FROM userdata where login='kais' ;
>>> >
>>> >  login
>>> > ----------------
>>> >  kais
>>> >
>>> > cqlsh:database> SELECT login FROM userdata where login='kais' ;
>>> >
>>> >  login
>>> > ----------------
>>> >  kais
>>> >
>>> > cqlsh:database> SELECT login FROM userdata where login='kais' ; //empty
>>> > cqlsh:database> SELECT login FROM userdata where login='kais' ;
>>> >
>>> >  login
>>> > ----------------
>>> >  kais
>>> >
>>> > cqlsh:mydatabase>Tracing on;
>>> > When tracing is activate i have this error but not all time
>>> > cqlsh:mydatabase> SELECT * FROM userdata where login='kais' ;
>>> > unsupported operand type(s) for /: 'NoneType' and 'float'
>>> >
>>> >
>>> > NOTE : When the cluster contained 7 nodes, i see that my table
>>> userdata (RF 3) on node D was replicated on E and F, that would seem
>>> strange because its 3 node was not correctly filled
>>> >
>>> > Now the cluster seem to work normally, but i can use the secondary for
>>> the moment, the query answer are random
>>> >
>>> > Thanks a lot for any help,
>>> > Kais
>>> >
>>> >
>>> >
>>> >
>>> >
>>> > 2013/3/31 aaron morton <aaron@thelastpickle.com>
>>> > First thought is the new nodes were marked as seeds.
>>> > Next thought is check the logs for errors.
>>> >
>>> > You can always run a nodetool repair if you are concerned data is not
>>> where you think it should be.
>>> >
>>> > Cheers
>>> >
>>> >
>>> > -----------------
>>> > Aaron Morton
>>> > Freelance Cassandra Consultant
>>> > New Zealand
>>> >
>>> > @aaronmorton
>>> > http://www.thelastpickle.com
>>> >
>>> > On 29/03/2013, at 8:01 PM, Kais Ahmed <kais@neteck-fr.com> wrote:
>>> >
>>> >> Hi all,
>>> >>
>>> >> I follow this tutorial for expanding a 4 c* cluster (production) and
>>> add 3 new nodes.
>>> >>
>>> >> Datacenter: eu-west
>>> >> ===================
>>> >> Status=Up/Down
>>> >> |/ State=Normal/Leaving/Joining/Moving
>>> >> --  Address        Load      Tokens  Owns   Host ID                               Rack
>>> >> UN  10.34.142.xxx  10.79 GB  256     15.4%  4e2e26b8-aa38-428c-a8f5-e86c13eb4442  1b
>>> >> UN  10.32.49.xxx   1.48 MB   256     13.7%  e86f67b6-d7cb-4b47-b090-3824a5887145  1b
>>> >> UN  10.33.206.xxx  2.19 MB   256     11.9%  92af17c3-954a-4511-bc90-29a9657623e4  1b
>>> >> UN  10.32.27.xxx   1.95 MB   256     14.9%  862e6b39-b380-40b4-9d61-d83cb8dacf9e  1b
>>> >> UN  10.34.139.xxx  11.67 GB  256     15.5%  0324e394-b65f-46c8-acb4-1e1f87600a2c  1b
>>> >> UN  10.34.147.xxx  11.18 GB  256     13.9%  cfc09822-5446-4565-a5f0-d25c917e2ce8  1b
>>> >> UN  10.33.193.xxx  10.83 GB  256     14.7%  59f440db-cd2d-4041-aab4-fc8e9518c954  1b
>>> >>
>>> >> The data are not streamed.
>>> >>
>>> >> Can any one help me, our web site is down.
>>> >>
>>> >> Thanks a lot,
>>> >>
>>> >>
>>> >
>>> >
>>>
>>>
>>
>>
>
>
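
The advice in this thread for following a rebuild_index run is to watch
nodetool compactionstats. A minimal sketch of that polling loop, assuming
nodetool is on the PATH and that the output contains a "pending tasks: N"
line as in the output quoted at the top of this message. The function names,
the host argument and the 30-second interval are illustrative assumptions,
not part of Cassandra.

```python
import re
import subprocess
import time

# Matches the "pending tasks: N" line seen in the compactionstats output above.
PENDING_RE = re.compile(r"^pending tasks:\s*(\d+)", re.MULTILINE)


def parse_pending_tasks(output):
    """Return the pending task count from compactionstats text, or None if absent."""
    match = PENDING_RE.search(output)
    return int(match.group(1)) if match else None


def wait_for_compactions(host, interval_seconds=30):
    """Poll `nodetool -h <host> compactionstats` until no tasks are pending.

    Hypothetical helper: the polling interval is an arbitrary choice.
    """
    while True:
        result = subprocess.run(
            ["nodetool", "-h", host, "compactionstats"],
            capture_output=True, text=True, check=True,
        )
        pending = parse_pending_tasks(result.stdout)
        if pending is None:
            # Output format differs from the one assumed here; stop rather than loop forever.
            raise RuntimeError("no 'pending tasks' line in nodetool output")
        print(f"{host}: pending tasks = {pending}")
        if pending == 0:
            return
        time.sleep(interval_seconds)
```

Running wait_for_compactions("host") against each node in turn would report
when that node shows no pending tasks; it only reads the counter and does not
explain why a task stays pending.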
