cassandra-user mailing list archives

From Jeff Jirsa <jeff.ji...@crowdstrike.com>
Subject Re: Would we have data corruption if we bootstrapped 10 nodes at once?
Date Mon, 19 Oct 2015 04:19:23 GMT
Take a snapshot now, before you get rid of any data (whatever you do, don’t run cleanup).


If you identify missing data, you can go back to those snapshots, find the nodes that had
the data previously (sstable2json, for example), and either re-stream that data into the cluster
with sstableloader or copy it to a new host and `nodetool refresh` it into the new system.
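As a rough sketch of the commands involved, the helper below only builds argument lists and runs nothing. The function name, the snapshot tag, and the host and path values are placeholders for illustration; check each command against your Cassandra version before using it.

```python
def recovery_commands(keyspace, table, snapshot_tag, live_hosts, sstable_dir):
    """Build (but do not run) the commands discussed above, as argv lists.

    All argument values are hypothetical placeholders supplied by the caller.
    """
    return {
        # Run on every node first, so the current SSTables are preserved.
        "snapshot": ["nodetool", "snapshot", "-t", snapshot_tag, keyspace],
        # Option A: stream recovered SSTables back into the cluster.
        # sstableloader expects a directory path ending in <keyspace>/<table>.
        "restream": ["sstableloader", "-d", ",".join(live_hosts), sstable_dir],
        # Option B: after copying the SSTable files into the table's data
        # directory on a host, have that node pick them up.
        "refresh": ["nodetool", "refresh", keyspace, table],
    }


plan = recovery_commands(
    "my_keyspace", "my_table", "pre-repair",
    ["10.0.0.1", "10.0.0.2"], "/tmp/restore/my_keyspace/my_table",
)
for step, argv in plan.items():
    print(step, " ".join(argv))
```

The restream and refresh entries are alternatives, not consecutive steps; the snapshot comes first in either case.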



From:  <burtonator2011@gmail.com> on behalf of Kevin Burton
Reply-To:  "user@cassandra.apache.org"
Date:  Sunday, October 18, 2015 at 8:10 PM
To:  "user@cassandra.apache.org"
Subject:  Re: Would we have data corruption if we bootstrapped 10 nodes at once?

ouch.. OK.. I think I really shot myself in the foot here then.  This might be bad. 

I'm not sure if I would have missing data.  I mean basically the data is on the other nodes..
but the cluster has been running with 10 nodes accidentally bootstrapped with auto_bootstrap=false.
 

So they have new data and seem to be missing values. 

This is somewhat misleading... Initially, if you start it up and run nodetool status, it only
returns one node. 

So I assumed auto_bootstrap=false meant that it just doesn't join the cluster.

I'm running a nodetool repair now to hopefully fix this.



On Sun, Oct 18, 2015 at 7:25 PM, Jeff Jirsa <jeff.jirsa@crowdstrike.com> wrote:
auto_bootstrap=false tells it to join the cluster without running bootstrap – the node assumes
it has all of the necessary data, and won’t stream any missing data.

This generally violates consistency guarantees, but if done on a single node, is typically
correctable with `nodetool repair`.

If you do it on many nodes at once, it’s possible that the new nodes could represent all
3 replicas of the data, but don’t physically have any of that data, leading to missing records.
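To make that concrete, here is a toy model of the ring. It is a simplification, not how Cassandra is implemented: it assumes one token per node (no vnodes), SimpleStrategy-style placement where the replicas are the next RF nodes clockwise, and no racks or datacenters. The node names and token values are made up.

```python
import bisect


def replicas_for(token, ring, rf):
    """Return the rf node names owning `token`.

    `ring` is a sorted list of (token, node) pairs. Ownership starts at the
    first node whose token is >= `token` and continues clockwise.
    """
    tokens = [t for t, _ in ring]
    start = bisect.bisect_left(tokens, token) % len(ring)
    return [ring[(start + i) % len(ring)][1] for i in range(rf)]


def ranges_with_only_new_replicas(old_nodes, new_nodes, rf=3):
    """List the tokens whose range is replicated only on new nodes.

    If the new nodes joined without streaming, reads for those ranges find
    no copy of the pre-existing data on any replica.
    """
    ring = sorted((t, n) for n, t in {**old_nodes, **new_nodes}.items())
    return [
        t for t, _ in ring
        if all(owner in new_nodes for owner in replicas_for(t, ring, rf))
    ]


old = {"old1": 0, "old2": 100, "old3": 200}

# One new node: every range still has at least one old replica to repair from.
print(ranges_with_only_new_replicas(old, {"new1": 10}))  # []

# Three adjacent new nodes: the range ending at token 10 has no old replica.
print(ranges_with_only_new_replicas(old, {"new1": 10, "new2": 20, "new3": 30}))  # [10]
```

In the second case the old data for that range still exists on the old nodes' disks; they are just no longer replicas for it, so nothing reads from them.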



From: <burtonator2011@gmail.com> on behalf of Kevin Burton
Reply-To: "user@cassandra.apache.org"
Date: Sunday, October 18, 2015 at 3:44 PM
To: "user@cassandra.apache.org"
Subject: Re: Would we have data corruption if we bootstrapped 10 nodes at once?

Ah shit.. I think we're seeing corruption.. missing records :-/

On Sat, Oct 17, 2015 at 10:45 AM, Kevin Burton <burton@spinn3r.com> wrote:
We just migrated from a 30 node cluster to a 45 node cluster. (so 15 new nodes) 

By default we have auto_bootstrap = false

So we just push our config to the cluster, the Cassandra daemons restart, and the new nodes are
not cluster members; each one sees itself as the only node in its cluster.

Anyway.  While I was about 1/2 way done adding the 15 nodes,  I had about 7 members of the
cluster and 8 not yet joined.

We are only doing 1 at a time because apparently bootstrapping more than 1 is unsafe.  

I did a rolling restart whereby I went through and restarted all the cassandra boxes.  

Somehow the new nodes auto bootstrapped themselves EVEN though auto_bootstrap=false.

We don't have any errors.  Everything seems functional.  I'm just worried about data loss.

Thoughts?

Kevin

-- 
We’re hiring if you know of any awesome Java Devops or Linux Operations Engineers!

Founder/CEO Spinn3r.com
Location: San Francisco, CA
blog: http://burtonator.wordpress.com
… or check out my Google+ profile










