From: Eric Stevens
Date: Mon, 19 Oct 2015 15:48:49 +0000
Subject: Re: Would we have data corruption if we bootstrapped 10 nodes at once?
To: user@cassandra.apache.org

It seems to me that as long as cleanup hasn't happened, if you
*decommission* the newly joined nodes, they'll stream whatever writes they
took back to the original replicas. Presumably that should be pretty quick,
as they won't have nearly as much data as the original nodes (they only
hold data written while they were online). Then, as long as cleanup hasn't
happened, your cluster should have returned to a consistent view of the
data. You can now bootstrap the new nodes again.

If you have done a cleanup, then the data is probably irreversibly
corrupted, and you will have to figure out how to restore the missing data
incrementally from backups, if they are available.

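To make the decommission idea above concrete, here is a minimal toy model in Python. It is only a sketch under stated assumptions: a single-token ring with made-up node names and token values, a replication factor of 3, and a hypothetical `decommission` helper that hands each key a leaving node holds to whichever nodes own it afterwards. It is not Cassandra's actual streaming logic, which works on token ranges rather than individual keys.

```python
import bisect

RF = 3  # replication factor, assumed from the thread ("all 3 replicas")

def replicas(key_token, ring):
    """First RF nodes clockwise from key_token on a toy single-token ring."""
    nodes = sorted(ring, key=ring.get)
    tokens = [ring[n] for n in nodes]
    start = bisect.bisect_left(tokens, key_token) % len(nodes)
    return [nodes[(start + i) % len(nodes)] for i in range(min(RF, len(nodes)))]

def decommission(node, ring, storage):
    """Hypothetical helper: drop node from the ring and pass every key it
    holds to the nodes that are replicas for that key afterwards."""
    remaining = {n: t for n, t in ring.items() if n != node}
    for key in storage.pop(node):
        for owner in replicas(key, remaining):
            storage[owner].add(key)
    return remaining

# Illustrative node names and tokens, not taken from the thread.
old_ring = {"old-a": 0, "old-b": 30, "old-c": 60, "old-d": 90}
new_ring = {"new-x": 10, "new-y": 15, "new-z": 20}
ring = {**old_ring, **new_ring}
storage = {n: set() for n in ring}

# Writes accepted while the new nodes were in the ring.
window_writes = [5, 12, 18, 70, 95]
for key in window_writes:
    for owner in replicas(key, ring):
        storage[owner].add(key)

# Keys whose only copies sit on the new nodes before decommission.
stranded = [k for k in window_writes
            if not any(k in storage[n] for n in old_ring)]

# Decommission the new nodes one at a time.
for node in new_ring:
    ring = decommission(node, ring, storage)

# Every write should now be on all of its replicas in the original ring.
recovered = all(k in storage[n]
                for k in window_writes
                for n in replicas(k, ring))
print("stranded before decommission:", stranded)
print("fully replicated on old nodes afterwards:", recovered)
```

In this toy ring, the write with token 5 lands only on the three new nodes, and it reaches the original replicas only because each leaving node hands its keys on. The model says nothing about the pre-existing data, which stays on the old nodes only as long as cleanup has not been run.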
On Sun, Oct 18, 2015 at 10:37 PM Raj Chudasama wrote:

> In this case, does it make sense to remove the newly added nodes, correct
> the configuration, and have them rejoin one at a time?
>
> Thx
>
> Sent from my iPhone
>
> On Oct 18, 2015, at 11:19 PM, Jeff Jirsa wrote:
>
> Take a snapshot now, before you get rid of any data (whatever you do,
> don’t run cleanup).
>
> If you identify missing data, you can go back to those snapshots, find the
> nodes that had the data previously (sstable2json, for example), and either
> re-stream that data into the cluster with sstableloader or copy it to a new
> host and `nodetool refresh` it into the new system.
>
>
> From: <burtonator2011@gmail.com> on behalf of Kevin Burton
> Reply-To: "user@cassandra.apache.org"
> Date: Sunday, October 18, 2015 at 8:10 PM
> To: "user@cassandra.apache.org"
> Subject: Re: Would we have data corruption if we bootstrapped 10 nodes at
> once?
>
> ouch.. OK.. I think I really shot myself in the foot here then. This
> might be bad.
>
> I'm not sure if I would have missing data. I mean basically the data is
> on the other nodes.. but the cluster has been running with 10 nodes
> accidentally bootstrapped with auto_bootstrap=false.
>
> So they have new data and seem to be missing values.
>
> this is somewhat misleading... Initially if you start it up and run
> nodetool status, it only returns one node.
>
> So I assumed auto_bootstrap=false meant that it just doesn't join the
> cluster.
>
> I'm running a nodetool repair now to hopefully fix this.
>
>
> On Sun, Oct 18, 2015 at 7:25 PM, Jeff Jirsa wrote:
>
>> auto_bootstrap=false tells it to join the cluster without running
>> bootstrap – the node assumes it has all of the necessary data, and won’t
>> stream any missing data.
>>
>> This generally violates consistency guarantees, but if done on a single
>> node, is typically correctable with `nodetool repair`.
>>
>> If you do it on many nodes at once, it’s possible that the new nodes
>> could represent all 3 replicas of the data, but don’t physically have any
>> of that data, leading to missing records.
>>
>>
>> From: <burtonator2011@gmail.com> on behalf of Kevin Burton
>> Reply-To: "user@cassandra.apache.org"
>> Date: Sunday, October 18, 2015 at 3:44 PM
>> To: "user@cassandra.apache.org"
>> Subject: Re: Would we have data corruption if we bootstrapped 10 nodes
>> at once?
>>
>> Ah shit.. I think we're seeing corruption.. missing records :-/
>>
>> On Sat, Oct 17, 2015 at 10:45 AM, Kevin Burton wrote:
>>
>>> We just migrated from a 30 node cluster to a 45 node cluster. (so 15 new
>>> nodes)
>>>
>>> By default we have auto_bootstrap = false
>>>
>>> so we just push our config to the cluster, the cassandra daemons
>>> restart, and they're not cluster members and are the only nodes in the
>>> cluster.
>>>
>>> Anyway. While I was about 1/2 way done adding the 15 nodes, I had
>>> about 7 members of the cluster and 8 not yet joined.
>>>
>>> We are only doing 1 at a time because apparently bootstrapping more than
>>> 1 is unsafe.
>>>
>>> I did a rolling restart whereby I went through and restarted all the
>>> cassandra boxes.
>>>
>>> Somehow the new nodes auto bootstrapped themselves EVEN though
>>> auto_bootstrap=false.
>>>
>>> We don't have any errors. Everything seems functional. I'm just
>>> worried about data loss.
>>>
>>> Thoughts?
>>>
>>> Kevin
>>>
>>> --
>>>
>>> We’re hiring if you know of any awesome Java Devops or Linux Operations
>>> Engineers!
>>>
>>> Founder/CEO Spinn3r.com
>>> Location: *San Francisco, CA*
>>> blog: http://burtonator.wordpress.com
>>> … or check out my Google+ profile
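
The point made in the thread, that one node joining with auto_bootstrap=false is usually repairable while several joining at once can leave some rows with no surviving replica, can be illustrated with a small Python sketch. Everything here is an assumption for illustration: a toy single-token ring, invented node names and token values, a replication factor of 3, and a hypothetical `copy_histogram` helper. It does not model vnodes, consistency levels, or real Cassandra placement strategies.

```python
import bisect
from collections import Counter

RF = 3  # replication factor, assumed from the thread ("all 3 replicas")

def replicas(key_token, ring):
    """First RF nodes clockwise from key_token on a toy single-token ring."""
    nodes = sorted(ring, key=ring.get)
    tokens = [ring[n] for n in nodes]
    start = bisect.bisect_left(tokens, key_token) % len(nodes)
    return [nodes[(start + i) % len(nodes)] for i in range(min(RF, len(nodes)))]

def copy_histogram(old_ring, new_ring, key_tokens):
    """Hypothetical helper: write every key to the old ring, let the new
    nodes join without streaming, then count how many of each key's
    current replicas actually hold the key."""
    storage = {node: set() for node in old_ring}
    for key in key_tokens:
        for owner in replicas(key, old_ring):
            storage[owner].add(key)
    full_ring = {**old_ring, **new_ring}  # new nodes own ranges but hold no data
    hist = Counter(
        sum(key in storage.get(owner, ()) for owner in replicas(key, full_ring))
        for key in key_tokens
    )
    return dict(hist)

# Illustrative node names and tokens, not taken from the thread.
old_ring = {"old-a": 0, "old-b": 30, "old-c": 60, "old-d": 90}
keys = range(100)

one_new = copy_histogram(old_ring, {"new-x": 10}, keys)
three_new = copy_histogram(
    old_ring, {"new-x": 10, "new-y": 15, "new-z": 20}, keys
)
print("copies per key, one node joined:   ", one_new)
print("copies per key, three nodes joined:", three_new)
```

In this toy ring, a single joining node leaves at least two real copies of every key, which is the situation a repair can correct. With three adjacent nodes joining at once, ten of the hundred keys have all three of their replicas on empty nodes, so reads for those keys find nothing even though the data still sits on the old nodes.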