Date: Fri, 25 Mar 2011 21:16:01 +0100
Subject: Re: URGENT HELP PLEASE!
From: Sylvain Lebresne
To: Jared Laprise
Cc: "user@cassandra.apache.org"

> Although after all the help from the Cassandra community I have a much
> better understanding of why and how my situation happened, there was
> still one strange side effect I noticed. For context, I store user
> accounts and other account information in Cassandra. When the second
> node was offline and I tried to log into the site, I got an error
> saying invalid password. Out of curiosity I logged into the
> cassandra-cli tool and looked at what columns and values were present
> for my user account. My User CF seemed to have data stored from right
> before I added the second node. I found that really strange assuming
> that Cassandra doesn't keep any historical or versioned data? Again,
> once the second node was back online both servers showed the expected
> more current data.

What happened is this:

You started your cluster with only one node, so at first all the data
was on that node. Then you added a second node, and Cassandra moved
(approximately) half of the data to it. In theory, at that point the
data that was moved to the second node could be removed from the first
node (since you had RF=1). However, Cassandra doesn't do that removal
automatically, for safety reasons. You have to run cleanup on the first
node for that to happen.

So there was stale data on the first node that never got updated,
because the first node was no longer responsible for that data. It was
garbage that just didn't get removed.

What you should have done is run nodetool cleanup on the first node
after bootstrapping the second one and checking that everything was
fine.

>
> Today I'm preparing to increase my replication factor to 2 and have
> been reading about the proper way to do that.
> Although I've found bits and pieces, I haven't found any definitive
> explanation on how to do it. Could someone please sanity check my
> intended approach?
>
> 1. Change the RF to 2 and restart Cassandra on both nodes
> 2. Run `nodetool repair` on both nodes, one at a time so as not to
>    hold up both servers (will that sync data between the nodes?)
>
> In a 2 node environment and RF=2 using consistency level of ONE would
> still ensure data is replicated to both servers, correct?
>
> -----Original Message-----
> From: Sylvain Lebresne [mailto:sylvain@datastax.com]
> Sent: Friday, March 25, 2011 3:01 AM
> To: user@cassandra.apache.org
> Cc: Jared Laprise
> Subject: Re: URGENT HELP PLEASE!
>
> On Fri, Mar 25, 2011 at 1:49 AM, Jared Laprise wrote:
>> Hello all, I'm running 2 Cassandra 6.5 nodes and I brought down the
>> secondary node and restarted the primary node. After Cassandra came
>> back up all data has been reverted to several months ago.
>
> Out of curiosity, when you said 'brought down the secondary node', did
> that involve a decommission or removeToken? If so, I have an
> explanation for you.
>
> --
> Sylvain
>
>
>> I could really use some insight here, this is a production website
>> and I need to act quickly. I have a cron job that takes a snapshot
>> every night, but even with that I tried to restore a snapshot on my
>> local development environment and it was also missing a ton of data.
>>
>> Any help will be so appreciated.
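
P.S. To make the stale-data explanation at the top of this message more
concrete, here is a toy Python model of it. It is only a sketch and not
how Cassandra is implemented: the Node and Cluster classes, the
100-slot token space and the md5-based token() function are all made up
for illustration. It also assumes the second node was taken out of the
ring without streaming its data back (the decommission/removeToken case
asked about in the quoted message), so that the first node becomes
responsible for every key again.

```python
import hashlib

RING_SIZE = 100  # toy token space (assumption), not Cassandra's real partitioner range


def token(key):
    """Map a row key to a position on the toy ring."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % RING_SIZE


class Node:
    def __init__(self, name):
        self.name = name
        self.rows = {}  # row key -> value stored locally on this node


class Cluster:
    """Toy cluster of at most two nodes with RF=1: each key has one owner."""

    def __init__(self, first):
        self.first = first
        self.second = None
        self.split = None  # tokens below `split` belong to `second` once it joins

    def owner(self, key):
        if self.second is not None and token(key) < self.split:
            return self.second
        return self.first

    def write(self, key, value):
        self.owner(key).rows[key] = value

    def read(self, key):
        return self.owner(key).rows.get(key)

    def bootstrap(self, second, split):
        """The new node takes tokens below `split`; rows are copied to it,
        but the old owner keeps its (now unowned) copies."""
        self.second, self.split = second, split
        for key, value in self.first.rows.items():
            if token(key) < split:
                second.rows[key] = value

    def cleanup(self, node):
        """Drop rows the node no longer owns (the job of `nodetool cleanup`)."""
        for key in [k for k in node.rows if self.owner(k) is not node]:
            del node.rows[key]

    def remove_second(self):
        """Assumption: the second node leaves the ring without streaming back."""
        self.second, self.split = None, None


def demo(run_cleanup):
    n1, n2 = Node("node1"), Node("node2")
    cluster = Cluster(n1)
    cluster.write("jared", "old-password")
    # Choose the split so that the second node becomes the owner of "jared".
    cluster.bootstrap(n2, split=token("jared") + 1)
    if run_cleanup:
        cluster.cleanup(n1)
    cluster.write("jared", "new-password")  # only the owner (node2) sees this
    cluster.remove_second()
    return cluster.read("jared")


if __name__ == "__main__":
    print(demo(run_cleanup=False))  # old-password
    print(demo(run_cleanup=True))   # None
```

Without cleanup, the first node still holds the pre-bootstrap copy and
serves it once it owns the key again, which matches the "invalid
password" symptom. With cleanup, the first node has nothing to return
for that key, so the row is visibly missing rather than silently
outdated.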