incubator-cassandra-user mailing list archives

From Jeremiah D Jordan <>
Subject Re: Data loss when swapping out cluster
Date Tue, 26 Nov 2013 16:46:31 GMT
TL;DR: you need to run repair in between doing those two things.

Full explanation:

-Jeremiah Jordan
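
The repair-in-between advice above can be sketched roughly as follows. This is a hypothetical outline, not the poster's exact commands: host names and the keyspace name are placeholders, and `nodetool` must be able to reach a live cluster.

```shell
# 1. Bootstrap each new node in the new availability zone, one at a time,
#    and confirm it reaches state UN (Up/Normal) before adding the next.
nodetool -h new-node-1 status

# 2. Run repair so every replica actually holds the data for the token
#    ranges it now owns, BEFORE removing any old node. With vnodes this
#    should be run on each node (keyspace name is a placeholder).
nodetool -h new-node-1 repair my_keyspace

# 3. Only after repair completes, decommission the old nodes one at a time.
nodetool -h old-node-1 decommission
```

Skipping step 2 means a new node can become a replica for ranges whose data still lives only on the old nodes; once those are decommissioned and shut down, reads for that data come back empty.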

On Nov 25, 2013, at 11:00 AM, Christopher J. Bottaro <> wrote:

> Hello,
> We recently experienced (pretty severe) data loss after moving our 4 node Cassandra cluster from one EC2 availability zone to another.  Our strategy for doing so was as follows:
> One at a time, bring up new nodes in the new availability zone and have them join the cluster.
> One at a time, decommission the old nodes in the old availability zone and turn them off (stop the Cassandra process).
> Everything seemed to work as expected.  As we decommissioned each node, we checked the logs for messages indicating "yes, this node is done decommissioning" before turning the node off.
> Pretty quickly after the old nodes left the cluster, we started getting client calls about data missing.
> We immediately turned the old nodes back on and when they rejoined the cluster *most* of the reported missing data returned.  For the rest of the missing data, we had to spin up a new cluster from EBS snapshots and copy it over.
> What did we do wrong?
> In hindsight, we noticed a few things which may be clues...
> The new nodes had much lower load after joining the cluster than the old ones (3-4 GB as opposed to 10 GB).
> We have EC2Snitch turned on, although we're using SimpleStrategy for replication.
> The new nodes showed even ownership (via nodetool status) after joining the cluster.
> Here's more info about our cluster...
> Cassandra 1.2.10
> Replication factor of 3
> Vnodes with 256 tokens
> All tables made via CQL
> Data dirs on EBS (yes, we are aware of the performance implications)
> Thanks for the help.
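
One clue in the list above deserves a note: SimpleStrategy ignores the snitch entirely, so running EC2Snitch with SimpleStrategy gives no availability-zone awareness. A hedged sketch of switching to NetworkTopologyStrategy (which, with Ec2Snitch, treats the EC2 region as the datacenter and each availability zone as a rack), with a placeholder keyspace and datacenter name:

```shell
# Hypothetical example only; 'my_keyspace' and the 'us-east' datacenter
# name are placeholders -- check the names Ec2Snitch reports via
# 'nodetool status' before running anything like this.
cqlsh -e "ALTER KEYSPACE my_keyspace
          WITH replication = {'class': 'NetworkTopologyStrategy',
                              'us-east': 3};"
```

Changing the replication strategy changes replica placement, so it must itself be followed by a full repair before any nodes are removed.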
