From Jeff Jirsa <>
Subject Re: What happened about one node in cluster is down?
Date Fri, 02 Aug 2019 12:28:55 GMT

> On Aug 2, 2019, at 12:21 AM, Martin Xue <> wrote:
> Hello,
> I am currently running into a production issue and am seeking help from the community.
> Can anyone help with the following question regarding the Cassandra down node inside
> Case:
> Cassandra 3.0.14
> 3 nodes (A, B, C) in DC1, 3 nodes (D, E, F) in DC2 forming one cluster
> keyspace_m: Replication Factor is 2 in DC1, and DC2
> application_z read and write consistency is both local quorum

RF=2 with local quorum basically guarantees an outage in a given DC if any single host dies,
so it’s only recommended if you can fail out of a DC safely. That means an eventually consistent
data model: when you fail out, the remote DC is in an undefined state, since you’re using
local quorum.

> Issue:
> node A in DC1 has crashed and has been down for more than 24 hours (outside the default
> 3-hour hint window).
> Questions:
> 1. for old data in node A, will the data be re-sync to node B, or C after node A was

Both, but only one of B or C for any given piece of data.

With RF=2, each piece of data is on exactly one of these replica pairs:
A and B
B and C
C and A

So if A crashes, bringing it back or replacing it will sync from the only surviving replica
for each piece of data.

> 2. for new data, if application_z is trying to write, will the data always be written
> to the only two running nodes (B and C) in DC1, or will it fail if it still tries to write
> to node A?

It will fail. Ownership doesn’t change just because one host goes down. For a piece of data
owned by A and any other node, writes are going to fail while A is down if you use this
replication factor and consistency level.

> 3. if application_z is to read, will it fail (for old data before node A crash and for
> new data after node A crash)? will the data be replicated from A to B or C?

Fail; reads will throw an unavailable exception.
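
To make the failure pattern concrete, here is a small Python sketch (not Cassandra code, just arithmetic). It assumes each piece of data in DC1 lives on exactly two of the three nodes and that local quorum needs RF // 2 + 1 live replicas. The names `available` and `status` are made up for illustration.

```python
from itertools import combinations

NODES = ["A", "B", "C"]      # DC1 nodes from the question
RF = 2                       # replicas per DC
LOCAL_QUORUM = RF // 2 + 1   # 2 when RF is 2

def available(replicas, down):
    """True if enough replicas of this data are alive to satisfy local quorum."""
    live = [node for node in replicas if node not in down]
    return len(live) >= LOCAL_QUORUM

down = {"A"}  # node A has crashed

# Check every possible replica pair in DC1.
status = {pair: available(pair, down) for pair in combinations(NODES, RF)}
for pair, ok in status.items():
    print(pair, "OK" if ok else "Unavailable")
```

Under these assumptions, only data whose replicas are B and C can still be read or written at local quorum; anything with A as one of its two replicas is unavailable until A is back or replaced.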

> 3. what is the best strategy under this scenario?

Go to RF=3, or read and write at quorum so you’re doing 3/4 instead of 2/2 (but then you’ll
fail if the WAN link goes down, and your reads and writes will cross the WAN, adding latency).
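
The 2/2 versus 3/4 numbers come from the usual quorum formula, floor(replicas / 2) + 1. A rough Python sketch of the arithmetic for the options mentioned here (the `options` labels and helper names are only for illustration):

```python
def quorum(replicas: int) -> int:
    """Replicas that must respond: floor(replicas / 2) + 1."""
    return replicas // 2 + 1

def tolerated_failures(replicas: int) -> int:
    """Replicas that can be down while a quorum is still reachable."""
    return replicas - quorum(replicas)

# Number of replicas counted toward each consistency level.
options = {
    "RF=2 per DC, LOCAL_QUORUM": 2,       # current setup: 2/2
    "RF=3 per DC, LOCAL_QUORUM": 3,       # 2/3
    "RF=2 per DC, QUORUM (both DCs)": 4,  # 3/4, but crosses the WAN
}

for name, replicas in options.items():
    print(f"{name}: need {quorum(replicas)}/{replicas}, "
          f"tolerates {tolerated_failures(replicas)} down")
```

The current setup tolerates zero down replicas, while either alternative tolerates one.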

> 4. Shall I bring up node A and run repair on all the nodes (A, B, C, D, E, F)?
> (a potential issue, as repair may cause a crash similar to the one that happened on node A,
> and there is a big 1TB keyspace to repair)

Since you’re past the hint window, you’re going to have a lot of data to repair, and your
chance of resurrecting data due to exceeding gc grace is nonzero, so it may make sense to
replace. Replace will take longer, so bringing it online may be an easier way to end the outage,
depending on the business cost of data resurrection (unless you have “only purge repaired
tombstones” enabled, which will prevent resurrection, though it potentially introduces other
issues with incremental repair).

> 5. Shall I simply decommission node A and add new node F in DC1 to the cluster?

May be easier than trying to run repair. In this scenario only, you can replace without running
repair and without violating consistency.

> Your help would be appreciated.
> Thanks
> Regards
> Martin

