cassandra-user mailing list archives

From Fd Habash <fmhab...@gmail.com>
Subject On a 12-node Cluster, Starting C* on a Seed Node Increases Read Latency from 150ms to 1.5 sec.
Date Fri, 02 Mar 2018 14:42:23 GMT
This is a 2.8.8 cluster with three AWS AZs, each with 4 nodes.

A few days ago we noticed a single node’s read latency reaching 1.5 secs, while 8 other nodes
had read latencies climbing to nearly 900 ms.

This single node was a seed node and it was running a ‘repair -pr’ at the time. We intervened
as follows …

• Stopping compactions during repair did not improve latency.
• Killing repair brought down latency to 200 ms on the seed node and the other 8.
• Restarting C* on the seed node pushed latency back up to nearly 1.5 secs on the seed and
the other 8. At this point, no repair was running, but compactions were running. We left
them alone.

At this point, we saw that putting the seed node back in the cluster consistently worsened
latencies on the seed and the 8 other nodes, i.e. 9 out of the 12 nodes in the cluster.

So we decided to bootstrap it. During the bootstrapping and afterwards, latencies remained
near 200 ms, which is what we wanted for now.

The only difference we were able to see is that the seed node in question had 5000 sstables
while all the others had around 2300. After the bootstrap, the seed node's sstable count
dropped to 2500.
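
For anyone who wants to make the same comparison, here is a minimal sketch of how per-node
sstable totals could be collected and checked for outliers. It assumes `nodetool cfstats`
prints one "SSTable count: N" line per table and that nodetool can reach each node with
`-h`. The function names, the 1.5x-median threshold, and any host names passed in are
illustrative assumptions, not something taken from this cluster.

```python
import re
import subprocess

# Matches lines such as "		SSTable count: 12" in nodetool cfstats output.
SSTABLE_RE = re.compile(r"^\s*SSTable count:\s*(\d+)\s*$", re.MULTILINE)


def total_sstables(cfstats_text):
    """Sum every 'SSTable count: N' line found in the given cfstats text."""
    return sum(int(n) for n in SSTABLE_RE.findall(cfstats_text))


def outliers(counts, factor=1.5):
    """Return the hosts whose sstable total exceeds factor times the median.

    The factor is an arbitrary illustrative threshold, not a Cassandra setting.
    """
    ordered = sorted(counts.values())
    median = ordered[len(ordered) // 2]
    return {host: c for host, c in counts.items() if c > factor * median}


def collect(hosts):
    """Run 'nodetool -h <host> cfstats' for each host and total its sstables."""
    counts = {}
    for host in hosts:
        result = subprocess.run(
            ["nodetool", "-h", host, "cfstats"],
            capture_output=True, text=True, check=True,
        )
        counts[host] = total_sstables(result.stdout)
    return counts
```

Calling `outliers(collect([...]))` with the cluster's own host list would flag a node like the
seed described above, whose count was roughly twice that of its peers.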

Why would starting C* on a single seed node affect the cluster this badly? Again, there was
no repair, just the 4 compactions that run routinely on it as well as on all the others. Is
it gossip? What other plausible explanations are there?

----------------
Thank you

