cassandra-user mailing list archives

From Fd Habash <>
Subject Why Would a Single Node Drive Read Latency 5x Worse & Shutting it Down Improves it
Date Fri, 09 Mar 2018 14:46:06 GMT

Trying to come up with an explanation for this real-life C* behavior. Under a constant load
test of 2k reads/sec, the very top graph is the read latency for a single node, which is a
seed node.

You may not see it here, but there are 8 other nodes that follow it. Since we have 4 nodes
in each of the 3 AZs, I know these other 8 nodes (4 per AZ) hold the secondary replicas
for the seed node.

At the time this metric snapshot was taken, there were 0 compactions, 0 anti-compactions,
and 0 repairs. The cluster did nothing but serve read requests.

Shortly after 0900, I stopped C* on the top seed node and read latency dropped from 1
sec to < 150 ms.

I know this is probably not enough diagnostics to pinpoint a cause. What explanations are there
for this behavior?

Misconfigured client connections? 

The seed node EC2 instance tested fine for general health. 

During the load test, this node had a ‘repair -pr’ job running on it which finished successfully,
but read latency did not improve afterwards. 

Thank you
