cassandra-user mailing list archives

From Fd Habash <fmhab...@gmail.com>
Subject RE: Read Latency Doubles After Shrinking Cluster and Never Recovers
Date Mon, 11 Jun 2018 16:07:47 GMT
A picture is worth a thousand words!

The bottom graph shows cluster read latency, with a trend change from around 1/17 to 3/14, when nodes
were initially removed and then added back.

The top graph shows live_ss_table_count. It was previously about 720, trended upward toward 1000,
and then came back to ~700. Even though the sstable count recovered, the read latency did not.




----------------
Thank you

From: Nicolas Guyomar
Sent: Monday, June 11, 2018 11:32 AM
To: user@cassandra.apache.org
Subject: Re: Read Latency Doubles After Shrinking Cluster and Never Recovers

Really wild guess: do you monitor I/O performance, and are you positive it has stayed the same over
the past year (network becoming a little busier, hard drive a bit slower, and so on)?
Wild guess 2: was a new 'monitoring' agent (a log shipping agent, for instance) added on the boxes
in the meantime?
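
For example, a quick way to rule out the disks (assuming the sysstat package is installed on the
nodes) would be something like:

    # per-device utilization and await times, sampled every 5 seconds
    iostat -x 5
    # pending compactions, which can also inflate read latency
    nodetool compactionstats

Comparing these against a baseline from before the resize would show whether the hardware itself
got slower.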

On 11 June 2018 at 16:56, Jeff Jirsa <jjirsa@gmail.com> wrote:
No
-- 
Jeff Jirsa


On Jun 11, 2018, at 7:49 AM, Fd Habash <fmhabash@gmail.com> wrote:
I will check for both.
 
On a different subject, I have read some user reports that running ‘nodetool cleanup’
requires a C* process restart, at least around 2.2.8. Is this true?
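
If it matters, what I have in mind is simply something like (the keyspace name below is just a
placeholder):

    # run cleanup one keyspace at a time to limit disk and CPU impact
    nodetool cleanup my_keyspace

without bouncing the daemon afterwards.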
 
 
----------------
Thank you
 
From: Nitan Kainth
Sent: Monday, June 11, 2018 10:40 AM
To: user@cassandra.apache.org
Subject: Re: Read Latency Doubles After Shrinking Cluster and Never Recovers
 
I think it would, because Cassandra will have to process more sstables to build the response to read
queries.
 
Now, after cleanup, if the data volume is the same and compaction has been running, I can’t think
of any more diagnostic steps. Let’s wait for other experts to comment.
 
Can you also check the sstable count for each table, just to be sure none of them are extraordinarily
high?
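
Something along these lines should do it (the exact output labels can vary a bit by version):

    # per-table stats, including "SSTable count", across all keyspaces
    nodetool cfstats | grep -E 'Table:|SSTable count'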
Sent from my iPhone

On Jun 11, 2018, at 10:21 AM, Fd Habash <fmhabash@gmail.com> wrote:
Yes, we did, after adding the three nodes back, and a full cluster repair as well.
 
But even if we hadn’t run cleanup, would the fact that some nodes still hold sstables they no
longer need have impacted read latency?
 
Thanks 
 
----------------
Thank you
 
From: Nitan Kainth
Sent: Monday, June 11, 2018 10:18 AM
To: user@cassandra.apache.org
Subject: Re: Read Latency Doubles After Shrinking Cluster and Never Recovers
 
Did you run cleanup too? 
 
On Mon, Jun 11, 2018 at 10:16 AM, Fred Habash <fmhabash@gmail.com> wrote:
I have hit dead ends everywhere I have turned on this issue.
 
We had a 15-node cluster that had been doing 35 ms read latency all along for years. At some point,
we made a decision to shrink it to 13 nodes. Read latency rose to near 70 ms. Shortly after, we
decided this was not acceptable, so we added the three nodes back in. Read latency dropped to near
50 ms and has been hovering around that value for over 6 months now.
 
Repairs run regularly, load is even across the cluster nodes, and the application activity profile
has not changed.
 
Why are we unable to get back the same read latency now that the cluster is back to 15 nodes, the
same size as it was before?
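 
If it helps the diagnosis, we can pull per-node latency histograms, roughly along these lines
(keyspace/table names are placeholders):

    # coordinator-level read/write latency percentiles on this node
    nodetool proxyhistograms
    # local read latency and sstables-per-read for one table
    nodetool cfhistograms my_keyspace my_table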
 
-- 
 
----------------------------------------
Thank you

 
 
 


