incubator-cassandra-user mailing list archives

From aaron morton <aa...@thelastpickle.com>
Subject Re: Slowdowns during repair
Date Thu, 16 Jun 2011 10:41:44 GMT
Look for log messages at the ERROR level first to find out why it's crashing. 

Check for GC pressure during the repair, either using JConsole or log messages from the GCInspector.
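
As a rough sketch of those first two checks, a few lines of Python can pull the ERROR entries and the GCInspector entries out of the log. This assumes the default log4j layout where each line starts with the level followed by the thread name in brackets. The sample lines are made up for illustration and are not real Cassandra output, and scan_log is just a hypothetical helper name.

```python
import re

# Assumption: each log line starts with "<LEVEL> [thread] ...", possibly
# left-padded with spaces. Adjust the pattern if your log4j layout differs.
LEVEL_RE = re.compile(r"^\s*(TRACE|DEBUG|INFO|WARN|ERROR|FATAL)\s+\[")


def scan_log(lines):
    """Split an iterable of log lines into (error_lines, gc_lines)."""
    errors, gc_lines = [], []
    for line in lines:
        line = line.rstrip("\n")
        match = LEVEL_RE.match(line)
        if match and match.group(1) in ("ERROR", "FATAL"):
            errors.append(line)
        elif "GCInspector" in line:
            gc_lines.append(line)
    return errors, gc_lines


# Hypothetical sample lines, invented for this example only.
SAMPLE = [
    " INFO [ScheduledTasks:1] 2011-06-16 02:00:01,000 GCInspector.java (line 1) example GC message",
    "ERROR [ExampleStage:1] 2011-06-16 02:00:05,000 Example.java (line 1) example failure",
    " INFO [main] 2011-06-16 02:00:06,000 Example.java (line 2) unrelated message",
]

if __name__ == "__main__":
    # In practice you would pass an open file handle for your system log
    # (the path depends on your install) instead of SAMPLE.
    errors, gc_lines = scan_log(SAMPLE)
    print("ERROR lines:", len(errors))
    print("GCInspector lines:", len(gc_lines))
```

Comparing the timestamps of the GCInspector lines against the repair window should show whether the slow writes line up with long collections.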


Check nodetool tpstats to get an idea of whether the nodes are saturated, i.e. are there tasks
in the pending list, or are they just running with high latency?
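
If you want to watch this over the repair window rather than eyeball it, something like the following could flag pools with a backlog. It is only a sketch: it assumes tpstats prints one pool per row as name, active, pending, completed, and the sample text and numbers are invented for illustration. pending_pools is a hypothetical helper, not part of nodetool.

```python
def pending_pools(tpstats_text):
    """Return {pool_name: pending} for pools with a non-zero pending count.

    Assumption: data rows look like "<name> <active> <pending> <completed>".
    Rows whose 2nd to 4th fields are not all integers (e.g. the header)
    are skipped.
    """
    busy = {}
    for line in tpstats_text.splitlines():
        parts = line.split()
        if len(parts) >= 4 and all(p.isdigit() for p in parts[1:4]):
            pending = int(parts[2])
            if pending > 0:
                busy[parts[0]] = pending
    return busy


# Hypothetical tpstats output, invented for this example only.
SAMPLE = """\
Pool Name                    Active   Pending      Completed
ReadStage                         0         0          12345
MutationStage                    32       187         987654
AntiEntropyStage                  1         3             42
"""

if __name__ == "__main__":
    for name, pending in pending_pools(SAMPLE).items():
        print(f"{name}: {pending} pending")
```

Feeding it the captured output of nodetool tpstats every minute or so during the repair would show whether pending tasks build up or whether the pools stay empty while latency is still high.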

If a node crashes when calculating the Merkle trees for its neighbours, the repair will hang
(for 48 hours, I think) on the node that initiated the repair. I don't think this is immediately
obvious through tpstats.

Start with why it's crashing and what's happening with the GC.

Cheers

-----------------
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 16 Jun 2011, at 10:20, Aurynn Shaw wrote:

> Hey all;
> 
> So, we have Cassandra running on a 5-server ring, with a RF of 3, and we're regularly
> seeing major slowdowns in read & write performance while running nodetool repair, as well
> as the occasional Cassandra crash during the repair window - slowdowns past 10 seconds to
> perform a single write.
> 
> The repair cycle runs nightly on a different server, so each server has it run once a
> week.
> 
> We're running 0.7.0 currently, and we'll be upgrading to 0.7.6 shortly.
> 
> System load on the Cassandra servers is never more than 10% CPU and utterly minimal IO
> usage, so I wouldn't think we'd be seeing issues quite like this.
> 
> What sort of knobs should I be looking at tuning to reduce the impact that nodetool repair
> has on Cassandra? What questions should I be asking as to why Cassandra slows down to the
> level that it does, and what I should be optimizing?
> 
> Additionally, what should I be looking for in the logs when this is happening? There's
> a lot in the logs, but I'm not sure what to look for.
> 
> Cassandra is, in this instance, backing a system that supports around a million requests
> a day, so not terribly heavy traffic.
> 
> Thanks,
> 
> Aurynn

