incubator-cassandra-user mailing list archives

From Hefeng Yuan <hfy...@rhapsody.com>
Subject Re: One hot node slows down whole cluster
Date Wed, 17 Aug 2011 18:52:38 GMT
Just wondering: would it help to shorten rpc_timeout_in_ms (currently 30,000), so that
when one node gets hot and responds slowly, the others just treat it as down and
move on?
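
The idea behind the question (stop waiting on a slow node once a deadline passes) can be pictured with a generic Java sketch. This is not how Cassandra implements rpc_timeout_in_ms; the class name, the method name, and the timings are hypothetical and chosen only for illustration.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class ReplicaTimeoutSketch {
    /**
     * Runs the (hypothetical) replica call and waits at most timeoutMs for it.
     * Returns the reply, or null if the call did not answer in time.
     */
    static String callWithTimeout(Callable<String> replicaCall, long timeoutMs) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Future<String> reply = pool.submit(replicaCall);
            try {
                return reply.get(timeoutMs, TimeUnit.MILLISECONDS);
            } catch (TimeoutException e) {
                // The caller stops waiting; the slow call is interrupted.
                reply.cancel(true);
                return null;
            }
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        Callable<String> fast = () -> "ok";
        Callable<String> slow = () -> { Thread.sleep(2000); return "late"; };

        System.out.println(callWithTimeout(fast, 500)); // reply arrives in time
        System.out.println(callWithTimeout(slow, 100)); // caller gives up and gets null
    }
}
```

With a long timeout the caller is held for as long as the slow call takes, up to the timeout; with a short one it is released sooner, at the cost of treating a merely slow reply as a failure.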

On Aug 17, 2011, at 11:35 AM, Hefeng Yuan wrote:

> Sorry, correction, we're using 0.8.1.
> 
> On Aug 17, 2011, at 11:24 AM, Hefeng Yuan wrote:
> 
>> Hi,
>> 
>> We're noticing that when one node gets hot (very high CPU usage) because of 'nodetool
>> repair', the whole cluster's performance becomes really bad.
>> 
>> We're using 0.8.1 with the random partitioner. We have 6 nodes with RF 5. Our repair is
>> scheduled to run once a week, spread across the whole cluster. Jonathan did suggest that
>> 0.8.0 has a bug in repair, but I'm wondering why one hot node would slow down the
>> whole cluster.
>> 
>> We saw this symptom yesterday on one node, and today on the adjacent node. Most probably
>> it'll happen on the next one tomorrow.
>> 
>> We do see lots of (~200) RejectedExecutionExceptions 3 hours before the repair job,
>> and also in the middle of the repair job; not sure whether they're related. The full stack
>> trace is attached at the end.
>> 
>> Does anyone have any suggestions or hints?
>> 
>> Thanks,
>> Hefeng
>> 
>> 
>> ERROR [pool-2-thread-3097] 2011-08-17 08:42:38,118 Cassandra.java (line 3462) Internal error processing batch_mutate
>> java.util.concurrent.RejectedExecutionException: ThreadPoolExecutor has shut down
>> 	at org.apache.cassandra.concurrent.DebuggableThreadPoolExecutor$1.rejectedExecution(DebuggableThreadPoolExecutor.java:73)
>> 	at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:767)
>> 	at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:658)
>> 	at org.apache.cassandra.service.StorageProxy.insertLocal(StorageProxy.java:360)
>> 	at org.apache.cassandra.service.StorageProxy.sendToHintedEndpoints(StorageProxy.java:241)
>> 	at org.apache.cassandra.service.StorageProxy.access$000(StorageProxy.java:62)
>> 	at org.apache.cassandra.service.StorageProxy$1.apply(StorageProxy.java:99)
>> 	at org.apache.cassandra.service.StorageProxy.performWrite(StorageProxy.java:210)
>> 	at org.apache.cassandra.service.StorageProxy.mutate(StorageProxy.java:154)
>> 	at org.apache.cassandra.thrift.CassandraServer.doInsert(CassandraServer.java:560)
>> 	at org.apache.cassandra.thrift.CassandraServer.internal_batch_mutate(CassandraServer.java:511)
>> 	at org.apache.cassandra.thrift.CassandraServer.batch_mutate(CassandraServer.java:519)
>> 	at org.apache.cassandra.thrift.Cassandra$Processor$batch_mutate.process(Cassandra.java:3454)
>> 	at org.apache.cassandra.thrift.Cassandra$Processor.process(Cassandra.java:2889)
>> 	at org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:187)
>> 	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>> 	at java.lang.Thread.run(Thread.java:619)
>> ERROR [Thread-137480] 2011-08-17 08:42:38,121 AbstractCassandraDaemon.java (line 113) Fatal exception in thread Thread[Thread-137480,5,main]
>> java.util.concurrent.RejectedExecutionException: ThreadPoolExecutor has shut down
>> 	at org.apache.cassandra.concurrent.DebuggableThreadPoolExecutor$1.rejectedExecution(DebuggableThreadPoolExecutor.java:73)
>> 	at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:767)
>> 	at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:658)
>> 	at org.apache.cassandra.net.MessagingService.receive(MessagingService.java:444)
>> 	at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:117)
> 
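
For readers unfamiliar with the exception in the trace above: a java.util.concurrent.ThreadPoolExecutor rejects any task submitted after it has been shut down, which matches the "ThreadPoolExecutor has shut down" message. The following is a minimal stand-alone sketch using only the JDK; the class and method names are hypothetical, and it does not reproduce Cassandra's DebuggableThreadPoolExecutor or explain why the pool was shut down on this node.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class RejectedAfterShutdown {
    /** Returns true if submitting to a shut-down pool raises RejectedExecutionException. */
    static boolean rejectedAfterShutdown() {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 1, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<Runnable>());
        pool.shutdown(); // no new tasks are accepted from here on
        try {
            pool.execute(() -> { });
            return false;
        } catch (RejectedExecutionException e) {
            // The JDK's default handler (AbortPolicy) throws on rejection.
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("rejected: " + rejectedAfterShutdown());
    }
}
```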

