cassandra-user mailing list archives

From Hefeng Yuan <hfy...@rhapsody.com>
Subject One hot node slows down whole cluster
Date Wed, 17 Aug 2011 18:24:28 GMT
Hi,

We're noticing that when one node gets hot (very high CPU usage) because of 'nodetool repair',
performance across the whole cluster becomes really bad.

We're using 0.8.0 with the RandomPartitioner. We have 6 nodes with RF 5. Our repair is scheduled
to run once a week, spread across the whole cluster. Jonathan did suggest that repair in 0.8.0
has some bugs, but I'm still wondering why one hot node would slow down the whole cluster.
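
For a sense of how much of the ring a single node touches with these numbers, here is a rough sketch. It assumes evenly balanced tokens and simple ring-order replica placement (each range is stored on its primary node plus the next RF-1 nodes), which may not match the actual keyspace settings; the function names are made up for illustration.

```python
# Sketch only: assumes balanced tokens and ring-order (SimpleStrategy-like) placement.

def replica_nodes(range_index, num_nodes, rf):
    """Nodes holding a replica of the range whose primary is `range_index`."""
    return [(range_index + i) % num_nodes for i in range(rf)]


def fraction_of_ranges_touching(node, num_nodes, rf):
    """Share of token ranges for which `node` is one of the replicas."""
    hits = sum(node in replica_nodes(r, num_nodes, rf) for r in range(num_nodes))
    return hits / num_nodes


NUM_NODES, RF = 6, 5  # values from this post
share = fraction_of_ranges_touching(0, NUM_NODES, RF)
print(f"node 0 is a replica for {share:.0%} of ranges")  # prints: node 0 is a replica for 83% of ranges
```

Under those assumptions, any one node is a replica for 5 of the 6 ranges, so most requests have the hot node among their replicas. Whether that actually delays clients depends on the consistency level in use, which is not stated here.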

We saw this symptom yesterday on one node, and today on the adjacent node. Most likely it will
happen on the next one tomorrow.

We also see a lot of (~200) RejectedExecutionException errors 3 hours before the repair job, and
again in the middle of the repair job; we're not sure whether they're related. The full stack
trace is attached at the end.
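
One way to check whether the rejections line up with the repair window is to bucket them by hour. The sketch below assumes the log line layout seen in the stack trace attached to this message (level, thread name in brackets, then a timestamp); `rejections_per_hour` is a hypothetical helper, not part of Cassandra or nodetool.

```python
import re
from collections import Counter

# Matches the start of a log entry such as:
#   ERROR [pool-2-thread-3097] 2011-08-17 08:42:38,118 Cassandra.java (line 3462) ...
# and captures the date plus the hour ("2011-08-17 08").
HEADER = re.compile(
    r"^\s*(?:ERROR|WARN|INFO|DEBUG)\s+\[[^\]]+\]\s+(\d{4}-\d{2}-\d{2} \d{2}):"
)
EXCEPTION_PREFIX = "java.util.concurrent.RejectedExecutionException"


def rejections_per_hour(lines):
    """Count RejectedExecutionException entries, keyed by 'YYYY-MM-DD HH'."""
    counts = Counter()
    current_hour = None
    for line in lines:
        match = HEADER.match(line)
        if match:
            current_hour = match.group(1)
        elif current_hour and line.startswith(EXCEPTION_PREFIX):
            counts[current_hour] += 1
    return counts


sample = [
    "ERROR [pool-2-thread-3097] 2011-08-17 08:42:38,118 Cassandra.java (line 3462) Internal error processing batch_mutate",
    "java.util.concurrent.RejectedExecutionException: ThreadPoolExecutor has shut down",
    "ERROR [Thread-137480] 2011-08-17 08:42:38,121 AbstractCassandraDaemon.java (line 113) Fatal exception in thread Thread[Thread-137480,5,main]",
    "java.util.concurrent.RejectedExecutionException: ThreadPoolExecutor has shut down",
]
for hour, count in sorted(rejections_per_hour(sample).items()):
    print(hour, count)
```

Run against the real log (for example `rejections_per_hour(open(path))`, with `path` pointing at wherever the system log lives on the node), the hourly counts can be compared with the repair start time.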

Does anyone have any suggestions or hints?

Thanks,
Hefeng


ERROR [pool-2-thread-3097] 2011-08-17 08:42:38,118 Cassandra.java (line 3462) Internal error processing batch_mutate
java.util.concurrent.RejectedExecutionException: ThreadPoolExecutor has shut down
	at org.apache.cassandra.concurrent.DebuggableThreadPoolExecutor$1.rejectedExecution(DebuggableThreadPoolExecutor.java:73)
	at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:767)
	at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:658)
	at org.apache.cassandra.service.StorageProxy.insertLocal(StorageProxy.java:360)
	at org.apache.cassandra.service.StorageProxy.sendToHintedEndpoints(StorageProxy.java:241)
	at org.apache.cassandra.service.StorageProxy.access$000(StorageProxy.java:62)
	at org.apache.cassandra.service.StorageProxy$1.apply(StorageProxy.java:99)
	at org.apache.cassandra.service.StorageProxy.performWrite(StorageProxy.java:210)
	at org.apache.cassandra.service.StorageProxy.mutate(StorageProxy.java:154)
	at org.apache.cassandra.thrift.CassandraServer.doInsert(CassandraServer.java:560)
	at org.apache.cassandra.thrift.CassandraServer.internal_batch_mutate(CassandraServer.java:511)
	at org.apache.cassandra.thrift.CassandraServer.batch_mutate(CassandraServer.java:519)
	at org.apache.cassandra.thrift.Cassandra$Processor$batch_mutate.process(Cassandra.java:3454)
	at org.apache.cassandra.thrift.Cassandra$Processor.process(Cassandra.java:2889)
	at org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:187)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
	at java.lang.Thread.run(Thread.java:619)
ERROR [Thread-137480] 2011-08-17 08:42:38,121 AbstractCassandraDaemon.java (line 113) Fatal exception in thread Thread[Thread-137480,5,main]
java.util.concurrent.RejectedExecutionException: ThreadPoolExecutor has shut down
	at org.apache.cassandra.concurrent.DebuggableThreadPoolExecutor$1.rejectedExecution(DebuggableThreadPoolExecutor.java:73)
	at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:767)
	at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:658)
	at org.apache.cassandra.net.MessagingService.receive(MessagingService.java:444)
	at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:117)