cassandra-user mailing list archives

From Sergey Olefir <solf.li...@gmail.com>
Subject node down = log explosion?
Date Tue, 22 Jan 2013 13:03:18 GMT
I have a Cassandra 1.1.7 cluster with 4 nodes in 2 datacenters (2+2).
Replication is configured as DC1:2,DC2:2 (i.e. every node holds the entire
data set).
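The claim that every node holds all the data follows from simple arithmetic: each datacenter stores RF copies of every row across its nodes, so each node holds roughly RF / (nodes in that DC) of the data, capped at 100%. The sketch below assumes an even token distribution, which the message does not state; the class and method names are made up for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: share of the data each node holds under per-DC replication. */
public final class ReplicaOwnership {
    /**
     * Each DC keeps rf copies of every row spread over nodesInDc nodes, so with an
     * even token distribution (assumed) each node holds rf / nodesInDc of the data,
     * capped at 1.0 because a node never stores more than one copy of a row.
     */
    public static double fractionPerNode(int rf, int nodesInDc) {
        if (nodesInDc <= 0 || rf < 0) {
            throw new IllegalArgumentException("invalid topology");
        }
        return Math.min(1.0, (double) rf / nodesInDc);
    }

    public static void main(String[] args) {
        // Topology from the message: DC1:2 and DC2:2, with 2 nodes in each DC.
        Map<String, int[]> dcs = new LinkedHashMap<>();
        dcs.put("DC1", new int[] {2, 2}); // {replication factor, node count}
        dcs.put("DC2", new int[] {2, 2});
        for (Map.Entry<String, int[]> e : dcs.entrySet()) {
            int[] v = e.getValue();
            System.out.printf("%s: each node holds %.0f%% of the data%n",
                    e.getKey(), 100 * fractionPerNode(v[0], v[1]));
        }
    }
}
```

With RF 2 over 2 nodes per DC this prints 100% for both datacenters, matching the description above.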

I am load-testing counter increments at the rate of about 10k per second.
All writes are directed to two nodes in DC1 (DC2 nodes are basically
backup). In total there are 100 separate clients executing 1-2 batch updates
per second.
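A quick back-of-the-envelope check of the implied batch size, assuming the "1-2 batch updates per second" rate applies to each client (the message does not say so explicitly). The class name is hypothetical.

```java
/** Hypothetical estimate of average counter increments per batch in this load test. */
public final class BatchSizeEstimate {
    /** Average increments per batch = total increment rate / total batch rate. */
    public static double incrementsPerBatch(double incrementsPerSec, int clients,
                                            double batchesPerClientPerSec) {
        return incrementsPerSec / (clients * batchesPerClientPerSec);
    }

    public static void main(String[] args) {
        double rate = 10_000; // ~10k increments per second (from the message)
        int clients = 100;    // 100 clients (from the message)
        // 1 batch/s per client gives the largest batches, 2 batches/s the smallest.
        double largest = incrementsPerBatch(rate, clients, 1.0);
        double smallest = incrementsPerBatch(rate, clients, 2.0);
        System.out.printf("~%.0f to %.0f increments per batch%n", smallest, largest);
    }
}
```

Under that assumption each batch carries roughly 50 to 100 counter increments.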

We wanted to test what happens if one node goes down, so we brought one node
down in DC1 (i.e. the node that was handling half of the incoming writes).
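Since the two DC1 nodes were splitting the incoming writes, taking one down roughly doubles the coordinator load on the survivor. The sketch below assumes the clients fail over and spread requests evenly across whichever DC1 nodes are still up; it deliberately ignores replica traffic (with DC1:2, every write also targets the other DC1 node as a replica, including the one that is down). Names are illustrative only.

```java
/** Hypothetical sketch: client-facing load per live DC1 coordinator. */
public final class CoordinatorLoad {
    /** Assumes requests are spread evenly over the live coordinators. */
    public static double perCoordinator(double totalRate, int liveCoordinators) {
        if (liveCoordinators <= 0) {
            throw new IllegalArgumentException("no live coordinators");
        }
        return totalRate / liveCoordinators;
    }

    public static void main(String[] args) {
        double totalRate = 10_000; // increments per second (from the message)
        System.out.printf("both DC1 nodes up: %.0f/s per node%n",
                perCoordinator(totalRate, 2));
        System.out.printf("one DC1 node down: %.0f/s on the survivor%n",
                perCoordinator(totalRate, 1));
    }
}
```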

This led to a complete explosion of logs on the remaining live node in DC1.

Within an hour there were hundreds of megabytes of logs, all basically
saying the same thing:
ERROR [ReplicateOnWriteStage:5653390] 2013-01-22 12:44:33,611 AbstractCassandraDaemon.java (line 135) Exception in thread Thread[ReplicateOnWriteStage:5653390,5,main]
java.lang.RuntimeException: java.util.concurrent.TimeoutException
        at org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:1275)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)
Caused by: java.util.concurrent.TimeoutException
        at org.apache.cassandra.service.StorageProxy.sendToHintedEndpoints(StorageProxy.java:311)
        at org.apache.cassandra.service.StorageProxy$7$1.runMayThrow(StorageProxy.java:585)
        at org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:1271)
        ... 3 more
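To put a number on "all basically saying the same thing", one could tally error headers and root causes in the log. The sketch below is a hypothetical stand-alone tool, not part of Cassandra: the regular expressions assume the line format shown in the excerpt above, and it streams the file line by line because the logs run to hundreds of megabytes.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Stream;

/** Hypothetical tool: counts ERROR entries per thread stage and "Caused by" classes. */
public final class LogErrorSummary {
    // Header such as "ERROR [ReplicateOnWriteStage:5653390] 2013-01-22 ...";
    // captures the stage name without the per-task number.
    private static final Pattern ERROR_HEADER =
            Pattern.compile("^ERROR \\[([^:\\]]+)(?::\\d+)?\\]");
    // Line such as "Caused by: java.util.concurrent.TimeoutException".
    private static final Pattern CAUSED_BY =
            Pattern.compile("^Caused by: ([\\w.$]+)");

    public static Map<String, Integer> summarize(Iterable<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            Matcher m = ERROR_HEADER.matcher(line);
            if (m.find()) {
                counts.merge("ERROR in " + m.group(1), 1, Integer::sum);
                continue;
            }
            m = CAUSED_BY.matcher(line);
            if (m.find()) {
                counts.merge("caused by " + m.group(1), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        // args[0]: path to the node's log file (supplied by the user).
        try (Stream<String> lines = Files.lines(Paths.get(args[0]))) {
            summarize(lines::iterator)
                    .forEach((k, v) -> System.out.println(v + "\t" + k));
        }
    }
}
```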


The logs are completely swamped with this and are thus unusable. Of course
logs should report errors, but we don't need hundreds of megabytes of this :)
Is there anything that can be done to reduce the amount of this spam? In
addition to making the logs unusable, I strongly suspect this spam prevents
the server from accepting as many increments as it otherwise could.
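One generic way to tame this kind of repetition is duplicate-suppressing logging: log the first occurrence of a given error in full, only count repeats within a time window, and report the count when the next window opens. The class below is a hypothetical illustration of that technique; it is not taken from Cassandra or from any logging library, and the key format and window length are assumptions.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of time-windowed duplicate suppression for error logging. */
public final class RepeatSuppressor {
    private final long windowMillis;
    // key -> {start of current window, repeats suppressed in that window}
    private final Map<String, long[]> state = new HashMap<>();

    public RepeatSuppressor(long windowMillis) {
        this.windowMillis = windowMillis;
    }

    /**
     * Returns the line to log for this occurrence, or null to suppress it.
     * nowMillis is passed in instead of read from the clock so the logic is testable.
     */
    public synchronized String onError(String key, long nowMillis) {
        long[] s = state.get(key);
        if (s == null || nowMillis - s[0] >= windowMillis) {
            long suppressed = (s == null) ? 0 : s[1];
            state.put(key, new long[] {nowMillis, 0}); // open a new window
            return suppressed == 0
                    ? key
                    : key + " (" + suppressed + " similar errors suppressed)";
        }
        s[1]++; // same window: count the repeat, log nothing
        return null;
    }

    public static void main(String[] args) {
        RepeatSuppressor r = new RepeatSuppressor(60_000); // assumed 1-minute window
        String key = "ReplicateOnWriteStage: java.util.concurrent.TimeoutException";
        System.out.println(r.onError(key, 0));           // logged in full
        for (int t = 1; t <= 999; t++) r.onError(key, t); // suppressed repeats
        System.out.println(r.onError(key, 60_000));      // logged with the count (999)
    }
}
```

One limitation of this design: the suppressed count is only reported when a later occurrence arrives, so if the errors stop, the final tally is never logged. A real implementation would also need a periodic flush.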




--
View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/node-down-log-explosion-tp7584932.html
Sent from the cassandra-user@incubator.apache.org mailing list archive at Nabble.com.
