cassandra-user mailing list archives

From aaron morton <>
Subject Re: ReplicateOnWriteStage exception causes a backlog in MutationStage that never clears
Date Wed, 21 Mar 2012 17:24:31 GMT
The node is overloaded with hints.  

I'll just grab the comments from code…

            // avoid OOMing due to excess hints.  we need to do this check even for "live" nodes, since we can
            // still generate hints for those if it's overloaded or simply dead but not yet
            // The idea is that if we have over maxHintsInProgress hints in flight, this is probably due to
            // a small number of nodes causing problems, so we should avoid shutting down writes completely to
            // healthy nodes.  Any node with no hintsInProgress is considered healthy.
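
A rough sketch of the check those comments describe, in case it helps to see it as code. This is not the StorageProxy source: the class and method names are made up for illustration, endpoints are plain strings, and only maxHintsInProgress and hintsInProgress are taken from the comments above.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration of hint backpressure; not Cassandra's real class.
class HintBackpressureSketch {
    private final int maxHintsInProgress;
    // Hints in flight across all endpoints.
    private final AtomicInteger totalHintsInProgress = new AtomicInteger();
    // Hints in flight per endpoint (endpoint is a plain string here by assumption).
    private final ConcurrentMap<String, AtomicInteger> hintsInProgress = new ConcurrentHashMap<>();

    HintBackpressureSketch(int maxHintsInProgress) {
        this.maxHintsInProgress = maxHintsInProgress;
    }

    // Called when a hint for the endpoint starts being written.
    void hintStarted(String endpoint) {
        totalHintsInProgress.incrementAndGet();
        hintsInProgress.computeIfAbsent(endpoint, k -> new AtomicInteger()).incrementAndGet();
    }

    // Called when that hint has been written (or abandoned).
    void hintFinished(String endpoint) {
        totalHintsInProgress.decrementAndGet();
        AtomicInteger counter = hintsInProgress.get(endpoint);
        if (counter != null) {
            counter.decrementAndGet();
        }
    }

    // Reject the write only when the node as a whole is over the limit AND
    // this endpoint already has hints in flight. An endpoint with no hints
    // in progress is treated as healthy and is never rejected here.
    void checkBeforeWrite(String endpoint) throws TimeoutException {
        AtomicInteger counter = hintsInProgress.get(endpoint);
        int inFlightForEndpoint = (counter == null) ? 0 : counter.get();
        if (totalHintsInProgress.get() > maxHintsInProgress && inFlightForEndpoint > 0) {
            throw new TimeoutException();
        }
    }
}
```

With a limit of 2 and three hints in flight for one slow endpoint, checkBeforeWrite throws for that endpoint but still passes for an endpoint with nothing in flight. That would be consistent with the TimeoutException coming out of sendToHintedEndpoints in the trace below.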

Are the nodes going up and down a lot? Are they under GC pressure? The other possibility
is that you have overloaded the cluster.


Aaron Morton
Freelance Developer

On 22/03/2012, at 3:20 AM, Thomas van Neerijnen wrote:

> Hi all
> I'm running into a weird error on Cassandra 1.0.7.
> As my cluster's load gets heavier, many of the nodes seem to hit the same error around the same time, resulting in MutationStage backing up and never clearing down. The only way to recover the cluster is to kill all the nodes and start them up again. The error is as below and is repeated continuously until I kill the Cassandra process.
> ERROR [ReplicateOnWriteStage:57] 2012-03-21 14:02:05,099 (line 139) Fatal exception in thread Thread[ReplicateOnWriteStage:57,5,main]
> java.lang.RuntimeException: java.util.concurrent.TimeoutException
>         at org.apache.cassandra.service.StorageProxy$
>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(
>         at java.util.concurrent.ThreadPoolExecutor$
>         at
> Caused by: java.util.concurrent.TimeoutException
>         at org.apache.cassandra.service.StorageProxy.sendToHintedEndpoints(
>         at org.apache.cassandra.service.StorageProxy$7$1.runMayThrow(
>         at org.apache.cassandra.service.StorageProxy$
>         ... 3 more
