cassandra-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Stefano Ortolani (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (CASSANDRA-13043) UnavailabeException caused by counter writes forwarded to leaders without complete cluster view
Date Mon, 04 Sep 2017 10:45:00 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-13043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16149262#comment-16149262
] 

Stefano Ortolani edited comment on CASSANDRA-13043 at 9/4/17 10:44 AM:
-----------------------------------------------------------------------

Some updates: added to ccm the ability to send a byteman rule when restarting a node (https://github.com/ostefano/ccm/tree/startup_byteman).
Originally it was only possible to submit a rule _after_ the node started, which made reproducing
this race quite impossible.

With that I could finally reproduce the bug. Here you can my branch with a new DTEST exercising
the bug: https://github.com/ostefano/cassandra-dtest/tree/CASSANDRA-13043

Attached you can find a patch fixing the bug by removing nodes that are not RPC ready from
a leader election. Also, I fixed the corner case where the CL required is local and the current
DC does not have available nodes.

Here you can find the commit: https://github.com/ostefano/cassandra/commit/91cc9b4398009a3cee3004bc11a047c056fda6a6


was (Author: ostefano):
Some updates: added to ccm the ability to send a byteman rule when restarting a node (https://github.com/ostefano/ccm/tree/startup_byteman).


This allowed me to finally reproduce the bug: https://github.com/ostefano/cassandra-dtest/tree/CASSANDRA-13043

Attached a patch that try to fix it by removing from a leader election nodes that are not
RPC ready.
Also, fixed corner case where the CL required is local.

> UnavailabeException caused by counter writes forwarded to leaders without complete cluster
view
> -----------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-13043
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13043
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>         Environment: Debian
>            Reporter: Catalin Alexandru Zamfir
>         Attachments: patch.diff
>
>
> In version 3.9 of Cassandra, we get the following exceptions on the system.log whenever
booting an agent. They seem to grow in number with each reboot. Any idea where they come from
or what can we do about them? Note that the cluster is healthy (has sufficient live nodes).
> {noformat}
> 2/14/2016 12:39:47 PMINFO  10:39:47 Updating topology for /10.136.64.120
> 12/14/2016 12:39:47 PMINFO  10:39:47 Updating topology for /10.136.64.120
> 12/14/2016 12:39:47 PMWARN  10:39:47 Uncaught exception on thread Thread[CounterMutationStage-111,5,main]:
{}
> 12/14/2016 12:39:47 PMorg.apache.cassandra.exceptions.UnavailableException: Cannot achieve
consistency level LOCAL_QUORUM
> 12/14/2016 12:39:47 PM	at org.apache.cassandra.db.ConsistencyLevel.assureSufficientLiveNodes(ConsistencyLevel.java:313)
~[apache-cassandra-3.9.jar:3.9]
> 12/14/2016 12:39:47 PM	at org.apache.cassandra.service.AbstractWriteResponseHandler.assureSufficientLiveNodes(AbstractWriteResponseHandler.java:146)
~[apache-cassandra-3.9.jar:3.9]
> 12/14/2016 12:39:47 PM	at org.apache.cassandra.service.StorageProxy.performWrite(StorageProxy.java:1054)
~[apache-cassandra-3.9.jar:3.9]
> 12/14/2016 12:39:47 PM	at org.apache.cassandra.service.StorageProxy.applyCounterMutationOnLeader(StorageProxy.java:1450)
~[apache-cassandra-3.9.jar:3.9]
> 12/14/2016 12:39:47 PM	at org.apache.cassandra.db.CounterMutationVerbHandler.doVerb(CounterMutationVerbHandler.java:48)
~[apache-cassandra-3.9.jar:3.9]
> 12/14/2016 12:39:47 PM	at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:64)
~[apache-cassandra-3.9.jar:3.9]
> 12/14/2016 12:39:47 PM	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
~[na:1.8.0_111]
> 12/14/2016 12:39:47 PM	at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:164)
~[apache-cassandra-3.9.jar:3.9]
> 12/14/2016 12:39:47 PM	at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$LocalSessionFutureTask.run(AbstractLocalAwareExecutorService.java:136)
[apache-cassandra-3.9.jar:3.9]
> 12/14/2016 12:39:47 PM	at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:109)
[apache-cassandra-3.9.jar:3.9]
> 12/14/2016 12:39:47 PM	at java.lang.Thread.run(Thread.java:745) [na:1.8.0_111]
> 12/14/2016 12:39:47 PMWARN  10:39:47 Uncaught exception on thread Thread[CounterMutationStage-118,5,main]:
{}
> 12/14/2016 12:39:47 PMorg.apache.cassandra.exceptions.UnavailableException: Cannot achieve
consistency level LOCAL_QUORUM
> 12/14/2016 12:39:47 PM	at org.apache.cassandra.db.ConsistencyLevel.assureSufficientLiveNodes(ConsistencyLevel.java:313)
~[apache-cassandra-3.9.jar:3.9]
> 12/14/2016 12:39:47 PM	at org.apache.cassandra.service.AbstractWriteResponseHandler.assureSufficientLiveNodes(AbstractWriteResponseHandler.java:146)
~[apache-cassandra-3.9.jar:3.9]
> 12/14/2016 12:39:47 PM	at org.apache.cassandra.service.StorageProxy.performWrite(StorageProxy.java:1054)
~[apache-cassandra-3.9.jar:3.9]
> 12/14/2016 12:39:47 PM	at org.apache.cassandra.service.StorageProxy.applyCounterMutationOnLeader(StorageProxy.java:1450)
~[apache-cassandra-3.9.jar:3.9]
> 12/14/2016 12:39:47 PM	at org.apache.cassandra.db.CounterMutationVerbHandler.doVerb(CounterMutationVerbHandler.java:48)
~[apache-cassandra-3.9.jar:3.9]
> 12/14/2016 12:39:47 PM	at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:64)
~[apache-cassandra-3.9.jar:3.9]
> 12/14/2016 12:39:47 PM	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
~[na:1.8.0_111]
> 12/14/2016 12:39:47 PM	at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:164)
~[apache-cassandra-3.9.jar:3.9]
> 12/14/2016 12:39:47 PM	at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$LocalSessionFutureTask.run(AbstractLocalAwareExecutorService.java:136)
[apache-cassandra-3.9.jar:3.9]
> 12/14/2016 12:39:47 PM	at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:109)
[apache-cassandra-3.9.jar:3.9]
> 12/14/2016 12:39:47 PM	at java.lang.Thread.run(Thread.java:745) [na:1.8.0_111]
> 12/14/2016 12:39:47 PMWARN  10:39:47 Uncaught exception on thread Thread[CounterMutationStage-164,5,main]:
{}
> 12/14/2016 12:39:47 PMorg.apache.cassandra.exceptions.UnavailableException: Cannot achieve
consistency level LOCAL_QUORUM
> 12/14/2016 12:39:47 PM	at org.apache.cassandra.db.ConsistencyLevel.assureSufficientLiveNodes(ConsistencyLevel.java:313)
~[apache-cassandra-3.9.jar:3.9]
> 12/14/2016 12:39:47 PM	at org.apache.cassandra.service.AbstractWriteResponseHandler.assureSufficientLiveNodes(AbstractWriteResponseHandler.java:146)
~[apache-cassandra-3.9.jar:3.9]
> 12/14/2016 12:39:47 PM	at org.apache.cassandra.service.StorageProxy.performWrite(StorageProxy.java:1054)
~[apache-cassandra-3.9.jar:3.9]
> 12/14/2016 12:39:47 PM	at org.apache.cassandra.service.StorageProxy.applyCounterMutationOnLeader(StorageProxy.java:1450)
~[apache-cassandra-3.9.jar:3.9]
> 12/14/2016 12:39:47 PM	at org.apache.cassandra.db.CounterMutationVerbHandler.doVerb(CounterMutationVerbHandler.java:48)
~[apache-cassandra-3.9.jar:3.9]
> 12/14/2016 12:39:47 PM	at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:64)
~[apache-cassandra-3.9.jar:3.9]
> 12/14/2016 12:39:47 PM	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
~[na:1.8.0_111]
> 12/14/2016 12:39:47 PM	at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:164)
~[apache-cassandra-3.9.jar:3.9]
> 12/14/2016 12:39:47 PM	at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$LocalSessionFutureTask.run(AbstractLocalAwareExecutorService.java:136)
[apache-cassandra-3.9.jar:3.9]
> 12/14/2016 12:39:47 PM	at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:109)
[apache-cassandra-3.9.jar:3.9]
> 12/14/2016 12:39:47 PM	at java.lang.Thread.run(Thread.java:745) [na:1.8.0_111]
> 12/14/2016 12:39:47 PMWARN  10:39:47 Uncaught exception on thread Thread[CounterMutationStage-117,5,main]:
{}
> 12/14/2016 12:39:47 PMorg.apache.cassandra.exceptions.UnavailableException: Cannot achieve
consistency level LOCAL_QUORUM
> 12/14/2016 12:39:47 PM	at org.apache.cassandra.db.ConsistencyLevel.assureSufficientLiveNodes(ConsistencyLevel.java:313)
~[apache-cassandra-3.9.jar:3.9]
> 12/14/2016 12:39:47 PM	at org.apache.cassandra.service.AbstractWriteResponseHandler.assureSufficientLiveNodes(AbstractWriteResponseHandler.java:146)
~[apache-cassandra-3.9.jar:3.9]
> 12/14/2016 12:39:47 PM	at org.apache.cassandra.service.StorageProxy.performWrite(StorageProxy.java:1054)
~[apache-cassandra-3.9.jar:3.9]
> 12/14/2016 12:39:47 PM	at org.apache.cassandra.service.StorageProxy.applyCounterMutationOnLeader(StorageProxy.java:1450)
~[apache-cassandra-3.9.jar:3.9]
> 12/14/2016 12:39:47 PM	at org.apache.cassandra.db.CounterMutationVerbHandler.doVerb(CounterMutationVerbHandler.java:48)
~[apache-cassandra-3.9.jar:3.9]
> 12/14/2016 12:39:47 PM	at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:64)
~[apache-cassandra-3.9.jar:3.9]
> 12/14/2016 12:39:47 PM	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
~[na:1.8.0_111]
> 12/14/2016 12:39:47 PM	at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:164)
~[apache-cassandra-3.9.jar:3.9]
> 12/14/2016 12:39:47 PM	at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$LocalSessionFutureTask.run(AbstractLocalAwareExecutorService.java:136)
[apache-cassandra-3.9.jar:3.9]
> 12/14/2016 12:39:47 PM	at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:109)
[apache-cassandra-3.9.jar:3.9]
> 12/14/2016 12:39:47 PM	at java.lang.Thread.run(Thread.java:745) [na:1.8.0_111]
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@cassandra.apache.org
For additional commands, e-mail: commits-help@cassandra.apache.org


Mime
View raw message