cassandra-commits mailing list archives

From "Stefania (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-7816) Updated the "4.2.6. EVENT" section in the binary protocol specification
Date Tue, 03 Mar 2015 09:17:05 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-7816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14344785#comment-14344785 ]

Stefania commented on CASSANDRA-7816:
-------------------------------------

It's quite easy to reproduce. I added a new test, {{restart_node_test}}, to pushed_notifications_test.py,
available in this pull request: https://github.com/riptano/cassandra-dtest/pull/177.

There are always two DOWN notifications, and this is deterministic. They are generated by:

{code}
INFO  [GossipStage:1] 2015-03-03 01:10:47,156 Server.java:413 - Thread[GossipStage:1,5,main]
        at java.lang.Thread.getStackTrace(Thread.java:1589)
        at org.apache.cassandra.transport.Server$EventNotifier.getStackTrace(Server.java:396)
        at org.apache.cassandra.transport.Server$EventNotifier.onDown(Server.java:413)
        at org.apache.cassandra.service.StorageService.onDead(StorageService.java:2049)
        at org.apache.cassandra.gms.Gossiper.markDead(Gossiper.java:932)
        at org.apache.cassandra.gms.Gossiper.convict(Gossiper.java:319)
        at org.apache.cassandra.gms.FailureDetector.forceConviction(FailureDetector.java:251)
        at org.apache.cassandra.gms.GossipShutdownVerbHandler.doVerb(GossipShutdownVerbHandler.java:37)
        at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:62)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
{code}

and 

{code}
INFO  [GossipStage:1] 2015-03-03 01:11:04,254 Server.java:413 - Thread[GossipStage:1,5,main]
        at java.lang.Thread.getStackTrace(Thread.java:1589)
        at org.apache.cassandra.transport.Server$EventNotifier.getStackTrace(Server.java:396)
        at org.apache.cassandra.transport.Server$EventNotifier.onDown(Server.java:413)
        at org.apache.cassandra.service.StorageService.onDead(StorageService.java:2049)
        at org.apache.cassandra.service.StorageService.onRestart(StorageService.java:2057)
        at org.apache.cassandra.gms.Gossiper.handleMajorStateChange(Gossiper.java:958)
        at org.apache.cassandra.gms.Gossiper.applyStateLocally(Gossiper.java:1024)
        at org.apache.cassandra.gms.GossipDigestAckVerbHandler.doVerb(GossipDigestAckVerbHandler.java:58)
        at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:62)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
{code}

There are one or more UP notifications. The count is not deterministic, but duplicates tend to
appear the third time the node is restarted. They are all generated by the same stack trace but on
different threads, which suggests a contention problem, to be investigated further:

{code}
INFO  [SharedPool-Worker-2] 2015-03-03 01:11:04,419 Gossiper.java:916 - InetAddress /127.0.0.2 is now UP
INFO  [SharedPool-Worker-2] 2015-03-03 01:11:04,421 Server.java:407 - Thread[SharedPool-Worker-2,10,main]
        at java.lang.Thread.getStackTrace(Thread.java:1589)
        at org.apache.cassandra.transport.Server$EventNotifier.getStackTrace(Server.java:396)
        at org.apache.cassandra.transport.Server$EventNotifier.onUp(Server.java:407)
        at org.apache.cassandra.service.StorageService.onAlive(StorageService.java:2033)
        at org.apache.cassandra.gms.Gossiper.realMarkAlive(Gossiper.java:918)
        at org.apache.cassandra.gms.Gossiper.access$900(Gossiper.java:67)
        at org.apache.cassandra.gms.Gossiper$2.response(Gossiper.java:900)
        at org.apache.cassandra.net.ResponseVerbHandler.doVerb(ResponseVerbHandler.java:54)
        at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:62)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at org.apache.cassandra.concurrent.AbstractTracingAwareExecutorService$FutureTask.run(AbstractTracingAwareExecutorService.java:164)
        at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:105)
        at java.lang.Thread.run(Thread.java:745)
{code}

Sample output of the test (with assertions commented out):

{code}
KEEP_LOGS=true PRINT_DEBUG=true nosetests -s -a 'selected' pushed_notifications_test.py
cluster ccm directory: /tmp/dtest-AQzO0X
Restarting second node...
Source 127.0.0.1 sent DOWN for 127.0.0.2
Source 127.0.0.1 sent DOWN for 127.0.0.2
Source 127.0.0.1 sent UP for 127.0.0.2
Waiting for notifications from 127.0.0.1
Restarting second node...
Source 127.0.0.1 sent DOWN for 127.0.0.2
Source 127.0.0.1 sent DOWN for 127.0.0.2
Source 127.0.0.1 sent UP for 127.0.0.2
Waiting for notifications from 127.0.0.1
Restarting second node...
Source 127.0.0.1 sent DOWN for 127.0.0.2
Source 127.0.0.1 sent DOWN for 127.0.0.2
Source 127.0.0.1 sent UP for 127.0.0.2
Source 127.0.0.1 sent UP for 127.0.0.2
Source 127.0.0.1 sent UP for 127.0.0.2
Source 127.0.0.1 sent UP for 127.0.0.2
Source 127.0.0.1 sent UP for 127.0.0.2
Source 127.0.0.1 sent UP for 127.0.0.2
Waiting for notifications from 127.0.0.1
removing ccm cluster test at: /tmp/dtest-AQzO0X
.
----------------------------------------------------------------------
Ran 1 test in 94.861s

OK
{code}
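
To compare runs, the output above can be tallied per restart. The following is only a minimal sketch and is not part of the dtest: {{tally_notifications}} and {{EVENT_RE}} are hypothetical names, and it assumes Python 3 and the exact line format printed above.

```python
import re
from collections import Counter

# Matches lines such as "Source 127.0.0.1 sent DOWN for 127.0.0.2".
# Assumption: the test prints events in exactly this format.
EVENT_RE = re.compile(r"^Source (\S+) sent (UP|DOWN) for (\S+)$")


def tally_notifications(output):
    """Return one Counter of (source, change, node) tuples per restart."""
    restarts = []
    for line in output.splitlines():
        line = line.strip()
        if line.startswith("Restarting"):
            # Each "Restarting ..." line opens a new group of events.
            restarts.append(Counter())
            continue
        match = EVENT_RE.match(line)
        if match and restarts:
            restarts[-1][match.groups()] += 1
    return restarts


# First restart copied from the sample output above.
SAMPLE = """\
Restarting second node...
Source 127.0.0.1 sent DOWN for 127.0.0.2
Source 127.0.0.1 sent DOWN for 127.0.0.2
Source 127.0.0.1 sent UP for 127.0.0.2
Waiting for notifications from 127.0.0.1
"""

for number, counts in enumerate(tally_notifications(SAMPLE), start=1):
    for (source, change, node), count in sorted(counts.items()):
        print(f"restart {number}: {source} sent {change} for {node} x{count}")
```

Run against the full output, this would make the third restart stand out, since it is the only one with more than one UP event.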

> Updated the "4.2.6. EVENT" section in the binary protocol specification
> -----------------------------------------------------------------------
>
>                 Key: CASSANDRA-7816
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7816
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Documentation & website
>            Reporter: Michael Penick
>            Assignee: Stefania
>            Priority: Trivial
>         Attachments: tcpdump_repeating_status_change.txt, trunk-7816.txt
>
>
> Added "MOVED_NODE" as a possible type of topology change and also specified that it is
> possible to receive the same event multiple times.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
