cassandra-commits mailing list archives

From "Brandon Williams (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (CASSANDRA-5254) Nodes can be marked up after gossip sends the goodbye command
Date Tue, 05 Mar 2013 19:42:13 GMT

     [ https://issues.apache.org/jira/browse/CASSANDRA-5254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brandon Williams updated CASSANDRA-5254:
----------------------------------------

    Attachment: 5254.txt

This is a pernicious thing to debug because the timing window is so tight: enabling DEBUG
or TRACE, even on just the gossiper, is enough to keep it from reproducing.  However, careful
examination of the INFO messages shows that handleMajorStateChange is not being called, since
there is no 'node restarted' message.  That leaves applyStateLocally as the only other option,
and it is called from the ack/ack2 handlers.  So we are in the middle of a gossip round when
we send the shutdown message, and the easiest fix is to sleep for more than one round.  The
attached patch is trivial and does exactly that; it has solved the problem on the dtests.
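
The idea of sleeping for more than one gossip round can be sketched roughly as follows.
This is only an illustration and not the contents of 5254.txt: the class name, the method
names, and the 1000 ms round length are all assumptions made for the example.

```java
public class ShutdownDelaySketch {
    // Assumed length of one gossip round in milliseconds (hypothetical value).
    public static final long GOSSIP_INTERVAL_MILLIS = 1000;

    // Returns how long to wait after announcing shutdown. Requiring at least
    // two rounds guarantees the wait is longer than a single gossip round,
    // so an exchange that was already in flight has time to finish.
    public static long shutdownDelayMillis(long intervalMillis, int rounds) {
        if (intervalMillis <= 0 || rounds < 2) {
            throw new IllegalArgumentException("need a positive interval and at least 2 rounds");
        }
        return Math.multiplyExact(intervalMillis, rounds);
    }

    // Sends the shutdown announcement (a hypothetical callback here), then
    // sleeps before the caller goes on to stop its messaging layer.
    // Returns the number of milliseconds it asked to sleep.
    public static long announceShutdownAndWait(Runnable sendShutdown, long intervalMillis, int rounds) {
        long delay = shutdownDelayMillis(intervalMillis, rounds);
        sendShutdown.run();
        try {
            Thread.sleep(delay);
        } catch (InterruptedException e) {
            // Preserve the interrupt so the caller can still react to it.
            Thread.currentThread().interrupt();
        }
        return delay;
    }

    public static void main(String[] args) {
        long waited = announceShutdownAndWait(
                () -> System.out.println("Announcing shutdown"),
                GOSSIP_INTERVAL_MILLIS, 2);
        System.out.println("Slept " + waited + " ms before stopping messaging");
    }
}
```

The sketch rejects anything shorter than two rounds because a wait of exactly one round could
still end while an ack/ack2 exchange from the previous round is being processed.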
                
> Nodes can be marked up after gossip sends the goodbye command
> -------------------------------------------------------------
>
>                 Key: CASSANDRA-5254
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-5254
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 1.1.1
>            Reporter: Brandon Williams
>            Assignee: Brandon Williams
>            Priority: Minor
>         Attachments: 5254.txt
>
>
> Finally tracked this down on dtestbot after setting the rpc_timeout to ridiculous levels:
> {noformat}
> ==> logs/last/node1.log <==
>  INFO [FlushWriter:1] 2013-02-14 10:01:10,311 Memtable.java (line 305) Completed flushing /tmp/dtest-iaYzzR/test/node1/data/system/schema_columns/system-schema_columns-hf-2-Data.db (558 bytes) for commitlog position ReplayPosition(segmentId=1360857665931, position=4770)
>  INFO [MemoryMeter:1] 2013-02-14 10:01:10,974 Memtable.java (line 213) CFS(Keyspace='ks', ColumnFamily='cf') liveRatio is 20.488836662749705 (just-counted was 20.488836662749705).  calculation took 96ms for 144 columns
>  INFO [GossipStage:1] 2013-02-14 10:01:12,119 Gossiper.java (line 831) InetAddress /127.0.0.3 is now dead.
> ==> logs/last/node2.log <==
>  INFO [GossipStage:1] 2013-02-14 10:01:12,119 Gossiper.java (line 831) InetAddress /127.0.0.3 is now dead.
>  INFO [GossipStage:1] 2013-02-14 10:01:12,238 Gossiper.java (line 817) InetAddress /127.0.0.3 is now UP
>  INFO [GossipTasks:1] 2013-02-14 10:01:26,386 Gossiper.java (line 831) InetAddress /127.0.0.3 is now dead.
> ==> logs/last/node3.log <==
>  INFO [StorageServiceShutdownHook] 2013-02-14 10:01:11,115 Gossiper.java (line 1134) Announcing shutdown
>  INFO [StorageServiceShutdownHook] 2013-02-14 10:01:12,118 MessagingService.java (line 549) Waiting for messaging service to quiesce
>  INFO [ACCEPT-/127.0.0.3] 2013-02-14 10:01:12,119 MessagingService.java (line 705) MessagingService shutting down server thread.
> {noformat}
> node2 receives the goodbye command from node3, and node1 has already marked node3 down,
> but some kind of signal is still coming from node3 to node2 marking it up again.

