cassandra-commits mailing list archives

From "Brandon Williams (JIRA)" <j...@apache.org>
Subject [jira] [Resolved] (CASSANDRA-4373) Gossip can surreptitiously mark a node UP twice without marking it DOWN
Date Tue, 26 Jun 2012 17:50:44 GMT

     [ https://issues.apache.org/jira/browse/CASSANDRA-4373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brandon Williams resolved CASSANDRA-4373.
-----------------------------------------

    Resolution: Not A Problem

Closing since this is working as intended and any solution would be incorrect and break other
things.  "Don't do that" is the right way to handle this.
                
> Gossip can surreptitiously mark a node UP twice without marking it DOWN
> -----------------------------------------------------------------------
>
>                 Key: CASSANDRA-4373
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4373
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Brandon Williams
>            Assignee: Brandon Williams
>             Fix For: 1.1.2
>
>
> As evidenced by dtests:
> {noformat}
>  INFO [GossipStage:1] 2012-06-25 17:19:21,999 Gossiper.java (line 770) Node /127.0.0.2 has restarted, now UP
>  INFO [GossipStage:1] 2012-06-25 17:19:22,000 Gossiper.java (line 738) InetAddress /127.0.0.2 is now UP
>  INFO [GossipStage:1] 2012-06-25 17:19:22,001 StorageService.java (line 1103) Node /127.0.0.2 state jump to normal
>  INFO [GossipStage:1] 2012-06-25 17:19:22,002 Gossiper.java (line 770) Node /127.0.0.3 has restarted, now UP
>  INFO [GossipStage:1] 2012-06-25 17:19:22,004 Gossiper.java (line 738) InetAddress /127.0.0.3 is now UP
>  INFO [GossipStage:1] 2012-06-25 17:19:22,005 StorageService.java (line 1103) Node /127.0.0.3 state jump to normal
>  INFO [RMI TCP Connection(2)-50.57.224.92] 2012-06-25 17:19:24,809 StorageService.java (line 1933) Starting repair command #1, repairing 3 ranges.
>  INFO [AntiEntropySessions:1] 2012-06-25 17:19:24,818 AntiEntropyService.java (line 620) [repair #d21b8bd0-bf13-11e1-0000-fe8ebeead9ff] new session: will sync /127.0.0.1, /127.0.0.2, /127.0.0.3 on range (Token(bytes[00]),Token(bytes[0113427455640312821154458202477256070484])] for ks.[cf]
>  INFO [AntiEntropySessions:1] 2012-06-25 17:19:24,823 AntiEntropyService.java (line 825) [repair #d21b8bd0-bf13-11e1-0000-fe8ebeead9ff] requesting merkle trees for cf (to [/127.0.0.2, /127.0.0.3, /127.0.0.1])
>  INFO [GossipStage:1] 2012-06-25 17:19:24,925 Gossiper.java (line 770) Node /127.0.0.3 has restarted, now UP
>  INFO [GossipStage:1] 2012-06-25 17:19:24,926 Gossiper.java (line 738) InetAddress /127.0.0.3 is now UP
>  INFO [GossipStage:1] 2012-06-25 17:19:24,926 StorageService.java (line 1103) Node /127.0.0.3 state jump to normal
> ERROR [AntiEntropySessions:1] 2012-06-25 17:19:24,927 AntiEntropyService.java (line 670) [repair #d21b8bd0-bf13-11e1-0000-fe8ebeead9ff] session completed with the following error
> java.io.IOException: Endpoint /127.0.0.3 died
> {noformat}
> It appears that, given nodes X, Y, and Z, X sees Z as up via Y even though Z is still
> down, but the failure detector (FD) never marks it down.  Later, when Z actually does come up,
> this triggers another handleMajorStateChange as a restart, which causes an onRestart event,
> which in turn fails the repair session even though the repair itself succeeds.
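
The sequence described above can be sketched as a small toy model. This is not
Cassandra code: ToyGossiper, ToyRepairSession, apply_remote_state and the integer
"generation" counter are hypothetical names made up for illustration, and only
handleMajorStateChange and onRestart are borrowed (as Python method names) from the
description. The sketch assumes that a newer generation for an already-known
endpoint is treated as a restart, and that a repair session fails when a
participant restarts.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class EndpointState:
    generation: int      # assumed to increase each time the node process starts
    alive: bool = False


@dataclass
class ToyGossiper:
    """Hypothetical, heavily simplified view of gossip state on node X."""
    states: Dict[str, EndpointState] = field(default_factory=dict)
    restart_listeners: List[Callable[[str], None]] = field(default_factory=list)
    log: List[str] = field(default_factory=list)

    def apply_remote_state(self, endpoint: str, generation: int) -> None:
        # State can arrive second-hand (from Y) or directly (from Z itself).
        local = self.states.get(endpoint)
        if local is None or generation > local.generation:
            self.handle_major_state_change(endpoint, generation)

    def handle_major_state_change(self, endpoint: str, generation: int) -> None:
        already_known = endpoint in self.states
        self.states[endpoint] = EndpointState(generation, alive=True)
        self.log.append(f"{endpoint} is now UP")
        if already_known:
            # A newer generation for a known endpoint is treated as a restart.
            for listener in self.restart_listeners:
                listener(endpoint)


@dataclass
class ToyRepairSession:
    """Hypothetical repair session that fails if a participant restarts."""
    participants: List[str]
    error: str = ""

    def on_restart(self, endpoint: str) -> None:
        if endpoint in self.participants and not self.error:
            self.error = f"Endpoint {endpoint} died"


def simulate() -> Tuple[List[str], str]:
    x = ToyGossiper()
    # X hears about Z via Y while Z is still down; nothing ever marks Z down.
    x.apply_remote_state("/127.0.0.3", generation=1)
    repair = ToyRepairSession(["/127.0.0.1", "/127.0.0.2", "/127.0.0.3"])
    x.restart_listeners.append(repair.on_restart)
    # Z actually comes up and gossips a newer generation.
    x.apply_remote_state("/127.0.0.3", generation=2)
    return x.log, repair.error
```

In this model, simulate() records two UP entries for /127.0.0.3 with no DOWN in
between, and the repair session ends with an "Endpoint /127.0.0.3 died" error,
which mirrors the shape of the log excerpt above.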


        
