cassandra-commits mailing list archives

From "Stefania (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-10231) Null status entries on nodes that crash during decommission of a different node
Date Mon, 12 Oct 2015 02:36:05 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-10231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14952551#comment-14952551 ]

Stefania commented on CASSANDRA-10231:
--------------------------------------

I agree with your analysis: only three methods insert into {{PEERS}}. Of these, {{updatePeerInfo}}
and {{updateTokens}} do not insert anything if the endpoint is the local broadcast address,
whilst the remaining method, {{updatePreferredIP}}, is only called from {{OutboundTcpConnectionPool}}
for remote endpoints. None of the getters expect to find the local ep in {{PEERS}}. The code
in {{SS.initServer()}} seems to confirm this further, both with the comment at line 614 and by
removing the local ep should it be found (IMO this should have been an assertion, but let's leave it).
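
To make the invariant above concrete, here is a minimal sketch of the two behaviours: an insert that is skipped for the local broadcast address, and a startup cleanup that removes a local entry should one be found. This is not the actual Cassandra code. The class {{PeersSketch}}, its in-memory map, the {{addr}} helper and {{removeLocalEndpointIfPresent}} are hypothetical names invented for illustration; only the method name {{updatePeerInfo}} is taken from the discussion, and its signature here is assumed.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory stand-in for the PEERS table; NOT Cassandra's real implementation.
class PeersSketch
{
    private final InetAddress localBroadcastAddress;
    private final Map<InetAddress, Map<String, String>> peers = new HashMap<>();

    PeersSketch(InetAddress localBroadcastAddress)
    {
        this.localBroadcastAddress = localBroadcastAddress;
    }

    // Helper (assumption): parse an IP literal without a checked exception.
    static InetAddress addr(String ip)
    {
        try
        {
            return InetAddress.getByName(ip);
        }
        catch (UnknownHostException e)
        {
            throw new IllegalArgumentException(e);
        }
    }

    // Insert guard: nothing is written when the endpoint is the local broadcast address.
    // Returns true only if a row was actually written.
    boolean updatePeerInfo(InetAddress ep, String column, String value)
    {
        if (ep.equals(localBroadcastAddress))
            return false;
        peers.computeIfAbsent(ep, k -> new HashMap<>()).put(column, value);
        return true;
    }

    // Startup cleanup: remove the local endpoint should it be found.
    // Returns true if an entry was present and has been removed.
    boolean removeLocalEndpointIfPresent()
    {
        return peers.remove(localBroadcastAddress) != null;
    }

    boolean contains(InetAddress ep)
    {
        return peers.containsKey(ep);
    }
}
```

With the guard in place the cleanup never finds anything to remove, which is the argument for an assertion rather than a silent removal.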

I've also checked the latest round of CI and it seems in line with the unpatched branch.

The patch is therefore +1.

Only one tiny nit: I think people _generally_ prefer to drop the braces for {{if}} one-liners,
but it's not really in the coding standards, so it's your choice.

If you are also happy, you can flag this ticket as "READY TO COMMIT" and find a committer
on IRC.

> Null status entries on nodes that crash during decommission of a different node
> -------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-10231
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10231
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Joel Knighton
>            Assignee: Joel Knighton
>             Fix For: 3.0.0 rc2
>
>         Attachments: n1.log, n2.log, n3.log, n4.log, n5.log
>
>
> This issue is reproducible through a Jepsen test of materialized views that crashes and decommissions nodes throughout the test.
> In a 5 node cluster, if a node crashes at a certain point (unknown) during the decommission of a different node, it may start with a null entry for the decommissioned node like so:
> DN 10.0.0.5 ? 256 ? null rack1
> This entry does not get updated/cleared by gossip. This entry is removed upon a restart of the affected node.
> This issue is further detailed in ticket [10068|https://issues.apache.org/jira/browse/CASSANDRA-10068].



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
