cassandra-commits mailing list archives

From "Joel Knighton (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-10231) Null status entries on nodes that crash during decommission of a different node
Date Tue, 29 Sep 2015 06:45:04 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-10231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14934733#comment-14934733 ]

Joel Knighton commented on CASSANDRA-10231:
-------------------------------------------

I've attached the logs for n1, n2, n3, n4, and n5. n1 is at 10.0.0.2, n2 is at 10.0.0.3, and
so on.

The decommissioned node is n2. The node with the null status entry is n5. On n5, the status
output looks like this:

Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address   Load       Tokens       Owns    Host ID                               Rack
UN  10.0.0.2  480.16 KB  256          ?       7a7681f5-0a22-4ba2-89c4-17c84658a18f  rack1
?N  10.0.0.3  ?          256          ?       null                                  rack1
UN  10.0.0.4  495.24 KB  256          ?       ef529827-e178-49f8-ad3a-458198df5060  rack1
UN  10.0.0.5  374.78 KB  256          ?       ee63423d-1204-496e-b53d-d318472717ab  rack1
UN  10.0.0.6  456.69 KB  256          ?       d88d166b-ed03-4b48-a12e-ea849f680920  rack1
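
For anyone who wants to check a node for such entries automatically, here is a minimal sketch. It assumes the status text has the same column layout as the output above; the function name `find_null_entries` and the `SAMPLE` constant are hypothetical and not part of Cassandra or the Jepsen tests. It flags any row whose status character is `?` or whose Host ID is `null`.

```python
# Sketch only: scans status text shaped like the output above for rows with an
# unknown status ("?") or a null Host ID. Names here are illustrative.

SAMPLE = """\
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address   Load       Tokens       Owns    Host ID                               Rack
UN  10.0.0.2  480.16 KB  256          ?       7a7681f5-0a22-4ba2-89c4-17c84658a18f  rack1
?N  10.0.0.3  ?          256          ?       null                                  rack1
UN  10.0.0.4  495.24 KB  256          ?       ef529827-e178-49f8-ad3a-458198df5060  rack1
"""


def find_null_entries(status_output):
    """Return (address, status_code, host_id) for each suspicious row."""
    suspicious = []
    for line in status_output.splitlines():
        fields = line.split()
        # A node row has at least: code, address, load, tokens, owns, host id, rack.
        if len(fields) < 7:
            continue
        code = fields[0]
        # Node rows start with a two-character code: status (U/D/?) then state (N/L/J/M).
        if len(code) != 2 or code[0] not in "UD?" or code[1] not in "NLJM":
            continue
        address = fields[1]
        # Load may span two tokens ("480.16 KB"), so read Host ID from the right.
        host_id = fields[-2]
        if code[0] == "?" or host_id == "null":
            suspicious.append((address, code, host_id))
    return suspicious


if __name__ == "__main__":
    for address, code, host_id in find_null_entries(SAMPLE):
        print(address, code, host_id)
```

Reading the Host ID from the right-hand end of each row avoids having to guess whether the Load column occupies one token or two.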

As I mentioned last week, I'm tracking down an MV issue that causes the tests to fail on 3.0
before they reach this point. To work around this, I applied your patch on top of commit
e5c14285404b1ba98d385c5e5ed069229a2f6004, which is the commit on which I originally
produced the issue.

Sorry for the delay.

> Null status entries on nodes that crash during decommission of a different node
> -------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-10231
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10231
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Joel Knighton
>            Assignee: Stefania
>             Fix For: 3.0.0 rc2
>
>         Attachments: n1.log, n2.log, n3.log, n4.log, n5.log
>
>
> This issue is reproducible through a Jepsen test of materialized views that crashes and decommissions nodes throughout the test.
> In a 5 node cluster, if a node crashes at a certain point (unknown) during the decommission of a different node, it may start with a null entry for the decommissioned node like so:
> DN 10.0.0.5 ? 256 ? null rack1
> This entry does not get updated/cleared by gossip. This entry is removed upon a restart of the affected node.
> This issue is further detailed in ticket [10068|https://issues.apache.org/jira/browse/CASSANDRA-10068].



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
