cassandra-commits mailing list archives

From "Joel Knighton (JIRA)" <>
Subject [jira] [Created] (CASSANDRA-10068) Batchlog replay fails with exception after a node is decommissioned
Date Fri, 14 Aug 2015 03:09:46 GMT
Joel Knighton created CASSANDRA-10068:

             Summary: Batchlog replay fails with exception after a node is decommissioned
                 Key: CASSANDRA-10068
             Project: Cassandra
          Issue Type: Bug
            Reporter: Joel Knighton
         Attachments: n1.log, n2.log, n3.log, n4.log, n5.log

This issue is reproducible through a Jepsen test of materialized views that crashes and decommissions
nodes throughout the test.

At the conclusion of the test, a batchlog replay is initiated through nodetool and hits the
following assertion due to a missing host ID:

Running nodetool status on the node where batchlog replay failed shows the following entry for the
decommissioned node:
DN  ?          256          ?       null                                  rack1

On the unaffected nodes, there is no entry for the decommissioned node, as expected.
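
For anyone checking a cluster for the same symptom, the following is a minimal sketch that scans captured nodetool status output for rows with a null host ID. It assumes the host ID is the second-to-last whitespace-separated column and the rack is the last, which matches the row quoted above but may not hold for every Cassandra version. The function name is hypothetical, and the UN row in the sample is made up for illustration; only the DN row comes from this report.

```python
import re

# Rows in `nodetool status` start with a two-letter code: U/D (up/down)
# followed by N/L/J/M (normal/leaving/joining/moving).
STATUS_ROW = re.compile(r"^[UD][NLJM]\s")


def rows_with_null_host_id(status_output):
    """Return the status rows whose host ID column is the literal 'null'."""
    suspect = []
    for line in status_output.splitlines():
        if not STATUS_ROW.match(line):
            continue  # skip headers, separators, and blank lines
        fields = line.split()
        # Assumption: host ID is the second-to-last column, rack the last.
        if len(fields) >= 3 and fields[-2] == "null":
            suspect.append(line.strip())
    return suspect


# The UN row is an invented example; the DN row is the one from this report.
sample = """\
UN  127.0.0.1  103.4 KB   256  ?  6d2b0a3e-0000-0000-0000-000000000001  rack1
DN  ?          256          ?       null                                  rack1
"""

if __name__ == "__main__":
    for row in rows_with_null_host_id(sample):
        print(row)
```

Running this against the output from each node would show which nodes still carry the stale entry for the decommissioned node.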

The same assertion occasionally appears in the logs of other nodes. It looks like the
issue might sometimes resolve itself, but one node seems to retain the errant null entry.

In logs for the nodes, this possibly unrelated exception also appears:
java.lang.RuntimeException: Trying to get the view natural endpoint on a non-data replica
	at org.apache.cassandra.db.view.MaterializedViewUtils.getViewNaturalEndpoint(

I have a running cluster with the issue on my machine; it is also repeatable.

Nothing stands out in the logs of the decommissioned node (n4) for me. The logs of each node
in the cluster are attached.

