cassandra-commits mailing list archives

From "Brandon Williams (JIRA)" <>
Subject [jira] [Updated] (CASSANDRA-7318) Unable to truncate column family on node which has been decommissioned and re-bootstrapped
Date Tue, 17 Jun 2014 22:39:05 GMT


Brandon Williams updated CASSANDRA-7318:

    Attachment: 7318.txt

The problem here is that the decommission leaves a dead state for our ip in gossip.
Normally this isn't a problem, since our newer generation would knock that state out,
but during bootstrap we do a shadow round to check for an existing endpoint, then fail
to clean the unreachable endpoints, which is exactly what truncate checks.  I suspect
replace_address on the same ip would hit a similar problem.

Patch to also clear unreachableEndpoints and liveEndpoints so that the gossiper is more pristine
when it really starts.
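A minimal standalone sketch of the cleanup the patch describes (the field names `liveEndpoints` and `unreachableEndpoints` come from the comment above, but this toy class is illustrative only, not the actual Gossiper code or the attached patch):

```java
import java.util.HashSet;
import java.util.Set;

// Toy model of the gossiper state left over from a bootstrap shadow round.
// The idea from the patch description: after the shadow round finishes, wipe
// the transient endpoint sets so the gossiper starts from a clean slate and
// no stale "dead" entry for our own ip survives into normal operation.
public class ShadowRoundCleanup {
    static Set<String> liveEndpoints = new HashSet<>();
    static Set<String> unreachableEndpoints = new HashSet<>();
    static Set<String> endpointStates = new HashSet<>();

    // Hypothetical cleanup step run when the shadow round ends.
    static void resetAfterShadowRound() {
        endpointStates.clear();
        unreachableEndpoints.clear(); // per the patch: clear this too
        liveEndpoints.clear();        // per the patch: and this
    }

    public static void main(String[] args) {
        // The shadow round leaves a dead state for our own ip behind.
        unreachableEndpoints.add("10.0.0.2"); // hypothetical address
        endpointStates.add("10.0.0.2");
        resetAfterShadowRound();
        // Truncate checks for unreachable endpoints, so this must be empty.
        System.out.println(unreachableEndpoints.isEmpty()
                && liveEndpoints.isEmpty());
    }
}
```

With the sets cleared, the truncate path no longer sees a phantom unreachable node after re-bootstrap.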

> Unable to truncate column family on node which has been decommissioned and re-bootstrapped
> ------------------------------------------------------------------------------------------
>                 Key: CASSANDRA-7318
>                 URL:
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>         Environment: Seen running cassandra 2.0.7 running on Red Hat Linux
>            Reporter: Thomas Whiteway
>            Assignee: Brandon Williams
>            Priority: Minor
>             Fix For: 2.0.9
>         Attachments: 7318.txt
> After decommissioning a node, then re-bootstrapping it, it's not possible to truncate
column families until cassandra is restarted.
> Steps to reproduce:
> - Start with a two node deployment (nodes A and B)
> - Run nodetool decommission on node B
> - Stop cassandra on node B
> - Delete the contents of the cassandra data and commitlog directories
> - Start cassandra on node B with node A as the seed
> - Run cqlsh on node B and try to truncate a column family
> - cqlsh displays: "Unable to complete request: one or more nodes were unavailable."
> According to the logs, node B seems to think that it itself is down.  The following logs appear
when the server is started, and there are no further logs to indicate that B is now UP (A=,
>  INFO [main] 2014-05-29 10:40:11,090 (line 461) Starting Messaging Service on port 7000
>  INFO [HANDSHAKE-/] 2014-05-29 10:40:11,106 (line 386) Handshaking version with /
>  INFO [GossipStage:1] 2014-05-29 10:40:11,182 (line 903) Node / is now part of the cluster
>  INFO [GossipStage:1] 2014-05-29 10:40:11,185 (line 883) InetAddress / is now DOWN
>  INFO [RequestResponseStage:1] 2014-05-29 10:40:11,215 (line 869) InetAddress / is now UP
> This problem isn't hit if cassandra is restarted on node A while node B is stopped.
The problem goes away if node B is restarted.

This message was sent by Atlassian JIRA
