cassandra-commits mailing list archives

From "Peter Schuller (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-2435) auto bootstrap happened on already bootstrapped nodes
Date Sun, 10 Apr 2011 12:13:05 GMT


Peter Schuller commented on CASSANDRA-2435:

FWIW, looks good to me (but I only did visual inspection and some code jumping in the 0.7
branch; haven't tested it).

> auto bootstrap happened on already bootstrapped nodes
> -----------------------------------------------------
>                 Key: CASSANDRA-2435
>                 URL:
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.7.0
>            Reporter: Peter Schuller
>            Assignee: Jonathan Ellis
>            Priority: Critical
>             Fix For: 0.7.5
>         Attachments: 2435.txt
> I believe the following was observed on 0.7.2. I meant to dig deeper, but never had the
> time, and now I want to at least file this even if I don't have extremely helpful information.
> A piece of background is that we consciously made the decision to have the default configuration
> on nodes have auto_bootstrap set to true. The logic was that if one accidentally were to start
> a new node, we'd rather have it join with data than join *without* data and cause bogus read
> results in the cluster.
> We executed this policy (by way of having the puppet managed config have auto_bootstrap
> set to true).
> On one of our clusters with 5 nodes, we did some moves. All looked well; the moves completed.
> For unrelated reasons, we wanted to restart nodes after they had been moved. When we did,
> three of the 5, specifically those 3 that were *NOT* seed nodes, initiated a bootstrap procedure!
> Before the moves the cluster had been running for several days at least.
> The logs indicated the automatic token selection, and they joined the ring under a new
> automatically selected token.
> Presumably, this violated consistency but at the time there was no live traffic to the
> cluster and we didn't confirm (put traffic on it after repair+cleanup).
> I did look a little bit at the code in light of this but didn't see anything obvious,
> so I don't really know what the likely culprit is.
> A potential complication was that seed nodes were moved without using the correct procedure
> of de-seeding them first. This was clearly wrong, but it is not obvious to me that it would
> cause other nodes to incorrectly bootstrap since a node should *never* bootstrap more than
> once if the local system tables say it's been bootstrapped.
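
For illustration only, the invariant stated in the description (a node that its local system tables record as bootstrapped must not bootstrap again, whatever auto_bootstrap says) could be sketched roughly as below. This is not Cassandra's actual code. NodeState, should_bootstrap, and the already_bootstrapped flag are hypothetical names, and the seed-node exclusion is an assumption consistent with only the non-seed nodes bootstrapping in the report.

```python
from dataclasses import dataclass


@dataclass
class NodeState:
    """Hypothetical snapshot of what a node knows at startup."""
    auto_bootstrap: bool        # config value, e.g. set to true via puppet
    is_seed: bool               # node is listed among the cluster's seeds
    already_bootstrapped: bool  # assumed flag persisted in local system tables


def should_bootstrap(state: NodeState) -> bool:
    """Decide whether a starting node should run the bootstrap procedure."""
    # The persisted flag wins over configuration: a node that has already
    # bootstrapped must never do so again on restart.
    if state.already_bootstrapped:
        return False
    # Assumption: seed nodes skip automatic bootstrap.
    if state.is_seed:
        return False
    # Otherwise follow the operator's configuration.
    return state.auto_bootstrap


# Restart of a long-running non-seed node with auto_bootstrap left on:
restarted = NodeState(auto_bootstrap=True, is_seed=False, already_bootstrapped=True)
# Accidentally started brand-new node:
fresh = NodeState(auto_bootstrap=True, is_seed=False, already_bootstrapped=False)
```

Under this sketch, should_bootstrap(restarted) is False and should_bootstrap(fresh) is True. The behaviour reported above corresponds to the first case coming out True after the moves.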
