cassandra-commits mailing list archives

From "Sam Tunnicliffe (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-10134) Always require replace_address to replace existing address
Date Fri, 15 Apr 2016 14:59:25 GMT


Sam Tunnicliffe commented on CASSANDRA-10134:

One of the MV dtests uncovered a small problem for which I've pushed an additional commit,
and otherwise CI looks good now. 

Building an MV involves writes to the {{system_distributed}} keyspace, which in turn requires
replica info and so can't be done until we've gone through initialization of {{StorageService}}.
In fact, in {{CassandraDaemon}} where build tasks for all views are submitted at startup (to
force completion of any interrupted builds), the comment mentions that SS must be initialized
first. However, the {{Keyspace}} constructor also triggers submission of build tasks for all
of its views via {{ViewManager::reload}} and this happens prior to SS initialization during
startup. So there's a race at startup between SS initialization and any view build task reaching
a point where it needs to update {{system_distributed}}; the window for this race is widened
here by the mandatory shadow round and so {{MaterializedViewTest.interrupt_build_process_test}}
was failing pretty regularly. The downside of the fix in the patch is that MV builds won't
get submitted while gossip is stopped (via JMX or nodetool) as this marks SS as uninitialized.
This doesn't seem like a particularly big problem to me, but if there are concerns over that
I'm willing to revisit.
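
To make the trade-off concrete, here is a minimal sketch of the kind of guard described above. It is not the code from the patch: {{ViewBuildGate}}, {{markInitialized}}, {{stopGossip}} and {{maybeSubmitBuild}} are hypothetical names used only for illustration, and the boolean flag merely stands in for "SS is initialized".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in; none of these names come from the Cassandra code base.
final class ViewBuildGate {
    // Plays the role of "StorageService is initialized".
    private final AtomicBoolean initialized = new AtomicBoolean(false);
    // Records the builds that were actually submitted.
    private final List<String> submitted = new ArrayList<>();

    void markInitialized() { initialized.set(true); }

    // Assumption taken from the comment: stopping gossip marks SS as uninitialized.
    void stopGossip() { initialized.set(false); }

    // Submits the build only when initialization has completed.
    // Returns true if the build was submitted, false if it was skipped.
    boolean maybeSubmitBuild(String viewName) {
        if (!initialized.get())
            return false; // too early (or gossip stopped): replica info is not available
        submitted.add(viewName);
        return true;
    }

    List<String> submittedBuilds() { return submitted; }
}
```

In this sketch a build requested before {{markInitialized()}} is skipped rather than racing with initialization, and a build requested after {{stopGossip()}} is skipped as well, which is the downside mentioned above.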

> Always require replace_address to replace existing address
> ----------------------------------------------------------
>                 Key: CASSANDRA-10134
>                 URL:
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Distributed Metadata
>            Reporter: Tyler Hobbs
>            Assignee: Sam Tunnicliffe
>             Fix For: 3.x
> Normally, when a node is started from a clean state with the same address as an existing
> down node, it will fail to start with an error like this:
> {noformat}
> ERROR [main] 2015-08-19 15:07:51,577 - Exception encountered during startup
> java.lang.RuntimeException: A node with address / already exists, cancelling join. Use cassandra.replace_address if you want to replace this node.
> 	at org.apache.cassandra.service.StorageService.checkForEndpointCollision(
> 	at org.apache.cassandra.service.StorageService.prepareToJoin(
> 	at org.apache.cassandra.service.StorageService.initServer( ~[main/:na]
> 	at org.apache.cassandra.service.StorageService.initServer( ~[main/:na]
> 	at org.apache.cassandra.service.CassandraDaemon.setup( [main/:na]
> 	at org.apache.cassandra.service.CassandraDaemon.activate( [main/:na]
> 	at org.apache.cassandra.service.CassandraDaemon.main( [main/:na]
> {noformat}
> However, if {{auto_bootstrap}} is set to false or the node is in its own seed list, it
> will not throw this error and will start normally.  The new node then takes over the host
> ID of the old node (even if the tokens are different), and the only message you will see is
> a warning in the other nodes' logs:
> {noformat}
> logger.warn("Changing {}'s host ID from {} to {}", endpoint, storedId, hostId);
> {noformat}
> This could cause an operator to accidentally wipe out the token information for a down
> node without replacing it.  To fix this, we should check for an endpoint collision even if
> {{auto_bootstrap}} is false or the node is a seed.
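
The difference between the behaviour described in the issue and the proposed one can be sketched as follows. This is an illustration only, not the {{StorageService}} implementation: the class {{EndpointCollisionSketch}}, its methods and the {{replacing}} flag are hypothetical, and the "current" condition is inferred from the description above (the check is skipped when {{auto_bootstrap}} is false or the node is a seed).

```java
import java.util.Set;

// Hypothetical sketch; names and structure are assumptions, not Cassandra's actual code.
final class EndpointCollisionSketch {
    // Behaviour as described in the issue: the check only runs for bootstrapping non-seeds.
    static boolean checksCollisionCurrently(boolean autoBootstrap, boolean isSeed) {
        return autoBootstrap && !isSeed;
    }

    // Proposed behaviour: the check always runs, whatever the bootstrap or seed settings.
    static boolean checksCollisionProposed(boolean autoBootstrap, boolean isSeed) {
        return true;
    }

    // Fails startup when the address is already known and the operator
    // has not explicitly asked to replace it.
    static void checkForCollision(String address, Set<String> knownEndpoints, boolean replacing) {
        if (knownEndpoints.contains(address) && !replacing)
            throw new RuntimeException("A node with address " + address
                + " already exists, cancelling join. "
                + "Use cassandra.replace_address if you want to replace this node.");
    }
}
```

Under the proposed rule, a seed node or a node with {{auto_bootstrap}} set to false that reuses an existing address would hit the same exception as any other node, instead of silently taking over the old host ID.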
