cassandra-commits mailing list archives

From "Sam Overton (Created) (JIRA)" <>
Subject [jira] [Created] (CASSANDRA-3876) nodetool removetoken force causes an inconsistent state
Date Wed, 08 Feb 2012 12:42:00 GMT
nodetool removetoken force causes an inconsistent state

                 Key: CASSANDRA-3876
             Project: Cassandra
          Issue Type: Bug
          Components: Core
    Affects Versions: 1.0.7, 1.1
            Reporter: Sam Overton

Steps to reproduce (tested on 1.0.7 and trunk):
* Create a cluster of 3 nodes
* Insert some data
* Stop one of the nodes
* Call removetoken on the token of the stopped node
* Immediately afterwards, run removetoken force
  - This causes the original removetoken to fail with an error after 30s because the generation
changed for the leaving node. It is still a convenient way to simulate a removetoken that
hangs at streaming, because in both cases the cleanup logic at the end of
StorageService.removeToken is never executed.
  - For a more realistic reproduction, get a removetoken to hang in streaming,
then run removetoken force.

* "removetoken status" now throws an exception because StorageService.removingNode is not
cleared, but the endpoint is no longer a member of the ring:

$ nodetool -h localhost removetoken status
Exception in thread "main" java.lang.AssertionError
	at org.apache.cassandra.locator.TokenMetadata.getToken(
	at org.apache.cassandra.service.StorageService.getRemovalStatus(
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(
	at java.lang.reflect.Method.invoke(
	at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(
	at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(
	at com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(
	at com.sun.jmx.mbeanserver.PerInterface.getAttribute(
	at com.sun.jmx.mbeanserver.MBeanSupport.getAttribute(
	at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getAttribute(
	at com.sun.jmx.mbeanserver.JmxMBeanServer.getAttribute(
	at sun.reflect.GeneratedMethodAccessor9.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(
	at java.lang.reflect.Method.invoke(
	at sun.rmi.server.UnicastServerRef.dispatch(
	at sun.rmi.transport.Transport$
	at Method)
	at sun.rmi.transport.Transport.serviceCall(
	at sun.rmi.transport.tcp.TCPTransport.handleMessages(
	at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(
	at sun.rmi.transport.tcp.TCPTransport$
	at java.util.concurrent.ThreadPoolExecutor.runWorker(
	at java.util.concurrent.ThreadPoolExecutor$

* truncate no longer works in the cli because the removed endpoint is not removed from Gossiper.unreachableEndpoints.

The cli errors immediately with:
[default@ks1] truncate cf1;
	at org.apache.cassandra.thrift.Cassandra$
	at org.apache.thrift.TServiceClient.receiveBase(
	at org.apache.cassandra.thrift.Cassandra$Client.recv_truncate(
	at org.apache.cassandra.thrift.Cassandra$Client.truncate(
	at org.apache.cassandra.cli.CliClient.executeTruncate(
	at org.apache.cassandra.cli.CliClient.executeCLIStatement(
	at org.apache.cassandra.cli.CliMain.processStatementInteractive(
	at org.apache.cassandra.cli.CliMain.main(

The logs show:
INFO [Thrift:11] 2012-02-08 11:55:50,135 (line 1172) Cannot perform truncate,
some hosts are down

* Other schema-related operations probably fail for the same reason, although this
was not tested.

* Restart the affected node.

It looks like StorageService.forceRemoveCompletion is missing some cleanup logic which is
present at the end of StorageService.removeToken. Adding this cleanup logic to forceRemoveCompletion
fixes the above issues (see attached).
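
To illustrate the failure mode described above, here is a minimal toy model in Java. It is not the Cassandra source and not the attached patch: the class RemovalStateSketch and all of its fields and methods are hypothetical stand-ins for StorageService.removingNode, the ring membership and Gossiper.unreachableEndpoints, and the toy status check throws IllegalStateException where the real code hits an AssertionError. It only shows why a forced completion that skips the cleanup leaves both symptoms behind.

```java
import java.util.HashSet;
import java.util.Set;

// Toy model only: every name here is a hypothetical stand-in and does not
// reproduce the real StorageService or Gossiper code.
class RemovalStateSketch {
    private final Set<String> ringMembers = new HashSet<>();
    private final Set<String> unreachableEndpoints = new HashSet<>();
    private String removingNode; // null when no removal is in progress

    void addNode(String endpoint) { ringMembers.add(endpoint); }

    void markDown(String endpoint) { unreachableEndpoints.add(endpoint); }

    void startRemoval(String endpoint) { removingNode = endpoint; }

    // Forced completion as described in the report: the endpoint leaves the
    // ring, but the in-progress marker and the unreachable set are untouched.
    void forceRemoveWithoutCleanup() {
        ringMembers.remove(removingNode);
    }

    // Forced completion that also performs the cleanup the normal path would do.
    void forceRemoveWithCleanup() {
        String endpoint = removingNode;
        ringMembers.remove(endpoint);
        unreachableEndpoints.remove(endpoint);
        removingNode = null;
    }

    // Stand-in for a status query: fails if a removal is still marked as in
    // progress for an endpoint that is no longer in the ring.
    String removalStatus() {
        if (removingNode == null) {
            return "no removal in progress";
        }
        if (!ringMembers.contains(removingNode)) {
            throw new IllegalStateException("removing node is not in the ring: " + removingNode);
        }
        return "removing " + removingNode;
    }

    // Stand-in for the truncate precondition: all known hosts must be up.
    boolean canTruncate() {
        return unreachableEndpoints.isEmpty();
    }
}
```

In this model, calling forceRemoveWithoutCleanup after startRemoval makes removalStatus throw and canTruncate return false, which mirrors the two symptoms reported above. Calling forceRemoveWithCleanup instead leaves both checks healthy.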
