cassandra-commits mailing list archives

From "Sam Overton (Created) (JIRA)" <j...@apache.org>
Subject [jira] [Created] (CASSANDRA-3876) nodetool removetoken force causes an inconsistent state
Date Wed, 08 Feb 2012 12:42:00 GMT
nodetool removetoken force causes an inconsistent state
-------------------------------------------------------

                 Key: CASSANDRA-3876
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3876
             Project: Cassandra
          Issue Type: Bug
          Components: Core
    Affects Versions: 1.0.7, 1.1
            Reporter: Sam Overton


Steps to reproduce (tested on 1.0.7 and trunk):
* Create a cluster of 3 nodes
* Insert some data
* stop one of the nodes
* Call removetoken on the token of the stopped node
* Immediately after, run removetoken force (the full command sequence is sketched below)
  - this causes the original removetoken to fail with an error after 30s because the generation
    changed for the leaving node, but it is a convenient way of simulating a removetoken that
    hangs during streaming, since the cleanup logic at the end of StorageService.removeToken
    is never executed.
  - for a more realistic reproduction, get a removetoken to actually hang during streaming,
    then run removetoken force
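
As a concrete illustration (the token value is a placeholder; read the stopped node's token from "nodetool ring"), the reproduction boils down to:
{noformat}
$ nodetool -h localhost ring                        # note the token of the stopped node
$ nodetool -h localhost removetoken <token>         # start the normal removal
$ nodetool -h localhost removetoken force           # run immediately, from a second shell
$ nodetool -h localhost removetoken status          # now throws the AssertionError shown below
{noformat}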

Effects:
* "removetoken status" now throws an exception because StorageService.removingNode is not
cleared, but the endpoint is no longer a member of the ring:

{noformat}
$ nodetool -h localhost removetoken status
Exception in thread "main" java.lang.AssertionError
	at org.apache.cassandra.locator.TokenMetadata.getToken(TokenMetadata.java:304)
	at org.apache.cassandra.service.StorageService.getRemovalStatus(StorageService.java:2369)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:616)
	at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:111)
	at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:45)
	at com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:226)
	at com.sun.jmx.mbeanserver.PerInterface.getAttribute(PerInterface.java:83)
	at com.sun.jmx.mbeanserver.MBeanSupport.getAttribute(MBeanSupport.java:205)
	at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getAttribute(DefaultMBeanServerInterceptor.java:683)
	at com.sun.jmx.mbeanserver.JmxMBeanServer.getAttribute(JmxMBeanServer.java:672)
	at javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1427)
	at javax.management.remote.rmi.RMIConnectionImpl.access$200(RMIConnectionImpl.java:90)
	at javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1285)
	at javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1383)
	at javax.management.remote.rmi.RMIConnectionImpl.getAttribute(RMIConnectionImpl.java:619)
	at sun.reflect.GeneratedMethodAccessor9.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:616)
	at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:322)
	at sun.rmi.transport.Transport$1.run(Transport.java:177)
	at java.security.AccessController.doPrivileged(Native Method)
	at sun.rmi.transport.Transport.serviceCall(Transport.java:173)
	at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:553)
	at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:808)
	at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:667)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
	at java.lang.Thread.run(Thread.java:636)
{noformat}
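
The assertion is a symptom of half-cleared state: removingNode still points at the endpoint, but the endpoint has already been excised from the token map, so the token lookup performed by the status call has nothing to return. A minimal, self-contained Java sketch of that failure mode (a simplified model, not the Cassandra source; run with -ea to enable assertions):
{noformat}
import java.net.InetAddress;
import java.util.HashMap;
import java.util.Map;

public class RemovalStatusSketch {
    // stand-ins for TokenMetadata's endpoint->token map and StorageService.removingNode
    static Map<InetAddress, String> endpointToToken = new HashMap<InetAddress, String>();
    static InetAddress removingNode;

    static String getToken(InetAddress ep) {
        String token = endpointToToken.get(ep);
        assert token != null : ep + " is not a member of the ring"; // the AssertionError above
        return token;
    }

    static String getRemovalStatus() {
        if (removingNode == null)
            return "No token removals in process.";
        return "Removing token " + getToken(removingNode); // looks up an excised endpoint
    }

    public static void main(String[] args) throws Exception {
        InetAddress dead = InetAddress.getByName("127.0.0.3");
        endpointToToken.put(dead, "85070591730234615865843651857942052864");
        removingNode = dead;            // removetoken started for the stopped node
        endpointToToken.remove(dead);   // removetoken force excises the endpoint...
                                        // ...but never clears removingNode
        System.out.println(getRemovalStatus());
    }
}
{noformat}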

* truncate no longer works in the CLI because the removed endpoint is not removed from Gossiper.unreachableEndpoints.

The CLI fails immediately with:
{noformat}
[default@ks1] truncate cf1;
null
UnavailableException()
	at org.apache.cassandra.thrift.Cassandra$truncate_result.read(Cassandra.java:20978)
	at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:78)
	at org.apache.cassandra.thrift.Cassandra$Client.recv_truncate(Cassandra.java:942)
	at org.apache.cassandra.thrift.Cassandra$Client.truncate(Cassandra.java:929)
	at org.apache.cassandra.cli.CliClient.executeTruncate(CliClient.java:1417)
	at org.apache.cassandra.cli.CliClient.executeCLIStatement(CliClient.java:270)
	at org.apache.cassandra.cli.CliMain.processStatementInteractive(CliMain.java:219)
	at org.apache.cassandra.cli.CliMain.main(CliMain.java:346)
{noformat}

The logs show:
{noformat}
INFO [Thrift:11] 2012-02-08 11:55:50,135 StorageProxy.java (line 1172) Cannot perform truncate, some hosts are down
{noformat}
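
Truncate refuses to run unless every host is believed to be up, so a stale entry in the unreachable set makes it fail unconditionally. A simplified, self-contained Java model of that check (not the StorageProxy source):
{noformat}
import java.net.InetAddress;
import java.util.HashSet;
import java.util.Set;

public class TruncateAvailabilitySketch {
    // stand-in for Gossiper.unreachableEndpoints
    static Set<InetAddress> unreachableEndpoints = new HashSet<InetAddress>();

    static boolean isAnyHostDown() {
        return !unreachableEndpoints.isEmpty();
    }

    static void truncateBlocking(String columnFamily) {
        if (isAnyHostDown()) // the stale entry trips this check every time
            throw new IllegalStateException("Cannot perform truncate, some hosts are down");
        System.out.println("truncated " + columnFamily);
    }

    public static void main(String[] args) throws Exception {
        InetAddress removed = InetAddress.getByName("127.0.0.3");
        unreachableEndpoints.add(removed); // the node was down when it was force-removed
        truncateBlocking("cf1");           // fails until the stale entry is cleaned up
    }
}
{noformat}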

* other schema-related operations probably fail for the same reason, although this was not
tested.

Workaround:
* Restart the affected node.

Fix:
It looks like StorageService.forceRemoveCompletion is missing the cleanup logic that is present
at the end of StorageService.removeToken. Adding that cleanup logic to forceRemoveCompletion
fixes the issues above (see attached).
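
To make the shape of the fix concrete, here is a simplified, self-contained sketch (a model of the idea, not the attached patch): the forced path needs the same end-of-removal cleanup the normal path already performs.
{noformat}
import java.net.InetAddress;
import java.util.HashSet;
import java.util.Set;

public class ForceRemoveSketch {
    // stand-ins for the relevant StorageService / Gossiper state
    Set<InetAddress> unreachableEndpoints = new HashSet<InetAddress>();
    Set<InetAddress> replicatingNodes = new HashSet<InetAddress>();
    InetAddress removingNode;

    // cleanup the normal removeToken path runs once streaming has completed
    void finishRemoval(InetAddress endpoint) {
        unreachableEndpoints.remove(endpoint); // so truncate stops seeing a "down" host
        replicatingNodes.clear();
        removingNode = null;                   // so "removetoken status" works again
    }

    // the forced path excises the endpoint from the ring but, before the fix,
    // skipped the cleanup above, leaving the node in the inconsistent state described
    void forceRemoveCompletion(InetAddress endpoint) {
        // ... excise endpoint from the ring and token metadata ...
        finishRemoval(endpoint); // the missing cleanup
    }

    public static void main(String[] args) throws Exception {
        ForceRemoveSketch ss = new ForceRemoveSketch();
        InetAddress dead = InetAddress.getByName("127.0.0.3");
        ss.unreachableEndpoints.add(dead);
        ss.removingNode = dead;
        ss.forceRemoveCompletion(dead);
        System.out.println("removingNode=" + ss.removingNode
                           + ", unreachable=" + ss.unreachableEndpoints);
    }
}
{noformat}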

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
