cassandra-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Stefan Podkowinski (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-13058) dtest failure in hintedhandoff_test.TestHintedHandoff.hintedhandoff_decom_test
Date Wed, 11 Jan 2017 13:43:58 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-13058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15818371#comment-15818371
] 

Stefan Podkowinski commented on CASSANDRA-13058:
------------------------------------------------

The test is passing because hint delivery is not resumable in 3.0 (https://issues.apache.org/jira/browse/CASSANDRA-6230).
This has only been fixed lately in 3.10 (CASSANDRA-11960). 

As due to the missing reply messages addressed in the patch, node2 would never respond, while
handling non-local hints. That will in turn cause all callbacks on node1 to time out and the
HintDispatcher will retry hint delievery. At this point, the FailureDetector is correctly
reporting the node as alive and the dispatch process will not be aborted but simply try to
consume the next hints from a now empty iterator and terminate with a successful return value
afterwards. The log file will contain a "Finished hinted handoff of file [].hints to endpoint
[]", which is technically correct, but is probably a bit misleading in case all writes just
timed out.

Even with the FailureDetector reporting the target node as unavailable and running into the
ABORT case, we'd still get the Exception reported in CASSANDRA-11960. All things considered
chances are high that hints will be lost in 3.0 in case of any errors during delivery.



> dtest failure in hintedhandoff_test.TestHintedHandoff.hintedhandoff_decom_test
> ------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-13058
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13058
>             Project: Cassandra
>          Issue Type: Test
>          Components: Testing
>            Reporter: Sean McCarthy
>            Assignee: Stefan Podkowinski
>            Priority: Blocker
>              Labels: dtest, test-failure
>             Fix For: 3.10
>
>         Attachments: 13058-3.x.patch, node1.log, node1_debug.log, node1_gc.log, node2.log,
node2_debug.log, node2_gc.log, node3.log, node3_debug.log, node3_gc.log, node4.log, node4_debug.log,
node4_gc.log
>
>
> example failure:
> http://cassci.datastax.com/job/cassandra-3.X_novnode_dtest/16/testReport/hintedhandoff_test/TestHintedHandoff/hintedhandoff_decom_test/
> {code}
> Error Message
> Subprocess ['nodetool', '-h', 'localhost', '-p', '7100', ['decommission']] exited with
non-zero status; exit status: 2; 
> stderr: error: Error while decommissioning node: Failed to transfer all hints to 59f20b4f-0215-4e18-be1b-7e00f2901629
> {code}{code}
> Stacktrace
>   File "/usr/lib/python2.7/unittest/case.py", line 329, in run
>     testMethod()
>   File "/home/automaton/cassandra-dtest/hintedhandoff_test.py", line 167, in hintedhandoff_decom_test
>     node1.decommission()
>   File "/usr/local/lib/python2.7/dist-packages/ccmlib/node.py", line 1314, in decommission
>     self.nodetool("decommission")
>   File "/usr/local/lib/python2.7/dist-packages/ccmlib/node.py", line 783, in nodetool
>     return handle_external_tool_process(p, ['nodetool', '-h', 'localhost', '-p', str(self.jmx_port),
cmd.split()])
>   File "/usr/local/lib/python2.7/dist-packages/ccmlib/node.py", line 1993, in handle_external_tool_process
>     raise ToolError(cmd_args, rc, out, err)
> {code}{code}
> java.lang.RuntimeException: Error while decommissioning node: Failed to transfer all
hints to 59f20b4f-0215-4e18-be1b-7e00f2901629
> 	at org.apache.cassandra.service.StorageService.decommission(StorageService.java:3924)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 	at java.lang.reflect.Method.invoke(Method.java:497)
> 	at sun.reflect.misc.Trampoline.invoke(MethodUtil.java:71)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 	at java.lang.reflect.Method.invoke(Method.java:497)
> 	at sun.reflect.misc.MethodUtil.invoke(MethodUtil.java:275)
> 	at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:112)
> 	at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:46)
> 	at com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:237)
> 	at com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:138)
> 	at com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:252)
> 	at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:819)
> 	at com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:801)
> 	at javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1466)
> 	at javax.management.remote.rmi.RMIConnectionImpl.access$300(RMIConnectionImpl.java:76)
> 	at javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1307)
> 	at javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1399)
> 	at javax.management.remote.rmi.RMIConnectionImpl.invoke(RMIConnectionImpl.java:828)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 	at java.lang.reflect.Method.invoke(Method.java:497)
> 	at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:323)
> 	at sun.rmi.transport.Transport$1.run(Transport.java:200)
> 	at sun.rmi.transport.Transport$1.run(Transport.java:197)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at sun.rmi.transport.Transport.serviceCall(Transport.java:196)
> 	at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:568)
> 	at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:826)
> 	at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.lambda$run$241(TCPTransport.java:683)
> 	at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler$$Lambda$284/1694175644.run(Unknown
Source)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:682)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> 	at java.lang.Thread.run(Thread.java:745)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message