From: "Dan Hendry" <dan.hendry.j...@gmail.com>
Subject: Errors when decommissioning - 0.7 RC1
Date: Wed, 15 Dec 2010 12:37:13 GMT
I am seeing very strange things when trying to decommission a node in my
cluster (detailed logs attached). Here is a nodetool ring report *after*
decommissioning node 192.168.4.19, as seen by any other, properly
functioning node:

Address         Status State   Load            Owns    Token
192.168.4.15    Up     Normal  49.9 GB         25.00%  42535295865117307932921825928971026431
192.168.4.20    Up     Normal  42.56 GB        8.33%   56713727820156410577229101238628035242
192.168.4.16    Up     Normal  29.17 GB        16.67%  85070591730234615865843651857942052863
192.168.4.19    Down   Leaving 54.11 GB        16.67%  113427455640312821154458202477256070484
192.168.4.17    Down   Normal  48.88 GB        8.33%   127605887595351923798765477786913079295
192.168.4.18    Up     Normal  59.44 GB        25.00%  170141183460469231731687303715884105726
192.168.4.12    Up     Normal  52.3 GB         0.00%   170141183460469231731687303715884105727

What I am seeing is that after nodetool decommission completes on
192.168.4.19, the next node in the ring (192.168.4.17) 'dies' (see the
attached log; its nodetool ring report is quite different). By 'dies' I
mean that it stops communicating with other nodes, although the Cassandra
process is still running and, among other things, compaction continues.
After restarting Cassandra on 192.168.4.17, the ring state gets stuck and
the decommissioned node (192.168.4.19) does not get removed (at least not
from the nodetool ring report):

Address         Status State   Load            Owns    Token
192.168.4.15    Up     Normal  49.9 GB         25.00%  42535295865117307932921825928971026431
192.168.4.20    Up     Normal  42.56 GB        8.33%   56713727820156410577229101238628035242
192.168.4.16    Up     Normal  29.17 GB        16.67%  85070591730234615865843651857942052863
192.168.4.19    Down   Leaving 54.11 GB        16.67%  113427455640312821154458202477256070484
192.168.4.17    Up     Normal  69.12 GB        8.33%   127605887595351923798765477786913079295
192.168.4.18    Up     Normal  58.88 GB        25.00%  170141183460469231731687303715884105726
192.168.4.12    Up     Normal  52.3 GB         0.00%   170141183460469231731687303715884105727
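
For what it is worth, the same ring state can be read directly over JMX.
Here is a minimal sketch (it assumes the 0.7 default JMX port of 8080 and
that the StorageService MBean exposes the LiveNodes/UnreachableNodes/
LeavingNodes attributes that nodetool reads; adjust if your setup differs):

import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

// Ask one node, over JMX, which endpoints it considers live, unreachable
// and leaving. Attribute names are assumed to match what nodetool reads
// from the StorageService MBean in 0.7.
public class RingCheck
{
    public static void main(String[] args) throws Exception
    {
        String host = args.length > 0 ? args[0] : "192.168.4.15";
        JMXServiceURL url = new JMXServiceURL(
            "service:jmx:rmi:///jndi/rmi://" + host + ":8080/jmxrmi");
        JMXConnector jmxc = JMXConnectorFactory.connect(url);
        try
        {
            MBeanServerConnection mbs = jmxc.getMBeanServerConnection();
            ObjectName ss = new ObjectName("org.apache.cassandra.db:type=StorageService");
            System.out.println("Live:        " + mbs.getAttribute(ss, "LiveNodes"));
            System.out.println("Unreachable: " + mbs.getAttribute(ss, "UnreachableNodes"));
            System.out.println("Leaving:     " + mbs.getAttribute(ss, "LeavingNodes"));
        }
        finally
        {
            jmxc.close();
        }
    }
}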

Furthermore, when I try running "nodetool removetoken
113427455640312821154458202477256070484", I get: 

Exception in thread "main" java.lang.UnsupportedOperationException: Node /192.168.4.19 is already being removed.
        at org.apache.cassandra.service.StorageService.removeToken(StorageService.java:1731)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:93)
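
Incidentally, the reflection frames in that trace are just the JMX dispatch
path: nodetool removetoken appears to invoke a removeToken operation on the
StorageService MBean. A minimal sketch of issuing the same call without
nodetool (the operation name and String signature are inferred from the
trace, not verified against the 0.7 source; port 8080 assumed as above):

import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

// Invoke the removeToken operation directly over JMX, mirroring the path
// the stack trace shows nodetool's request taking.
public class RemoveTokenDirect
{
    public static void main(String[] args) throws Exception
    {
        JMXServiceURL url = new JMXServiceURL(
            "service:jmx:rmi:///jndi/rmi://192.168.4.15:8080/jmxrmi");
        JMXConnector jmxc = JMXConnectorFactory.connect(url);
        try
        {
            MBeanServerConnection mbs = jmxc.getMBeanServerConnection();
            ObjectName ss = new ObjectName("org.apache.cassandra.db:type=StorageService");
            mbs.invoke(ss, "removeToken",
                       new Object[]{ "113427455640312821154458202477256070484" },
                       new String[]{ "java.lang.String" });
        }
        finally
        {
            jmxc.close();
        }
    }
}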

And when I try running "nodetool removetoken force
113427455640312821154458202477256070484", I get: 

RemovalStatus: No token removals in process.

Exception in thread "main" java.lang.NullPointerException
        at org.apache.cassandra.service.StorageService.forceRemoveCompletion(StorageService.java:1703)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:93)

?!?!?!?
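
The two answers contradict each other: removetoken says a removal is already
in progress, removetoken force says none is. That smells like two separate
pieces of state disagreeing. Here is a self-contained sketch of that failure
mode (purely hypothetical, not Cassandra's actual code): the "leaving" flag
is ring state shared via gossip, while the force path consults a node-local
tracker that was never populated on the node answering the call.

import java.util.HashSet;
import java.util.Set;

// Hypothetical demo of two state sources disagreeing; not Cassandra code.
public class RemovalStateDemo
{
    // State 1: ring state learned from gossip; every node sees it.
    static Set<String> leaving = new HashSet<String>();
    // State 2: local to whichever node coordinates the removal; never set here.
    static String removalInProgress = null;

    static void removeToken(String endpoint)
    {
        if (leaving.contains(endpoint)) // consults state 1
            throw new UnsupportedOperationException(
                "Node " + endpoint + " is already being removed.");
    }

    static void forceRemoveCompletion()
    {
        System.out.println("RemovalStatus: No token removals in process.");
        removalInProgress.toString(); // consults state 2: null -> NPE
    }

    public static void main(String[] args)
    {
        leaving.add("/192.168.4.19"); // gossip still says "leaving"
        try { removeToken("/192.168.4.19"); }
        catch (UnsupportedOperationException e) { System.out.println(e); }
        try { forceRemoveCompletion(); }
        catch (NullPointerException e) { System.out.println(e); }
    }
}

If that is roughly what is happening here, the leaving flag for 192.168.4.19
is stuck in gossip with no node actually tracking the removal, which would
explain why neither command can clear it.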

I think I have seen this type of behaviour once or twice before (on 0.7
beta1 or later, I believe) but wrote it off as being caused by my misguided
tinkering and/or other Cassandra bugs. This time around, I have done very
little with JMX/CLI/nodetool and I can find no related Cassandra bugs.

Help/suggestions?

Dan Hendry

(403) 660-2297
