incubator-cassandra-user mailing list archives

From Nick Bailey <n...@riptano.com>
Subject Re: Errors when decommissioning - 0.7 RC1
Date Wed, 15 Dec 2010 14:27:26 GMT
This is rc2 I am assuming?

One thing about remove: the removetoken force command is meant to be run on
the node that originally started a removal, and it doesn't take a token
parameter.  Not relevant to your problem though.
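
To spell out the usage difference, roughly (a sketch of the 0.7-era nodetool
syntax; the host placeholders are mine, not from your cluster):

```shell
# Start removing a dead node's token. This can be initiated from any
# live node in the ring:
nodetool -h <any-live-node> removetoken 113427455640312821154458202477256070484

# Check how a removal in flight is progressing:
nodetool -h <initiating-node> removetoken status

# Force-complete a stuck removal. Note: no token argument, and it must be
# run against the same node that initiated the removetoken above:
nodetool -h <initiating-node> removetoken force
```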

Is this a test cluster, and have you tried to reproduce the error? I would be
interested to know what the ring output looks like on both *.19 and *.17
after the decommission is run.  I assume you were running the ring command
on another node?  I'll look into the logs more and see if anything jumps
out.
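
Something along these lines (a sketch, reusing the hostnames from your mail)
would capture the state I'm after, from all three perspectives at once:

```shell
# Ring as seen by the decommissioned node, the node that "died", and a
# healthy node, so the views can be compared side by side:
for host in 192.168.4.19 192.168.4.17 192.168.4.15; do
    echo "=== ring as seen by $host ==="
    nodetool -h "$host" ring
done
```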

On Wed, Dec 15, 2010 at 6:37 AM, Dan Hendry <dan.hendry.junk@gmail.com> wrote:

> I am seeing very strange things when trying to decommission a node in my
> cluster (detailed logs attached). Here is a nodetool ring report **after**
> decommissioning of node 192.168.4.19  (as seen by any other, properly
> functioning node).
>
>
>
> 192.168.4.15    Up     Normal  49.9 GB         25.00%
> 42535295865117307932921825928971026431
>
> 192.168.4.20    Up     Normal  42.56 GB        8.33%
> 56713727820156410577229101238628035242
>
> 192.168.4.16    Up     Normal  29.17 GB        16.67%
> 85070591730234615865843651857942052863
>
> 192.168.4.19    Down   Leaving 54.11 GB        16.67%
> 113427455640312821154458202477256070484
>
> 192.168.4.17    Down   Normal  48.88 GB        8.33%
> 127605887595351923798765477786913079295
>
> 192.168.4.18    Up     Normal  59.44 GB        25.00%
> 170141183460469231731687303715884105726
>
> 192.168.4.12    Up     Normal  52.3 GB         0.00%
> 170141183460469231731687303715884105727
>
>
>
>
>
> What I am seeing is that after nodetool decommission completes on
> 192.168.4.19, the next node in the ring (192.168.4.17) ‘dies’ (see attached
> log, its nodetool ring report is quite different). By ‘dies’ I mean that it
> stops communicating with other nodes (but the Cassandra process is still
> running and, among other things, compaction continues). After restarting
> Cassandra on 192.168.4.17, the ring state gets stuck and the decommissioned
> node (192.168.4.19) does not get removed (at least from the nodetool ring
> report):
>
>
>
> 192.168.4.15    Up     Normal  49.9 GB         25.00%
> 42535295865117307932921825928971026431
>
> 192.168.4.20    Up     Normal  42.56 GB        8.33%
> 56713727820156410577229101238628035242
>
> 192.168.4.16    Up     Normal  29.17 GB        16.67%
> 85070591730234615865843651857942052863
>
> 192.168.4.19    Down   Leaving 54.11 GB        16.67%
> 113427455640312821154458202477256070484
>
> 192.168.4.17    Up     Normal  69.12 GB        8.33%
> 127605887595351923798765477786913079295
>
> 192.168.4.18    Up     Normal  58.88 GB        25.00%
> 170141183460469231731687303715884105726
>
> 192.168.4.12    Up     Normal  52.3 GB         0.00%
> 170141183460469231731687303715884105727
>
>
>
>
>
> Furthermore, when I try running “nodetool removetoken
> 113427455640312821154458202477256070484”, I get:
>
>
>
> Exception in thread "main" java.lang.UnsupportedOperationException: Node /
> 192.168.4.19 is already being removed.
>
>                 at
> org.apache.cassandra.service.StorageService.removeToken(StorageService.java:1731)
>
>                 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native
> Method)
>
>                 at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>
>                 at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>
>                 at java.lang.reflect.Method.invoke(Method.java:597)
>
>                 at
> com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:93)
>
>
>
>
>
> And when I try running “nodetool removetoken force
> 113427455640312821154458202477256070484”, I get:
>
>
>
> RemovalStatus: No token removals in process.
>
> Exception in thread "main" java.lang.NullPointerException
>
>                 at
> org.apache.cassandra.service.StorageService.forceRemoveCompletion(StorageService.java:1703)
>
>                 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native
> Method)
>
>                 at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>
>                 at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>
>                 at java.lang.reflect.Method.invoke(Method.java:597)
>
>                 at
> com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:93)
>
>
>
> ?!?!?!?
>
>
>
> I think I have seen this type of behaviour once or twice before (I believe
> in 0.7 b1 or later) but wrote it off as being caused by my misguided
> tinkering and/or other Cassandra bugs. This time around, I have done very
> little with JMX/CLI/nodetool and I can find no related Cassandra bugs.
>
>
>
> Help/suggestions?
>
>
>
> Dan Hendry
>
> (403) 660-2297
>
>
>
