cassandra-user mailing list archives

From Casey Deccio <ca...@deccio.net>
Subject Re: node stuck "leaving"
Date Tue, 12 Jul 2011 04:51:05 GMT
On Sat, Jul 9, 2011 at 4:47 PM, aaron morton <aaron@thelastpickle.com> wrote:

> Check the log on all the machines for ERROR messages. An error on any of
> the nodes could have caused the streaming to hang. nodetool netstats will
> let you know if there is a failed stream.
>
>
Here's what I see in the logs on the node I'm streaming from:

 INFO 21:31:48,741 Streaming to /x.x.x.2
 WARN 21:34:04,064 MemoryMeter uninitialized (jamm not specified as java agent); assuming liveRatio of 10.0.  Usually this means cassandra-env.sh disabled jamm because you are using a buggy JRE; upgrade to the Sun JRE instead
 WARN 21:34:15,716 MemoryMeter uninitialized (jamm not specified as java agent); assuming liveRatio of 10.0.  Usually this means cassandra-env.sh disabled jamm because you are using a buggy JRE; upgrade to the Sun JRE instead

Here's what I see in the logs on the node that's being streamed to:

 INFO 21:34:15,000 Enqueuing flush of Memtable-MyCF1@409163361(15568/194600 serialized/live bytes, 4 ops)
 INFO 21:34:15,062 Enqueuing flush of Memtable-MyCF2@469885942(23/287 serialized/live bytes, 2 ops)
 INFO 21:34:15,062 Writing Memtable-MyCF1@409163361(15568/194600 serialized/live bytes, 4 ops)
 INFO 21:35:05,063 Enqueuing flush of Memtable-MyCF3@97707952(494145/6176812 serialized/live bytes, 494 ops)
ERROR 21:36:58,886 Fatal exception in thread Thread[Thread-118,5,main]
java.lang.RuntimeException: Cannot recover SSTable with version f (current version g).
        at org.apache.cassandra.io.sstable.SSTableWriter.createBuilder(SSTableWriter.java:240)
        at org.apache.cassandra.db.compaction.CompactionManager.submitSSTableBuild(CompactionManager.java:1092)
        at org.apache.cassandra.streaming.StreamInSession.finished(StreamInSession.java:110)
        at org.apache.cassandra.streaming.IncomingStreamReader.readFile(IncomingStreamReader.java:104)
        at org.apache.cassandra.streaming.IncomingStreamReader.read(IncomingStreamReader.java:61)
        at org.apache.cassandra.net.IncomingTcpConnection.stream(IncomingTcpConnection.java:162)
        at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:95)


Also, I ran the following on the node I'm decommissioning:

$ nodetool -h localhost netstats x.x.x.2
Mode: Leaving: streaming data to other nodes
Streaming to: /x.x.x.2
   /var/lib/cassandra/data/MyKS/MyCF3-f-5100-Data.db sections=1 progress=26253960206/26253960206 - 100%
   /var/lib/cassandra/data/MyKS/MyCF3-g-23646-Data.db sections=1 progress=0/642731999 - 0%
   /var/lib/cassandra/data/MyKS/MyCF3-g-8614-Data.db sections=1 progress=0/11024712282 - 0%
[...]

This is where it hangs.
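
The one file that reached 100% has "f" in its name, while the files still at 0% have "g", which looks like the same f/g pair named in the exception on the receiving node. Below is a minimal sketch for picking out such files from a list of paths. It assumes the data file names follow the layout <ColumnFamily>-<version>-<generation>-Data.db, which is only inferred from the netstats output above; the function names and the regular expression are hypothetical helpers, not part of Cassandra or nodetool.

```python
import re
from pathlib import Path

# Assumed filename layout, inferred from the netstats output above:
#   <ColumnFamily>-<version>-<generation>-Data.db
SSTABLE_RE = re.compile(
    r"^(?P<cf>.+)-(?P<version>[a-z]+)-(?P<generation>\d+)-Data\.db$"
)


def sstable_version(filename):
    """Return the version component of an SSTable data file name, or None."""
    match = SSTABLE_RE.match(Path(filename).name)
    return match.group("version") if match else None


def old_version_sstables(paths, current_version):
    """Return the paths whose version component differs from current_version."""
    return [
        p for p in paths
        if sstable_version(p) not in (None, current_version)
    ]


if __name__ == "__main__":
    # Paths copied from the netstats output; "g" is the current version
    # according to the exception message.
    files = [
        "/var/lib/cassandra/data/MyKS/MyCF3-f-5100-Data.db",
        "/var/lib/cassandra/data/MyKS/MyCF3-g-23646-Data.db",
        "/var/lib/cassandra/data/MyKS/MyCF3-g-8614-Data.db",
    ]
    for path in old_version_sstables(files, current_version="g"):
        print(path)
```

Run against the three paths listed by netstats, this flags only MyCF3-f-5100-Data.db. Files whose names do not match the assumed layout are skipped rather than reported.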

> AFAIK if you restart the cass service on 1 it will forget it was leaving and
> rejoin in a normal state.
>
>
I've tried restarting, and the node does rejoin the ring, but I get the same
result when I try to decommission again.

Thanks,
Casey
