incubator-cassandra-user mailing list archives

From Arindam Barua <aba...@247-inc.com>
Subject RE: Bootstrap stuck: vnode enabled 1.2.12
Date Tue, 18 Feb 2014 18:29:58 GMT

The node is still out of the ring. Any suggestions on how to get it in will be very helpful.

From: Arindam Barua [mailto:abarua@247-inc.com]
Sent: Friday, February 14, 2014 1:04 AM
To: user@cassandra.apache.org
Subject: Bootstrap stuck: vnode enabled 1.2.12


After our otherwise successful upgrade procedure to enable vnodes, one non-seed host ran into
a hardware issue during bootstrap while we were adding "new" hosts back to our cluster. By the
time the hardware issue was fixed a week later, all other nodes had been added, cleaned, and
repaired successfully. The disks on this node were untouched, and when the node was started
back up, it detected the interrupted bootstrap and attempted to bootstrap again. However, after
~24 hrs it was still stuck in the 'JOINING' state according to nodetool netstats on that node,
even though no streams were flowing to or from it. It also did not appear in nodetool status
at all (not even as JOINING).

From a couple of observed thread dumps, the stack of the thread blocked during bootstrap
is at [1].

Since the node wasn't making any progress, I stopped Cassandra, cleaned up the data and
commitlog directories, and attempted a fresh bootstrap. nodetool netstats immediately
reported a large number of queued streams, and data started streaming to the node. The data
directory quickly grew to 18 GB (the other nodes had ~25 GB, but we have a lot of data with low
TTLs). However, the node ended up in the same state as before: nodetool netstats has nothing
queued but still reports the JOINING state, even though it has been more than 24 hrs. There
are no other ERRORS in the logs, and new data being written to the cluster makes it to this
node just fine, triggering compactions, etc. from time to time.
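
The symptom described above (mode still JOINING while nothing is streaming) can be checked mechanically. The following is a minimal sketch, not an official tool: it assumes the nodetool netstats output contains a `Mode:` line and, when idle, the lines `Not sending any streams.` and `Not receiving any streams.`, which should be verified against the output of your own version. The function names and the `SAMPLE` text are hypothetical and for illustration only; the sample was not captured from the affected node.

```python
import re


def parse_netstats(text):
    """Return (mode, streaming) from `nodetool netstats`-style text.

    Assumes a "Mode: <STATE>" line and the two "Not ... any streams" lines
    when the node is idle (an assumption about the output format).
    """
    mode_match = re.search(r"^Mode:\s*(\S+)", text, re.MULTILINE)
    mode = mode_match.group(1) if mode_match else None
    idle = ("Not sending any streams" in text
            and "Not receiving any streams" in text)
    return mode, not idle


def looks_stuck(text):
    """True when the node reports JOINING but no streams are active."""
    mode, streaming = parse_netstats(text)
    return mode == "JOINING" and not streaming


# Illustrative text only, written to match the assumed format.
SAMPLE = """Mode: JOINING
Not sending any streams.
Not receiving any streams.
"""

if __name__ == "__main__":
    print(looks_stuck(SAMPLE))
```

Polling this at intervals would distinguish a bootstrap that is merely slow (streams still active) from one that has gone idle without leaving JOINING. It does not explain why the join never completes.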

Any help is appreciated.

Thanks,
Arindam

[1] Thread dump
Thread 3708: (state = BLOCKED)
 - sun.misc.Unsafe.park(boolean, long) @bci=0 (Compiled frame; information may be imprecise)
 - java.util.concurrent.locks.LockSupport.park(java.lang.Object) @bci=14, line=156 (Interpreted frame)
 - java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt() @bci=1, line=811 (Interpreted frame)
 - java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(int) @bci=55, line=969 (Interpreted frame)
 - java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(int) @bci=24, line=1281 (Interpreted frame)
 - java.util.concurrent.CountDownLatch.await() @bci=5, line=207 (Interpreted frame)
 - org.apache.cassandra.dht.RangeStreamer.fetch() @bci=209, line=256 (Interpreted frame)
 - org.apache.cassandra.dht.BootStrapper.bootstrap() @bci=120, line=84 (Interpreted frame)
 - org.apache.cassandra.service.StorageService.bootstrap(java.util.Collection) @bci=172, line=978 (Interpreted frame)
 - org.apache.cassandra.service.StorageService.joinTokenRing(int) @bci=827, line=744 (Interpreted frame)
 - org.apache.cassandra.service.StorageService.initServer(int) @bci=363, line=585 (Interpreted frame)
 - org.apache.cassandra.service.StorageService.initServer() @bci=4, line=482 (Interpreted frame)
 - org.apache.cassandra.service.CassandraDaemon.setup() @bci=1069, line=348 (Interpreted frame)
 - org.apache.cassandra.service.CassandraDaemon.activate() @bci=59, line=447 (Interpreted frame)
 - org.apache.cassandra.service.CassandraDaemon.main(java.lang.String[]) @bci=3, line=490 (Interpreted frame)
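
To compare several dumps quickly, the check "is the startup thread parked in CountDownLatch.await() directly under RangeStreamer.fetch()" can be scripted. This is a rough sketch under stated assumptions: the helper names are hypothetical, the regular expression is tuned only to the frame layout shown in [1], and `EXCERPT` is three frames copied from that dump. It confirms where the thread is waiting, presumably for outstanding streams to signal completion, but it does not identify which stream never finished.

```python
import re

# A frame is a dotted method name immediately followed by "(".
FRAME = re.compile(r"([\w.$]+)\(")


def frame_methods(dump):
    """Extract fully qualified method names, top of stack first."""
    # Collapse wrapped lines so a frame split across lines is matched whole.
    flat = " ".join(dump.split())
    return [name for name in FRAME.findall(flat) if "." in name]


def waiting_in_range_streamer(dump):
    """True if CountDownLatch.await sits directly above RangeStreamer.fetch."""
    frames = frame_methods(dump)
    try:
        latch = frames.index("java.util.concurrent.CountDownLatch.await")
        fetch = frames.index("org.apache.cassandra.dht.RangeStreamer.fetch")
    except ValueError:
        return False
    return fetch == latch + 1


# Three frames copied from the dump in [1], including the original wrapping.
EXCERPT = """
 - java.util.concurrent.CountDownLatch.await() @bci=5, line=207 (Interpreted
   frame)
 - org.apache.cassandra.dht.RangeStreamer.fetch() @bci=209, line=256
   (Interpreted frame)
 - org.apache.cassandra.dht.BootStrapper.bootstrap() @bci=120, line=84
   (Interpreted frame)
"""

if __name__ == "__main__":
    print(waiting_in_range_streamer(EXCERPT))
```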
