From: Edward Capriolo
To: user@cassandra.apache.org
Date: Tue, 18 Feb 2014 15:30:20 -0500
Subject: Re: Bootstrap stuck: vnode enabled 1.2.12

There is a bug where a node without a schema cannot bootstrap. Does the node have the schema?


On Tue, Feb 18, 2014 at 1:29 PM, Arindam Barua <abarua@247-inc.com> wrote:

The node is still out of the ring. Any suggestions on how to get it into the ring would be very helpful.

From: Arindam Barua [mailto:abarua@247-inc.com]
Sent: Friday, February 14, 2014 1:04 AM
To: user@cassandra.apache.org
Subject: Bootstrap stuck: vnode enabled 1.2.12

After our otherwise successful upgrade procedure to enable vnodes, while we were adding the “new” hosts back to our cluster, one non-seed host ran into a hardware issue during bootstrap. By the time the hardware issue was fixed a week later, all the other nodes had been added successfully, cleaned up, and repaired. The disks on this node were untouched, and when the node was started back up, it detected the interrupted bootstrap and attempted to bootstrap again. However, after ~24 hours it was still stuck in the ‘JOINING’ state according to nodetool netstats on that node, even though no streams were flowing to or from it. It also did not appear in nodetool status in any form (not even as JOINING).

From a couple of observed thread dumps, the thread blocked during bootstrap has the stack shown in [1].

Since the node wasn’t making any progress, I stopped Cassandra, cleaned out the data and commitlog directories, and attempted a fresh bootstrap. Nodetool netstats immediately reported a whole bunch of queued streams, and data started streaming to the node. The data directory quickly grew to 18 GB (the other nodes had ~25 GB, but we have a lot of data with low TTLs). However, the node ended up in the same state as before: nodetool netstats shows nothing queued but still reports the JOINING state, even though it has been more than 24 hours. There are no other ERROR messages in the logs, and new data written to the cluster reaches this node just fine, triggering compactions etc. from time to time.

Any help is appreciated.

Thanks,

Arindam

[1] Thread dump
Thread 3708: (state = BLOCKED)
 - sun.misc.Unsafe.park(boolean, long) @bci=0 (Compiled frame; information may be imprecise)
 - java.util.concurrent.locks.LockSupport.park(java.lang.Object) @bci=14, line=156 (Interpreted frame)
 - java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt() @bci=1, line=811 (Interpreted frame)
 - java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(int) @bci=55, line=969 (Interpreted frame)
 - java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(int) @bci=24, line=1281 (Interpreted frame)
 - java.util.concurrent.CountDownLatch.await() @bci=5, line=207 (Interpreted frame)
 - org.apache.cassandra.dht.RangeStreamer.fetch() @bci=209, line=256 (Interpreted frame)
 - org.apache.cassandra.dht.BootStrapper.bootstrap() @bci=120, line=84 (Interpreted frame)
 - org.apache.cassandra.service.StorageService.bootstrap(java.util.Collection) @bci=172, line=978 (Interpreted frame)
 - org.apache.cassandra.service.StorageService.joinTokenRing(int) @bci=827, line=744 (Interpreted frame)
 - org.apache.cassandra.service.StorageService.initServer(int) @bci=363, line=585 (Interpreted frame)
 - org.apache.cassandra.service.StorageService.initServer() @bci=4, line=482 (Interpreted frame)
 - org.apache.cassandra.service.CassandraDaemon.setup() @bci=1069, line=348 (Interpreted frame)
 - org.apache.cassandra.service.CassandraDaemon.activate() @bci=59, line=447 (Interpreted frame)
 - org.apache.cassandra.service.CassandraDaemon.main(java.lang.String[]) @bci=3, line=490 (Interpreted frame)
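
The top Cassandra frame in the dump is RangeStreamer.fetch() parked inside an untimed CountDownLatch.await(). The sketch below is a hypothetical, simplified model, not Cassandra 1.2.12's actual RangeStreamer code. It assumes the bootstrap waits on a latch that each stream session counts down once when it completes. Under that assumption, if even one session ends without signalling, the untimed await() never returns, which would look like the BLOCKED thread above even while netstats shows nothing queued. The class name StalledLatchDemo, the method fetchWithTimeout, and the silentSources parameter are made up for illustration.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical, simplified model of a bootstrap that waits for several stream
 * sessions to finish. This is NOT Cassandra's RangeStreamer implementation.
 */
public class StalledLatchDemo {

    /**
     * Starts one task per source (sources must be at least 1). Each task calls
     * countDown() when it "finishes streaming", except the first silentSources
     * tasks, which model sessions that end without ever reporting completion.
     *
     * @return true if every session signalled within timeoutMs, false otherwise
     */
    public static boolean fetchWithTimeout(int sources, int silentSources, long timeoutMs) {
        CountDownLatch latch = new CountDownLatch(sources);
        ExecutorService pool = Executors.newFixedThreadPool(sources);
        try {
            for (int i = 0; i < sources; i++) {
                final boolean silent = i < silentSources;
                pool.submit(() -> {
                    // Streaming work would happen here.
                    if (!silent) {
                        latch.countDown(); // completion callback
                    }
                });
            }
            // An untimed latch.await() would park this thread forever whenever a
            // session stays silent, like the BLOCKED frame in [1]. The timed
            // overload gives up after timeoutMs and returns false instead.
            return latch.await(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            return false;
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("all sessions signal: " + fetchWithTimeout(3, 0, 1000));
        System.out.println("one session silent:  " + fetchWithTimeout(3, 1, 200));
    }
}
```

Run as is, main should print true for the first call and false for the second. The timeout exists only so the demo terminates; whether a timeout, a retry, or something else is the right remedy for a stuck bootstrap is a separate question that this sketch does not answer.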

