Return-Path: X-Original-To: apmail-cassandra-user-archive@www.apache.org Delivered-To: apmail-cassandra-user-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 56668EEDD for ; Sat, 9 Mar 2013 03:59:27 +0000 (UTC) Received: (qmail 39094 invoked by uid 500); 9 Mar 2013 03:59:24 -0000 Delivered-To: apmail-cassandra-user-archive@cassandra.apache.org Received: (qmail 38842 invoked by uid 500); 9 Mar 2013 03:59:23 -0000 Mailing-List: contact user-help@cassandra.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: user@cassandra.apache.org Delivered-To: mailing list user@cassandra.apache.org Received: (qmail 38816 invoked by uid 99); 9 Mar 2013 03:59:23 -0000 Received: from nike.apache.org (HELO nike.apache.org) (192.87.106.230) by apache.org (qpsmtpd/0.29) with ESMTP; Sat, 09 Mar 2013 03:59:22 +0000 X-ASF-Spam-Status: No, hits=1.5 required=5.0 tests=HTML_MESSAGE,NORMAL_HTTP_TO_IP,RCVD_IN_DNSWL_LOW,SPF_PASS X-Spam-Check-By: apache.org Received-SPF: pass (nike.apache.org: domain of andrew.bialecki@gmail.com designates 209.85.217.182 as permitted sender) Received: from [209.85.217.182] (HELO mail-lb0-f182.google.com) (209.85.217.182) by apache.org (qpsmtpd/0.29) with ESMTP; Sat, 09 Mar 2013 03:59:15 +0000 Received: by mail-lb0-f182.google.com with SMTP id gg6so1807778lbb.41 for ; Fri, 08 Mar 2013 19:58:55 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:x-received:in-reply-to:references:date:message-id :subject:from:to:content-type; bh=gMYqvkR0+rQQKqu+TazECr9i6HJijWVZKoFGr4HYNsU=; b=W9Qo6VECR2sCW0mQhNONX0FX41bQwYUwGzvYBt4Rflk1XO1rlaIArQOCc+M+rWlSWh CDKva8vw5KxLmYIoqm3TMX7pMUJLsyBCgvYTNVByIMvzadHN/ZVTVukHtoDQES3eZr5V jBL6G5qjAXp52WCR0mSE8r+UAKgNa/Ap7mIEo+0LUX8MFju4c/6cnxWqBRflCOou1M5/ HsGwcsd1NQM9mtjyvIgkXw+vZpJcidG9WB5B+2d5Dp6gUumK6M/4tAVF8EZpzW2hICYt ffbvDBQXyE4nxugzMjkfHeVoilCwO8BMtBw1x9mRjdgi4mXCRlre0H9M4Th0TeW6aObr Dhdw== MIME-Version: 1.0 X-Received: by 10.152.147.36 with SMTP id th4mr4046741lab.19.1362801535141; Fri, 08 Mar 2013 19:58:55 -0800 (PST) Received: by 10.112.14.195 with HTTP; Fri, 8 Mar 2013 19:58:54 -0800 (PST) In-Reply-To: References: Date: Fri, 8 Mar 2013 22:58:54 -0500 Message-ID: Subject: Re: Nodetool drain automatically shutting down node? From: Andrew Bialecki To: user@cassandra.apache.org Content-Type: multipart/alternative; boundary=e89a8f22c38166cc8804d775f505 X-Virus-Checked: Checked by ClamAV on apache.org --e89a8f22c38166cc8804d775f505 Content-Type: text/plain; charset=ISO-8859-1 If it's helps, here's the log with debug log statements. Possibly issue with that exception? INFO [RMI TCP Connection(2)-10.116.111.143] 2013-03-09 03:54:32,402 StorageService.java (line 774) DRAINING: starting drain process INFO [RMI TCP Connection(2)-10.116.111.143] 2013-03-09 03:54:32,403 CassandraDaemon.java (line 218) Stop listening to thrift clients INFO [RMI TCP Connection(2)-10.116.111.143] 2013-03-09 03:54:32,404 Gossiper.java (line 1133) Announcing shutdown DEBUG [GossipTasks:1] 2013-03-09 03:54:33,328 DebuggableThreadPoolExecutor.java (line 190) Task cancelled java.util.concurrent.CancellationException at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:220) at java.util.concurrent.FutureTask.get(FutureTask.java:83) at org.apache.cassandra.concurrent.DebuggableThreadPoolExecutor.extractThrowable(DebuggableThreadPoolExecutor.java:182) at org.apache.cassandra.concurrent.DebuggableThreadPoolExecutor.logExceptionsAfterExecute(DebuggableThreadPoolExecutor.java:146) at org.apache.cassandra.concurrent.DebuggableScheduledThreadPoolExecutor.afterExecute(DebuggableScheduledThreadPoolExecutor.java:50) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:888) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) DEBUG [RMI TCP Connection(2)-10.116.111.143] 2013-03-09 03:54:33,406 StorageService.java (line 776) DRAINING: shutting down MessageService INFO [RMI TCP Connection(2)-10.116.111.143] 2013-03-09 03:54:33,406 MessagingService.java (line 534) Waiting for messaging service to quiesce INFO [ACCEPT-ip-10-116-111-143.ec2.internal/10.116.111.143] 2013-03-09 03:54:33,407 MessagingService.java (line 690) MessagingService shutting down server thread. DEBUG [RMI TCP Connection(2)-10.116.111.143] 2013-03-09 03:54:33,408 StorageService.java (line 776) DRAINING: waiting for streaming DEBUG [RMI TCP Connection(2)-10.116.111.143] 2013-03-09 03:54:33,408 StorageService.java (line 776) DRAINING: clearing mutation stage DEBUG [Thread-5] 2013-03-09 03:54:33,408 Gossiper.java (line 221) Reseting version for /10.83.55.44 DEBUG [RMI TCP Connection(2)-10.116.111.143] 2013-03-09 03:54:33,409 StorageService.java (line 776) DRAINING: flushing column families DEBUG [RMI TCP Connection(2)-10.116.111.143] 2013-03-09 03:54:33,409 ColumnFamilyStore.java (line 713) forceFlush requested but everything is clean in Counter1 DEBUG [Thread-6] 2013-03-09 03:54:33,410 Gossiper.java (line 221) Reseting version for /10.80.187.124 DEBUG [RMI TCP Connection(2)-10.116.111.143] 2013-03-09 03:54:33,410 ColumnFamilyStore.java (line 713) forceFlush requested but everything is clean in Super1 DEBUG [RMI TCP Connection(2)-10.116.111.143] 2013-03-09 03:54:33,410 ColumnFamilyStore.java (line 713) forceFlush requested but everything is clean in SuperCounter1 DEBUG [RMI TCP Connection(2)-10.116.111.143] 2013-03-09 03:54:33,410 ColumnFamilyStore.java (line 713) forceFlush requested but everything is clean in Standard1 INFO [RMI TCP Connection(2)-10.116.111.143] 2013-03-09 03:54:33,510 StorageService.java (line 774) DRAINED On Fri, Mar 8, 2013 at 10:36 PM, Andrew Bialecki wrote: > Hey all, > > We're getting ready to upgrade our cluster to 1.2.2 from 1.1.5 and we're > testing the upgrade process on our dev cluster. We turned off all client > access to the cluster and then ran "nodetool drain" on the first instance > with the intention of running "nodetool snapshot" once it finished. > However, after running the drain, didn't see any errors, but the Cassandra > process was no longer running. Is that expected? From everything I've read > it doesn't seem like it, but maybe I'm mistaken. > > Here's the relevant portion of the log from that node (notice it says it's > shutting down the server thread in there): > > INFO [RMI TCP Connection(38)-10.116.111.143] 2013-03-09 03:26:48,288 > StorageService.java (line 774) DRAINING: starting drain process > INFO [RMI TCP Connection(38)-10.116.111.143] 2013-03-09 03:26:48,288 > CassandraDaemon.java (line 218) Stop listening to thrift clients > INFO [RMI TCP Connection(38)-10.116.111.143] 2013-03-09 03:26:48,315 > Gossiper.java (line 1133) Announcing shutdown > INFO [RMI TCP Connection(38)-10.116.111.143] 2013-03-09 03:26:49,318 > MessagingService.java (line 534) Waiting for messaging service to quiesce > INFO [ACCEPT-ip-10-116-111-143.ec2.internal/10.116.111.143] 2013-03-09 > 03:26:49,319 MessagingService.java (line 690) MessagingService shutting > down server thread. > INFO [RMI TCP Connection(38)-10.116.111.143] 2013-03-09 03:26:49,338 > ColumnFamilyStore.java (line 659) Enqueuing flush of > Memtable-Counter1@177255852(14810190/60139556 serialized/live bytes, > 243550 ops) > INFO [FlushWriter:7] 2013-03-09 03:26:49,338 Memtable.java (line 264) > Writing Memtable-Counter1@177255852(14810190/60139556 serialized/live > bytes, 243550 ops) > INFO [FlushWriter:7] 2013-03-09 03:26:49,899 Memtable.java (line 305) > Completed flushing > /var/lib/cassandra/data/Keyspace1/Counter1/Keyspace1-Counter1-he-104-Data.db > (15204741 bytes) for commitlog position > ReplayPosition(segmentId=1362797442799, position=27621115) > INFO [CompactionExecutor:11] 2013-03-09 03:26:49,900 CompactionTask.java > (line 109) Compacting > [SSTableReader(path='/var/lib/cassandra/data/Keyspace1/Counter1/Keyspace1-Counter1-he-102-Data.db'), > SSTableReader(path='/var/lib/cassandra/data/Keyspace1/Counter1/Keyspace1-Counter1-he-103-Data.db'), > SSTableReader(path='/var/lib/cassandra/data/Keyspace1/Counter1/Keyspace1-Counter1-he-104-Data.db'), > SSTableReader(path='/var/lib/cassandra/data/Keyspace1/Counter1/Keyspace1-Counter1-he-101-Data.db')] > INFO [RMI TCP Connection(38)-10.116.111.143] 2013-03-09 03:26:50,193 > StorageService.java (line 774) DRAINED > > > Thanks in advanced for any help. > > Cheers, > Andrew > --e89a8f22c38166cc8804d775f505 Content-Type: text/html; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable If it's helps, here's the log with debug log statements. Possibly i= ssue with that exception?

INFO [RMI TCP Connection(2)-10.= 116.111.143] 2013-03-09 03:54:32,402 StorageService.java (line 774) DRAININ= G: starting drain process
=A0INFO [RMI TCP Connection(2)-10.116.111.143] 2013-03-09 03:54:32,403= CassandraDaemon.java (line 218) Stop listening to thrift clients
=A0INFO [RMI TCP Connection(2)-10.116.111.143] 2013-03-09 03:54:32,404 Gos= siper.java (line 1133) Announcing shutdown
DEBUG [GossipTasks:1] 2013-03-09 03:54:33,328 DebuggableThreadPoolExec= utor.java (line 190) Task cancelled
java.util.concurrent.Cancella= tionException
at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.j= ava:220)
at ja= va.util.concurrent.FutureTask.get(FutureTask.java:83)
at org.apache.cassand= ra.concurrent.DebuggableThreadPoolExecutor.extractThrowable(DebuggableThrea= dPoolExecutor.java:182)
at or= g.apache.cassandra.concurrent.DebuggableThreadPoolExecutor.logExceptionsAft= erExecute(DebuggableThreadPoolExecutor.java:146)
at org.apache.cassandra.co= ncurrent.DebuggableScheduledThreadPoolExecutor.afterExecute(DebuggableSched= uledThreadPoolExecutor.java:50)
at ja= va.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.jav= a:888)
= at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExec= utor.java:908)
at ja= va.lang.Thread.run(Thread.java:662)
DEBUG [RMI TCP Connection(2)-= 10.116.111.143] 2013-03-09 03:54:33,406 StorageService.java (line 776) DRAI= NING: shutting down MessageService
=A0INFO [RMI TCP Connection(2)-10.116.111.143] 2013-03-09 03:54:33,406= MessagingService.java (line 534) Waiting for messaging service to quiesce<= /div>
=A0INFO [ACCEPT-ip-10-116-111-143.ec2.internal/10.116.111.143] 2013-03-09 03:54:33,407 MessagingServic= e.java (line 690) MessagingService shutting down server thread.
DEBUG [RMI TCP Connection(2)-10.116.111.143] 2013-03-09 03:54:33,408 S= torageService.java (line 776) DRAINING: waiting for streaming
DEB= UG [RMI TCP Connection(2)-10.116.111.143] 2013-03-09 03:54:33,408 StorageSe= rvice.java (line 776) DRAINING: clearing mutation stage
DEBUG [Thread-5] 2013-03-09 03:54:33,408 Gossiper.java (line 221) Rese= ting version for /10.83.55.44
= DEBUG [RMI TCP Connection(2)-10.116.111.143] 2013-03-09 03:54:33,409 Storag= eService.java (line 776) DRAINING: flushing column families
DEBUG [RMI TCP Connection(2)-10.116.111.143] 2013-03-09 03:54:33,409 C= olumnFamilyStore.java (line 713) forceFlush requested but everything is cle= an in Counter1
DEBUG [Thread-6] 2013-03-09 03:54:33,410 Gossiper.= java (line 221) Reseting version for /10.8= 0.187.124
DEBUG [RMI TCP Connection(2)-10.116.111.143] 2013-03-09 03:54:33,410 C= olumnFamilyStore.java (line 713) forceFlush requested but everything is cle= an in Super1
DEBUG [RMI TCP Connection(2)-10.116.111.143] 2013-03= -09 03:54:33,410 ColumnFamilyStore.java (line 713) forceFlush requested but= everything is clean in SuperCounter1
DEBUG [RMI TCP Connection(2)-10.116.111.143] 2013-03-09 03:54:33,410 C= olumnFamilyStore.java (line 713) forceFlush requested but everything is cle= an in Standard1
=A0INFO [RMI TCP Connection(2)-10.116.111.143] 20= 13-03-09 03:54:33,510 StorageService.java (line 774) DRAINED

On Fri, Mar 8, 2013 at 10= :36 PM, Andrew Bialecki <andrew.bialecki@gmail.com> = wrote:
Hey all,

We're gettin= g ready to upgrade our cluster to 1.2.2 from 1.1.5 and we're testing th= e upgrade process on our dev cluster. We turned off all client access to th= e cluster and then ran "nodetool drain" on the first instance wit= h the intention of running "nodetool snapshot" once it finished. = However, after running the drain, didn't see any errors, but the Cassan= dra process was no longer running. Is that expected? From everything I'= ve read it doesn't seem like it, but maybe I'm mistaken.

Here's the relevant portion of the log from that no= de (notice it says it's shutting down the server thread in there):

INFO [RMI TCP Connection(38)-10.116.111.143] 2013-03-09 03:26:48,= 288 StorageService.java (line 774) DRAINING: starting drain process
=A0INFO [RMI TCP Connection(38)-10.116.111.143] 2013-03-09 03= :26:48,288 CassandraDaemon.java (line 218) Stop listening to thrift clients=
=A0INFO [RMI TCP Connection(38)-10.116.111.143] 2013-03-09 = 03:26:48,315 Gossiper.java (line 1133) Announcing shutdown
=
=A0INFO [RMI TCP Connection(38)-10.116.111.143] 2013-03-09 03:26:49,31= 8 MessagingService.java (line 534) Waiting for messaging service to quiesce=
=A0INFO [ACCEPT-ip-10-116-111-143.ec2.internal/10.116.111.143] 2013-03-09 03:2= 6:49,319 MessagingService.java (line 690) MessagingService shutting down se= rver thread.
=A0INFO [RMI TCP Connection(38)-10.116.111.143] 2013-03-09 = 03:26:49,338 ColumnFamilyStore.java (line 659) Enqueuing flush of Memtable-= Counter1@177255852(14810190/60139556 serialized/live bytes, 243550 ops)
=A0INFO [FlushWriter:7] 2013-03-09 03:26:49,338 Memtable.ja= va (line 264) Writing Memtable-Counter1@177255852(14810190/60139556 seriali= zed/live bytes, 243550 ops)
=A0INFO [FlushWriter:7] 20= 13-03-09 03:26:49,899 Memtable.java (line 305) Completed flushing /var/lib/= cassandra/data/Keyspace1/Counter1/Keyspace1-Counter1-he-104-Data.db (152047= 41 bytes) for commitlog position ReplayPosition(segmentId=3D1362797442799, = position=3D27621115)
=A0INFO [CompactionExecutor:11] 2013-03-09 03:26:49,900 Com= pactionTask.java (line 109) Compacting [SSTableReader(path=3D'/var/lib/= cassandra/data/Keyspace1/Counter1/Keyspace1-Counter1-he-102-Data.db'), = SSTableReader(path=3D'/var/lib/cassandra/data/Keyspace1/Counter1/Keyspa= ce1-Counter1-he-103-Data.db'), SSTableReader(path=3D'/var/lib/cassa= ndra/data/Keyspace1/Counter1/Keyspace1-Counter1-he-104-Data.db'), SSTab= leReader(path=3D'/var/lib/cassandra/data/Keyspace1/Counter1/Keyspace1-C= ounter1-he-101-Data.db')]
=A0INFO [RMI TCP Connection(38)-10.116.111.143] 2013-03-09 = 03:26:50,193 StorageService.java (line 774) DRAINED

Thanks in advanced for any help.

Cheers,
Andrew

--e89a8f22c38166cc8804d775f505--