Mailing-List: contact user-help@cassandra.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@cassandra.apache.org
Received-SPF: pass (nike.apache.org: domain of andrew.bialecki@gmail.com
 designates 209.85.217.182 as permitted sender)
MIME-Version: 1.0
In-Reply-To: 
 <CAFDWQMTrYm7hBxXKoW8+eVKfNE6zvjW2h8_BSVGmOL7=gRDtLw@mail.gmail.com>
References: 
 <CAFDWQMTrYm7hBxXKoW8+eVKfNE6zvjW2h8_BSVGmOL7=gRDtLw@mail.gmail.com>
Date: Fri, 8 Mar 2013 22:58:54 -0500
Message-ID: 
 <CAFDWQMSz+bgycbwu300i-7Zrs5CrHNYunqGvTOPiD98J28-UOg@mail.gmail.com>
Subject: Re: Nodetool drain automatically shutting down node?
From: Andrew Bialecki <andrew.bialecki@gmail.com>
To: user@cassandra.apache.org
Content-Type: multipart/alternative; boundary=e89a8f22c38166cc8804d775f505

--e89a8f22c38166cc8804d775f505
Content-Type: text/plain; charset=ISO-8859-1

If it helps, here's the log with debug log statements. Possibly an issue
with that exception?

INFO [RMI TCP Connection(2)-10.116.111.143] 2013-03-09 03:54:32,402
StorageService.java (line 774) DRAINING: starting drain process
 INFO [RMI TCP Connection(2)-10.116.111.143] 2013-03-09 03:54:32,403
CassandraDaemon.java (line 218) Stop listening to thrift clients
 INFO [RMI TCP Connection(2)-10.116.111.143] 2013-03-09 03:54:32,404
Gossiper.java (line 1133) Announcing shutdown
DEBUG [GossipTasks:1] 2013-03-09 03:54:33,328
DebuggableThreadPoolExecutor.java (line 190) Task cancelled
java.util.concurrent.CancellationException
at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:220)
at java.util.concurrent.FutureTask.get(FutureTask.java:83)
at
org.apache.cassandra.concurrent.DebuggableThreadPoolExecutor.extractThrowable(DebuggableThreadPoolExecutor.java:182)
at
org.apache.cassandra.concurrent.DebuggableThreadPoolExecutor.logExceptionsAfterExecute(DebuggableThreadPoolExecutor.java:146)
at
org.apache.cassandra.concurrent.DebuggableScheduledThreadPoolExecutor.afterExecute(DebuggableScheduledThreadPoolExecutor.java:50)
at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:888)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
DEBUG [RMI TCP Connection(2)-10.116.111.143] 2013-03-09 03:54:33,406
StorageService.java (line 776) DRAINING: shutting down MessageService
 INFO [RMI TCP Connection(2)-10.116.111.143] 2013-03-09 03:54:33,406
MessagingService.java (line 534) Waiting for messaging service to quiesce
 INFO [ACCEPT-ip-10-116-111-143.ec2.internal/10.116.111.143] 2013-03-09
03:54:33,407 MessagingService.java (line 690) MessagingService shutting
down server thread.
DEBUG [RMI TCP Connection(2)-10.116.111.143] 2013-03-09 03:54:33,408
StorageService.java (line 776) DRAINING: waiting for streaming
DEBUG [RMI TCP Connection(2)-10.116.111.143] 2013-03-09 03:54:33,408
StorageService.java (line 776) DRAINING: clearing mutation stage
DEBUG [Thread-5] 2013-03-09 03:54:33,408 Gossiper.java (line 221) Reseting
version for /10.83.55.44
DEBUG [RMI TCP Connection(2)-10.116.111.143] 2013-03-09 03:54:33,409
StorageService.java (line 776) DRAINING: flushing column families
DEBUG [RMI TCP Connection(2)-10.116.111.143] 2013-03-09 03:54:33,409
ColumnFamilyStore.java (line 713) forceFlush requested but everything is
clean in Counter1
DEBUG [Thread-6] 2013-03-09 03:54:33,410 Gossiper.java (line 221) Reseting
version for /10.80.187.124
DEBUG [RMI TCP Connection(2)-10.116.111.143] 2013-03-09 03:54:33,410
ColumnFamilyStore.java (line 713) forceFlush requested but everything is
clean in Super1
DEBUG [RMI TCP Connection(2)-10.116.111.143] 2013-03-09 03:54:33,410
ColumnFamilyStore.java (line 713) forceFlush requested but everything is
clean in SuperCounter1
DEBUG [RMI TCP Connection(2)-10.116.111.143] 2013-03-09 03:54:33,410
ColumnFamilyStore.java (line 713) forceFlush requested but everything is
clean in Standard1
 INFO [RMI TCP Connection(2)-10.116.111.143] 2013-03-09 03:54:33,510
StorageService.java (line 774) DRAINED
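
In case it helps with comparing the two runs, here is a rough sketch of how
the start and end of a drain could be pulled out of a log like the one above.
It only looks for the two StorageService markers that appear in these
excerpts; the function name and the regex are illustrative assumptions, not
part of Cassandra or nodetool.

```python
import re

# Timestamp, source location, and one of the two drain markers seen in the
# excerpts. \s+ is used throughout so wrapped lines still match.
DRAIN_RE = re.compile(
    r"(\d{4}-\d{2}-\d{2})\s+(\d{2}:\d{2}:\d{2},\d{3})\s+"
    r"StorageService\.java\s+\(line\s+\d+\)\s+"
    r"(DRAINING:\s+starting\s+drain\s+process|DRAINED)"
)


def drain_summary(log_text):
    """Return (started, finished) timestamps for the last drain in the log.

    Either value is None if the corresponding marker was not found.
    """
    started = finished = None
    for date, time, marker in DRAIN_RE.findall(log_text):
        stamp = f"{date} {time}"
        if marker == "DRAINED":
            finished = stamp
        else:
            # A new drain begins; forget any earlier completion.
            started, finished = stamp, None
    return started, finished


sample = """INFO [RMI TCP Connection(2)-10.116.111.143] 2013-03-09 03:54:32,402
StorageService.java (line 774) DRAINING: starting drain process
 INFO [RMI TCP Connection(2)-10.116.111.143] 2013-03-09 03:54:33,510
StorageService.java (line 774) DRAINED"""

print(drain_summary(sample))
```

A finished timestamp only says the drain itself logged DRAINED; it says
nothing about why the process exited afterwards.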


On Fri, Mar 8, 2013 at 10:36 PM, Andrew Bialecki
<andrew.bialecki@gmail.com> wrote:

> Hey all,
>
> We're getting ready to upgrade our cluster to 1.2.2 from 1.1.5 and we're
> testing the upgrade process on our dev cluster. We turned off all client
> access to the cluster and then ran "nodetool drain" on the first instance
> with the intention of running "nodetool snapshot" once it finished.
> However, after running the drain, I didn't see any errors, but the Cassandra
> process was no longer running. Is that expected? From everything I've read
> it doesn't seem like it, but maybe I'm mistaken.
>
> Here's the relevant portion of the log from that node (notice it says it's
> shutting down the server thread in there):
>
> INFO [RMI TCP Connection(38)-10.116.111.143] 2013-03-09 03:26:48,288
> StorageService.java (line 774) DRAINING: starting drain process
>  INFO [RMI TCP Connection(38)-10.116.111.143] 2013-03-09 03:26:48,288
> CassandraDaemon.java (line 218) Stop listening to thrift clients
>  INFO [RMI TCP Connection(38)-10.116.111.143] 2013-03-09 03:26:48,315
> Gossiper.java (line 1133) Announcing shutdown
>  INFO [RMI TCP Connection(38)-10.116.111.143] 2013-03-09 03:26:49,318
> MessagingService.java (line 534) Waiting for messaging service to quiesce
>  INFO [ACCEPT-ip-10-116-111-143.ec2.internal/10.116.111.143] 2013-03-09
> 03:26:49,319 MessagingService.java (line 690) MessagingService shutting
> down server thread.
>  INFO [RMI TCP Connection(38)-10.116.111.143] 2013-03-09 03:26:49,338
> ColumnFamilyStore.java (line 659) Enqueuing flush of
> Memtable-Counter1@177255852(14810190/60139556 serialized/live bytes,
> 243550 ops)
>  INFO [FlushWriter:7] 2013-03-09 03:26:49,338 Memtable.java (line 264)
> Writing Memtable-Counter1@177255852(14810190/60139556 serialized/live
> bytes, 243550 ops)
>  INFO [FlushWriter:7] 2013-03-09 03:26:49,899 Memtable.java (line 305)
> Completed flushing
> /var/lib/cassandra/data/Keyspace1/Counter1/Keyspace1-Counter1-he-104-Data.db
> (15204741 bytes) for commitlog position
> ReplayPosition(segmentId=1362797442799, position=27621115)
>  INFO [CompactionExecutor:11] 2013-03-09 03:26:49,900 CompactionTask.java
> (line 109) Compacting
> [SSTableReader(path='/var/lib/cassandra/data/Keyspace1/Counter1/Keyspace1-Counter1-he-102-Data.db'),
> SSTableReader(path='/var/lib/cassandra/data/Keyspace1/Counter1/Keyspace1-Counter1-he-103-Data.db'),
> SSTableReader(path='/var/lib/cassandra/data/Keyspace1/Counter1/Keyspace1-Counter1-he-104-Data.db'),
> SSTableReader(path='/var/lib/cassandra/data/Keyspace1/Counter1/Keyspace1-Counter1-he-101-Data.db')]
>  INFO [RMI TCP Connection(38)-10.116.111.143] 2013-03-09 03:26:50,193
> StorageService.java (line 774) DRAINED
>
>
> Thanks in advance for any help.
>
> Cheers,
> Andrew
>

--e89a8f22c38166cc8804d775f505--