From: aaron morton <aaron@thelastpickle.com>
Mime-Version: 1.0 (Apple Message framework v1278)
Content-Type: multipart/alternative;
 boundary="Apple-Mail=_9932D003-D2EE-401E-8004-B9F07AB43A78"
Subject: Re: TimedOutException caused by "Stop the world" activity
Date: Thu, 31 May 2012 11:44:16 +1200
In-Reply-To: 
 <CAFb+LUw8G9exNbCYi5i7qpTCrh=d_-vRkzCNUnRyXbKqDWzFMQ@mail.gmail.com>
To: user@cassandra.apache.org
References: 
 <CAFb+LUw8G9exNbCYi5i7qpTCrh=d_-vRkzCNUnRyXbKqDWzFMQ@mail.gmail.com>
Message-Id: <67E0178B-E58C-4887-B84A-6BFC82C5A641@thelastpickle.com>


--Apple-Mail=_9932D003-D2EE-401E-8004-B9F07AB43A78
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain;
	charset=iso-8859-1

The cluster is running into GC problems, and this is slowing it down under the stress test. When it slows down, one or more of the nodes fails to perform the write within rpc_timeout. This causes the coordinator of the write to raise the TimedOutException.
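
A toy model of that mechanism, in Java since that is what Cassandra and the Hector client in the stack trace use. It assumes each replica's acknowledgement latency is known up front and that the coordinator simply waits up to rpc_timeout for the number of acks the consistency level requires; the class and method names are hypothetical, the real coordinator logic inside Cassandra is more involved, and the latencies and the 2000 ms timeout are made-up values, not figures from this cluster.

```java
import java.util.Arrays;

/**
 * Toy model (hypothetical names) of a write coordinator waiting for replica acks.
 * Assumption: every replica's ack latency is known in advance.
 */
public final class WriteTimeoutModel {

    /**
     * Returns true if the blockFor-th fastest ack arrives after rpcTimeoutMs,
     * i.e. the coordinator gives up and the client sees a TimedOutException.
     */
    public static boolean timesOut(long[] ackLatenciesMs, int blockFor, long rpcTimeoutMs) {
        if (blockFor < 1 || blockFor > ackLatenciesMs.length) {
            throw new IllegalArgumentException("blockFor must be between 1 and the replica count");
        }
        long[] sorted = ackLatenciesMs.clone();
        Arrays.sort(sorted);
        // The write is acknowledged as soon as blockFor replicas have answered.
        return sorted[blockFor - 1] > rpcTimeoutMs;
    }

    public static void main(String[] args) {
        long rpcTimeoutMs = 2000; // hypothetical rpc_timeout, for illustration only
        // Two healthy replicas and one stalled in a hypothetical 3000 ms GC pause.
        long[] acks = {4, 6, 3000};
        System.out.println("ALL    (3 of 3): "
                + (timesOut(acks, 3, rpcTimeoutMs) ? "TimedOutException" : "ok"));
        System.out.println("QUORUM (2 of 3): "
                + (timesOut(acks, 2, rpcTimeoutMs) ? "TimedOutException" : "ok"));
    }
}
```

With one replica paused for longer than the timeout, the ALL write fails while a QUORUM write (two of three acks) still succeeds.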

Your options are:

* allocate more memory
* ease back on the stress test
* write at CL QUORUM rather than ALL, so that one node failing does not result in the error
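
To make the third option concrete, a small sketch of how many replica acks each level waits for at replication factor 3, and whether a write/read pair still overlaps (R + W > RF). The names are hypothetical, only ONE, QUORUM and ALL are modelled, and QUORUM is taken as RF / 2 + 1 with integer division.

```java
/** Sketch (hypothetical names): ack counts per consistency level and R + W > RF overlap. */
public final class ConsistencyMath {

    public enum Level { ONE, QUORUM, ALL }

    /** Number of replicas that must respond for the given level. */
    public static int blockFor(Level level, int replicationFactor) {
        switch (level) {
            case ONE:    return 1;
            case QUORUM: return replicationFactor / 2 + 1;
            case ALL:    return replicationFactor;
            default:     throw new AssertionError(level);
        }
    }

    /** How many replicas may be down or too slow before requests at this level fail. */
    public static int toleratedSlowReplicas(Level level, int replicationFactor) {
        return replicationFactor - blockFor(level, replicationFactor);
    }

    /** True when every read at readLevel must touch at least one replica that took the write. */
    public static boolean overlaps(Level writeLevel, Level readLevel, int replicationFactor) {
        return blockFor(writeLevel, replicationFactor)
                + blockFor(readLevel, replicationFactor) > replicationFactor;
    }

    public static void main(String[] args) {
        int rf = 3;
        for (Level w : Level.values()) {
            System.out.printf("write %-6s blockFor=%d tolerates %d slow replica(s)%n",
                    w, blockFor(w, rf), toleratedSlowReplicas(w, rf));
        }
        System.out.println("write ALL,    read ONE:    overlap = " + overlaps(Level.ALL, Level.ONE, rf));
        System.out.println("write QUORUM, read ONE:    overlap = " + overlaps(Level.QUORUM, Level.ONE, rf));
        System.out.println("write QUORUM, read QUORUM: overlap = " + overlaps(Level.QUORUM, Level.QUORUM, rf));
    }
}
```

Note the last two lines: with writes at QUORUM, reads at ONE no longer overlap the write set, so reads that must see the latest write would also need to move to QUORUM.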


See also http://wiki.apache.org/cassandra/FAQ#slows_down_after_lotso_inserts

Cheers

-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 28/05/2012, at 12:59 PM, Jason Tang wrote:

> Hi
>
> My system is a 4-node, 64-bit Cassandra cluster with 6 GB per node, the default configuration (which means 1/3 of the heap for memtables), replication factor 3, writes at ALL and reads at ONE.
> When I run a stress load test, I get this TimedOutException, some operations fail, and all traffic hangs for a while.
>
> When I ran a 32-bit Cassandra with 1 GB of memory in standalone mode, I didn't see such frequent "Stop the world" behavior.
>
> So I wonder what kinds of operations can hang the Cassandra system.
>
> How can I collect information for tuning?
>
> From the system log and the documentation, I guess there are three types of operation:
> 1) Flushing a memtable when it reaches its maximum size
> 2) Compacting SSTables (why?)
> 3) Java GC
>
> system.log:
>  INFO [main] 2012-05-25 16:12:17,054 ColumnFamilyStore.java (line 688) Enqueuing flush of Memtable-LocationInfo@1229893321(53/66 serialized/live bytes, 2 ops)
>  INFO [FlushWriter:1] 2012-05-25 16:12:17,054 Memtable.java (line 239) Writing Memtable-LocationInfo@1229893321(53/66 serialized/live bytes, 2 ops)
>  INFO [FlushWriter:1] 2012-05-25 16:12:17,166 Memtable.java (line 275) Completed flushing /var/proclog/raw/cassandra/data/system/LocationInfo-hb-2-Data.db (163 bytes)
> ...
>
>  INFO [CompactionExecutor:441] 2012-05-28 08:02:55,345 CompactionTask.java (line 112) Compacting [SSTableReader(path='/var/proclog/raw/cassandra/data/myks/queue-hb-41-Data.db'), SSTableReader(path='/var/proclog/raw/cassandra/data/myks/queue-hb-32-Data.db'), SSTableReader(path='/var/proclog/raw/cassandra/data/myks/queue-hb-37-Data.db'), SSTableReader(path='/var/proclog/raw/cassandra/data/myks/queue-hb-53-Data.db')]
> ...
>
>  WARN [ScheduledTasks:1] 2012-05-28 08:02:26,619 GCInspector.java (line 146) Heap is 0.7993011015621736 full.  You may need to reduce memtable and/or cache sizes.  Cassandra will now flush up to the two largest memtables to free up memory.  Adjust flush_largest_memtables_at threshold in cassandra.yaml if you don't want Cassandra to do this automatically
>  INFO [ScheduledTasks:1] 2012-05-28 08:02:54,980 GCInspector.java (line 123) GC for ConcurrentMarkSweep: 728 ms for 2 collections, 3594946600 used; max is 6274678784
>  INFO [ScheduledTasks:1] 2012-05-28 08:41:34,030 GCInspector.java (line 123) GC for ParNew: 1668 ms for 1 collections, 4171503448 used; max is 6274678784
>  INFO [ScheduledTasks:1] 2012-05-28 08:41:48,978 GCInspector.java (line 123) GC for ParNew: 1087 ms for 1 collections, 2623067496 used; max is 6274678784
>  INFO [ScheduledTasks:1] 2012-05-28 08:41:48,987 GCInspector.java (line 123) GC for ConcurrentMarkSweep: 3198 ms for 3 collections, 2623361280 used; max is 6274678784
>
> Timeout Exception:
> Caused by: org.apache.cassandra.thrift.TimedOutException: null
>         at org.apache.cassandra.thrift.Cassandra$batch_mutate_result.read(Cassandra.java:19495) ~[na:na]
>         at org.apache.cassandra.thrift.Cassandra$Client.recv_batch_mutate(Cassandra.java:1035) ~[na:na]
>         at org.apache.cassandra.thrift.Cassandra$Client.batch_mutate(Cassandra.java:1009) ~[na:na]
>         at me.prettyprint.cassandra.service.KeyspaceServiceImpl$1.execute(KeyspaceServiceImpl.java:95) ~[na:na]
>         ... 64 common frames omitted
>
> BRs
> //Tang Weiqiang
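
On the question of collecting information for tuning: the GCInspector lines in the quoted system.log already record GC time and heap usage, so a first step is to pull them out and watch them over time. A minimal sketch, assuming the exact line format quoted above (other Cassandra versions may log differently); the class, its field names and the 1000 ms alert threshold are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Sketch (hypothetical names): extract GC time and heap usage from GCInspector lines. */
public final class GcLogSummary {

    // Matches the format quoted in this thread, e.g.
    // "GC for ParNew: 1668 ms for 1 collections, 4171503448 used; max is 6274678784"
    private static final Pattern GC_LINE = Pattern.compile(
            "GC for (\\w+): (\\d+) ms for (\\d+) collections, (\\d+) used; max is (\\d+)");

    /** One parsed GCInspector line. */
    public static final class GcEvent {
        public final String collector;
        public final long gcTimeMs;     // total GC time reported on the line
        public final int collections;   // number of collections that time covers
        public final long usedBytes;    // heap in use when the line was logged
        public final long maxBytes;     // maximum heap size

        GcEvent(String collector, long gcTimeMs, int collections, long usedBytes, long maxBytes) {
            this.collector = collector;
            this.gcTimeMs = gcTimeMs;
            this.collections = collections;
            this.usedBytes = usedBytes;
            this.maxBytes = maxBytes;
        }

        /** Fraction of the maximum heap in use. */
        public double heapFraction() {
            return (double) usedBytes / maxBytes;
        }
    }

    /** Returns the GC events found in the given log lines, ignoring all other lines. */
    public static List<GcEvent> parse(List<String> lines) {
        List<GcEvent> events = new ArrayList<>();
        for (String line : lines) {
            Matcher m = GC_LINE.matcher(line);
            if (m.find()) {
                events.add(new GcEvent(m.group(1), Long.parseLong(m.group(2)),
                        Integer.parseInt(m.group(3)), Long.parseLong(m.group(4)),
                        Long.parseLong(m.group(5))));
            }
        }
        return events;
    }

    public static void main(String[] args) {
        // Two lines copied from the quoted system.log; a real run would read the whole
        // file, e.g. with java.nio.file.Files.readAllLines(path).
        List<String> log = Arrays.asList(
                " INFO [ScheduledTasks:1] 2012-05-28 08:41:34,030 GCInspector.java (line 123) "
                        + "GC for ParNew: 1668 ms for 1 collections, 4171503448 used; max is 6274678784",
                " INFO [ScheduledTasks:1] 2012-05-28 08:41:48,987 GCInspector.java (line 123) "
                        + "GC for ConcurrentMarkSweep: 3198 ms for 3 collections, 2623361280 used; max is 6274678784");
        long alertMs = 1000; // hypothetical alert threshold
        for (GcEvent e : parse(log)) {
            System.out.printf("%-20s %5d ms  heap %.0f%%%s%n",
                    e.collector, e.gcTimeMs, 100 * e.heapFraction(),
                    e.gcTimeMs > alertMs ? "  <-- long GC" : "");
        }
    }
}
```

The reported time is the total for the N collections listed on that line, not necessarily a single pause. ParNew is a stop-the-world young-generation collector, so long ParNew times are worth correlating with the times the client saw TimedOutException.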


--Apple-Mail=_9932D003-D2EE-401E-8004-B9F07AB43A78--