From: aaron morton <aaron@thelastpickle.com>
Subject: Re: Why the StageManager thread pools have 60 seconds keepalive time?
Date: Sun, 19 Aug 2012 20:21:09 +1200
To: user@cassandra.apache.org

You're seeing dropped mutations reported by nodetool tpstats?

Take a look at the logs. Look for messages from the MessagingService with the pattern "{} {} messages dropped in last {}ms". They will be followed by info about the TP stats.
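For illustration only (the count, stage name and interval below are hypothetical, not taken from your logs), a line matching that pattern looks roughly like:

37 MUTATION messages dropped in last 5000ms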

First would be the workload. Are you sending very big batch_mutate or multiget requests? Each row in the request turns into a command in the appropriate thread pool. This can result in other requests waiting a long time for their commands to get processed.
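If it does turn out to be oversized batches, the usual fix is to cap the rows per request on the client side. A rough, client-agnostic sketch (the class and the send step are made up for illustration, not an actual client API; wire it to whatever call you use, e.g. batch_mutate):

import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class BatchChunker {
    // Split one large per-row mutation map into batches of at most
    // maxRowsPerBatch rows, so no single request floods the MutationStage.
    static <K, V> List<Map<K, V>> chunk(Map<K, V> rows, int maxRowsPerBatch) {
        List<Map<K, V>> batches = new ArrayList<Map<K, V>>();
        Map<K, V> current = new LinkedHashMap<K, V>();
        for (Map.Entry<K, V> e : rows.entrySet()) {
            current.put(e.getKey(), e.getValue());
            if (current.size() == maxRowsPerBatch) {
                batches.add(current);
                current = new LinkedHashMap<K, V>();
            }
        }
        if (!current.isEmpty())
            batches.add(current);
        return batches;
    }

    public static void main(String[] args) {
        Map<String, String> rows = new LinkedHashMap<String, String>();
        for (int i = 0; i < 250; i++)
            rows.put("row" + i, "mutations for row" + i);
        // Instead of one 250-row request, send 5 requests of 50 rows each.
        for (Map<String, String> batch : chunk(rows, 50))
            System.out.println("would send batch of " + batch.size() + " rows");
    }
}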

Next would be looking for GC and checking that memtable_flush_queue_size is set high enough (check the yaml for docs).
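For example, something along these lines in cassandra.yaml (the value here is just an illustration, not a recommendation; the comments in the yaml itself explain how to size it for your schema and write load):

memtable_flush_queue_size: 8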

After that I would look at winding concurrent_writes (and I assume concurrent_reads) back. Anytime I see weirdness I look for config changes and see what happens when they are returned to the default or near default. Do you have 16 _physical_ cores?
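By winding back I mean returning them towards the shipped defaults in cassandra.yaml, e.g. (I believe the defaults were 32 around this version; treat these numbers as illustrative, not a recommendation):

concurrent_writes: 32
concurrent_reads: 32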

Hope that helps.

-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 18/08/2012, at 10:01 AM, Guillermo Winkler <gwinkler@inconcertcc.com> wrote:

Aaron, thanks for your answer.

I'm actually tracking a problem where mutations get dropped and cfstats show no activity whatsoever. I have 100 threads for the mutation pool, no running or pending tasks, but some mutations get dropped nonetheless.

I'm thinking about some scheduling problem, but not really sure yet.

Have you ever seen a case of dropped mutations with the system under light load?

Thanks,
Guille


On Thu, Aug 16, 2012 at 8:22 PM, aaron morton <aaron@thelastpickle.com> wrote:
That's some pretty old code. I would guess it was done that way to conserve resources. And _I think_ thread creation is pretty lightweight.

Jonathan / Brandon / others - opinions?

Cheers


-----------------
Aaron Morton
Freelance = Developer
@aaronmorton

On 17/08/2012, at 8:09 AM, Guillermo Winkler <gwinkler@inconcertcc.com> wrote:

Hi, I have a Cassandra cluster where I'm seeing a lot of thread thrashing from the mutation pool.

MutationStage:72031

Threads get created and disposed of in batches of 100 every few minutes. Since it's a 16-core server, concurrent_writes is set to 100 in cassandra.yaml.

concurrent_writes: 100

I've seen in the StageManager class that these pools get created with a 60 second keepalive time.

DebuggableThreadPoolExecutor -> allowCoreThreadTimeOut(true);

StageManager -> public static final long KEEPALIVE = 60; // seconds to keep "extra" threads alive for when idle

Is there a reason for it to be this way?

Why not have a fixed size pool with Integer.MAX_VALUE as the keepalive, since corePoolSize and maxPoolSize are set to the same size?
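To make the comparison concrete, here is a rough sketch with plain java.util.concurrent (this is not the actual StageManager/DebuggableThreadPoolExecutor code, just my understanding of the two configurations):

import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class PoolKeepaliveSketch {
    public static void main(String[] args) {
        int size = 100; // e.g. concurrent_writes

        // Roughly what the stage pools do today: core threads are allowed to
        // time out, so after 60s idle they die and get recreated on the next
        // burst of writes - which shows up as ever-increasing thread names
        // like MutationStage:72031.
        ThreadPoolExecutor current = new ThreadPoolExecutor(
                size, size, 60, TimeUnit.SECONDS,
                new LinkedBlockingQueue<Runnable>());
        current.allowCoreThreadTimeOut(true);

        // The alternative I'm asking about: same core/max size, but with an
        // effectively infinite keepalive the threads are created once and
        // kept for the life of the process.
        ThreadPoolExecutor fixed = new ThreadPoolExecutor(
                size, size, Integer.MAX_VALUE, TimeUnit.SECONDS,
                new LinkedBlockingQueue<Runnable>());

        current.shutdown();
        fixed.shutdown();
    }
}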

Thanks,
Guille
