Subject: Re: Supervisor CPU usage 100%, supervisors restarting in production
From: Otis Gospodnetic <otis.gospodnetic@gmail.com>
To: user@storm.incubator.apache.org
Date: Mon, 3 Feb 2014 23:14:27 -0500

Hi,

Could it be JVM GC?  Check your GC counts and timings and correlate them
with your other Storm metrics.  If you use something like SPM for Storm,
you can send your Storm, JVM, and system metrics graphs to the Storm
mailing list directly from SPM.  This may help others help you more
easily.  A quick sketch of one way to pull GC counts and timings out of a
running JVM is below.
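This is only a minimal sketch using plain JMX, nothing Storm-specific; the
GcStats class name and the 10-second sampling interval are just
illustrative.  Run it (or something like it) alongside a worker, or adapt
it into your own metrics reporting, to see whether GC activity lines up
with the CPU spikes.

import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;
import java.util.List;

// Periodically dumps per-collector GC counts and accumulated GC time,
// plus heap usage, for the JVM it runs in.
public class GcStats {
    public static void main(String[] args) throws InterruptedException {
        while (true) {
            List<GarbageCollectorMXBean> gcBeans =
                    ManagementFactory.getGarbageCollectorMXBeans();
            for (GarbageCollectorMXBean gc : gcBeans) {
                // Counts and times are cumulative since JVM start.
                System.out.printf("%-25s collections=%d time_ms=%d%n",
                        gc.getName(), gc.getCollectionCount(),
                        gc.getCollectionTime());
            }
            MemoryMXBean mem = ManagementFactory.getMemoryMXBean();
            MemoryUsage heap = mem.getHeapMemoryUsage();
            System.out.printf("heap used=%dMB committed=%dMB max=%dMB%n",
                    heap.getUsed() >> 20, heap.getCommitted() >> 20,
                    heap.getMax() >> 20);
            Thread.sleep(10000L);  // sample every 10 seconds
        }
    }
}

You could also just turn on GC logging for the workers, e.g. by adding
something like -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps to
worker.childopts in storm.yaml (assuming a JVM that supports those flags),
and compare the GC pauses against the times when the supervisors peg the
CPU.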
Otis
--
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/


On Mon, Feb 3, 2014 at 5:23 AM, Chitra Raveendran <
chitra.raveendran@flutura.com> wrote:

> Hi
>
> I have a Storm cluster in production.
>
> Recently, CPU usage on the supervisor machines has been hitting 100%
> during weekends, which is odd because our website has the least traffic
> on weekends.  The system hangs and the supervisord daemon keeps trying to
> restart the Storm daemons.  Since all the supervisors are affected, the
> topology hangs as well.
> Whenever this happens, we lose SSH access to the servers and have to
> reboot so that the memory gets cleaned up.
>
> There are 4 supervisor machines (VMs), each with 8 GB RAM and 4 cores,
> and a separate Nimbus machine (8 GB RAM, 4 cores).
> There are 12 workers on each node; we currently have around 15 unused
> slots.
>
> Generally, CPU usage is around 50-60 percent on these systems, and only
> 3-4 GB of the 8 GB of RAM is used.
>
> What could be happening?
>
> --
>
> Regards,
>
> *Chitra Raveendran*
> *Data Scientist*
> Mobile: +91 819753660 │ *Email:* chitra.raveendran@flutura.com
> *Flutura Business Solutions Private Limited – “A Decision Sciences &
> Analytics Company”* │ #693, 2nd Floor, Geetanjali, 15th Cross, J.P Nagar
> 2nd Phase, Bangalore – 560078 │