Subject: Re: Supervisor CPU usage 100%, supervisors restarting in production
From: Otis Gospodnetic <otis.gospodnetic@gmail.com>
To: user@storm.incubator.apache.org
Date: Mon, 3 Feb 2014 23:14:27 -0500

Hi,

Could it be JVM GC?  Check your GC counts and timings and correlate them
with your other Storm metrics.  If you use something like SPM for Storm,
you can send your Storm, JVM, and system metrics graphs to the Storm
mailing list directly from SPM.  This may help others help you more
easily.  A quick sketch of one way to pull GC counts and timings out of a
running JVM is below.
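This is only a minimal sketch using plain JMX, nothing Storm-specific; the
GcStats class name and the 10-second sampling interval are just
illustrative.  Run it (or something like it) alongside a worker, or adapt
it into your own metrics reporting, to see whether GC activity lines up
with the CPU spikes.

import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;
import java.util.List;

// Periodically dumps per-collector GC counts and accumulated GC time,
// plus heap usage, for the JVM it runs in.
public class GcStats {
    public static void main(String[] args) throws InterruptedException {
        while (true) {
            List<GarbageCollectorMXBean> gcBeans =
                    ManagementFactory.getGarbageCollectorMXBeans();
            for (GarbageCollectorMXBean gc : gcBeans) {
                // Counts and times are cumulative since JVM start.
                System.out.printf("%-25s collections=%d time_ms=%d%n",
                        gc.getName(), gc.getCollectionCount(),
                        gc.getCollectionTime());
            }
            MemoryMXBean mem = ManagementFactory.getMemoryMXBean();
            MemoryUsage heap = mem.getHeapMemoryUsage();
            System.out.printf("heap used=%dMB committed=%dMB max=%dMB%n",
                    heap.getUsed() >> 20, heap.getCommitted() >> 20,
                    heap.getMax() >> 20);
            Thread.sleep(10000L);  // sample every 10 seconds
        }
    }
}

You could also just turn on GC logging for the workers, e.g. by adding
something like -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps to
worker.childopts in storm.yaml (assuming a JVM that supports those flags),
and compare the GC pauses against the times when the supervisors peg the
CPU.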
Otis
--
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/


On Mon, Feb 3, 2014 at 5:23 AM, Chitra Raveendran <
chitra.raveendran@flutura.com> wrote:

> Hi
>
> I have a Storm cluster in production.
>
> Recently, CPU usage on the supervisor machines has been hitting 100%
> during weekends, which is odd because our website has the least traffic
> on weekends.  The system hangs and the supervisord daemon keeps trying to
> restart the Storm daemons.  Since all the supervisors are affected, the
> topology hangs as well.
> Whenever this happens, we lose SSH access to the servers and have to
> reboot so that the memory gets cleaned up.
>
> There are 4 supervisor machines (VMs), each with 8 GB RAM and 4 cores,
> and a separate Nimbus machine (8 GB RAM, 4 cores).
> There are 12 workers on each node; we currently have around 15 unused
> slots.
>
> Generally, CPU usage is around 50-60 percent on these systems, and only
> 3-4 GB of the 8 GB of RAM is used.
>
> What could be happening?
>
> --
>
> Regards,
>
> *Chitra Raveendran*
> *Data Scientist*
> Mobile: +91 819753660 │ *Email:* chitra.raveendran@flutura.com
> *Flutura Business Solutions Private Limited – “A Decision Sciences &
> Analytics Company”* │ #693, 2nd Floor, Geetanjali, 15th Cross, J.P Nagar
> 2nd Phase, Bangalore – 560078 │