Mailing-List: user@hadoop.apache.org
From: Shmuel Blitz
Date: Wed, 14 Jun 2017 16:31:06 +0300
Subject: Re: How to monitor YARN application memory per container?
To: Sunil G
Cc: Sidharth Kumar, common-user@hadoop.apache.org

Hi Sunil,

Thanks for your response.

Here is the output I get when running "yarn node -status {nodeId}":

    Node Report :
        Node-Id : myNode:4545
        Rack : /default
        Node-State : RUNNING
        Node-Http-Address : muNode:8042
        Last-Health-Update : Wed 14/Jun/17 08:25:43:261EST
        Health-Report :
        Containers : 7
        Memory-Used : 44032MB
        Memory-Capacity : 49152MB
        CPU-Used : 16 vcores
        CPU-Capacity : 48 vcores
        Node-Labels :

However, this is information about the entire node, aggregated across all of its containers. It gives me no way to check whether the value I set for 'spark.executor.memory' makes sense.

I'm looking for memory usage/allocation information *per-container*.

Shmuel

On Wed, Jun 14, 2017 at 4:04 PM, Sunil G wrote:

> Hi Shmuel
>
> In the Hadoop 2.8 release line, you can check the "yarn node -status {nodeId}"
> CLI command or the "http://<rm http address:port>/ws/v1/cluster/nodes/{nodeid}"
> REST endpoint to get containers' actual resource usage per node. The same is
> available in any of the Hadoop 3.0 alpha releases as well.
>
> Thanks
> Sunil
>
> On Tue, Jun 13, 2017 at 11:29 PM Shmuel Blitz wrote:
>
>> Hi,
>>
>> Thanks for your response.
>>
>> The /metrics API returns a blank page on our RM.
>>
>> The /jmx API has some metrics, but these are the same metrics we are
>> already loading into Datadog. That isn't good enough, because it doesn't
>> break down the memory use by container.
>>
>> I need the by-container breakdown because resource allocation is per
>> container, and I would like to see whether my job is really using all of
>> the allocated memory.
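For what it's worth, the per-node figures from the REST endpoint Sunil mentions can be polled and charted over time. A minimal sketch, not a definitive implementation: the `usedMemoryMB`/`availMemoryMB` field names are taken from the Hadoop 2.8 NodesInfo REST response and may differ across versions, and the RM address and node-id below are placeholders:

```python
import json
from urllib.request import urlopen

def fetch_node(rm_address, node_id):
    """Fetch one node's report from the RM REST API (NodesInfo endpoint)."""
    url = "http://%s/ws/v1/cluster/nodes/%s" % (rm_address, node_id)
    with urlopen(url) as resp:
        return json.load(resp)["node"]

def memory_summary(node):
    """Reduce a NodesInfo 'node' object to used/capacity memory in MB."""
    used = node["usedMemoryMB"]
    return {"used_mb": used, "capacity_mb": used + node["availMemoryMB"]}

# Example (placeholder addresses):
#   memory_summary(fetch_node("rm-host:8088", "myNode:4545"))
```

Polling this on a schedule gives the node-level graph; it still does not solve the per-container breakdown.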
>>
>> Shmuel
>>
>> On Tue, Jun 13, 2017 at 6:05 PM, Sidharth Kumar
>> <sidharthkumar2707@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> I guess you can get it from http://<resourcemanager-host>:<rm-port>/jmx
>>> or /metrics
>>>
>>> Regards
>>> Sidharth
>>> LinkedIn: www.linkedin.com/in/sidharthkumar2792
>>>
>>> On 13-Jun-2017 6:26 PM, "Shmuel Blitz" wrote:
>>>
>>>> (This question has also been published on StackOverflow.)
>>>>
>>>> I am looking for a way to monitor the memory usage of YARN containers
>>>> over time.
>>>>
>>>> Specifically: given a YARN application-id, how can you get a graph
>>>> showing the memory usage of each of its containers over time?
>>>>
>>>> The main goal is to better fit the memory allocation requests of our
>>>> YARN applications (Spark / MapReduce), to avoid over-allocation and
>>>> wasted cluster resources. A side goal is the ability to debug memory
>>>> issues when developing our jobs, and to pick reasonable resource
>>>> allocations.
>>>>
>>>> We've tried the Datadog integration, but it doesn't break down the
>>>> metrics by container.
>>>>
>>>> Another approach was to parse the hadoop-yarn logs, which contain
>>>> messages like:
>>>>
>>>>     Memory usage of ProcessTree 57251 for container-id
>>>>     container_e116_1495951495692_35134_01_000001: 1.9 GB of 11 GB
>>>>     physical memory used; 14.4 GB of 23.1 GB virtual memory used
>>>>
>>>> Parsing these lines correctly can yield data that can be used to plot
>>>> a graph of memory usage over time.
>>>>
>>>> That's exactly what we want, but there are two downsides:
>>>>
>>>> 1. It involves reading human-readable log lines and parsing them into
>>>>    numeric data. We'd love to avoid that.
>>>> 2. If this data can be consumed some other way, we hope that way will
>>>>    carry more information we might be interested in later. We wouldn't
>>>>    want to put time into parsing the logs only to realize we need
>>>>    something else.
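As an aside, for anyone who does go the log-parsing route in the meantime: the ContainersMonitor line quoted above can be turned into numbers with a single regex. A sketch, assuming the "GB"/"MB" units the monitor prints; this is our own parsing, not an official interface:

```python
import re

# Matches lines like:
#   Memory usage of ProcessTree 57251 for container-id
#   container_e116_...: 1.9 GB of 11 GB physical memory used;
#   14.4 GB of 23.1 GB virtual memory used
LINE_RE = re.compile(
    r"Memory usage of ProcessTree (\d+) for container-id (\S+): "
    r"([\d.]+) ([GM]B) of ([\d.]+) ([GM]B) physical memory used; "
    r"([\d.]+) ([GM]B) of ([\d.]+) ([GM]B) virtual memory used")

def parse_usage(line):
    """Return a dict of per-container memory figures (MB), or None."""
    m = LINE_RE.search(line)
    if not m:
        return None
    to_mb = lambda value, unit: float(value) * (1024 if unit == "GB" else 1)
    return {
        "pid": m.group(1),
        "container_id": m.group(2),
        "physical_used_mb": to_mb(m.group(3), m.group(4)),
        "physical_limit_mb": to_mb(m.group(5), m.group(6)),
        "virtual_used_mb": to_mb(m.group(7), m.group(8)),
        "virtual_limit_mb": to_mb(m.group(9), m.group(10)),
    }
```

Feeding the parsed records into any time-series store keyed on container_id yields the per-container graph.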
>>>> Is there any other way to extract these metrics, either by plugging in >>>> to an existing producer or by writing a simple listener? >>>> >>>> Perhaps a whole other approach? >>>> >>>> -- >>>> [image: Logo] >>>> >>>> Shmuel Blitz >>>> *Big Data Developer* >>>> www.similarweb.com >>>> >>>> >>>> Like >>>> Us >>>> >>>> >>>> Follow >>>> Us >>>> >>>> >>>> Watch >>>> Us >>>> >>>> >>>> Read >>>> Us >>>> >>>> >>> >> >> >> -- >> [image: Logo] >> >> Shmuel Blitz >> *Big Data Developer* >> www.similarweb.com >> >> >> Like >> Us >> >> >> Follow >> Us >> >> >> Watch >> Us >> >> >> Read >> Us >> >> > -- [image: Logo] Shmuel Blitz *Big Data Developer* www.similarweb.com Like Us Follow Us Watch Us Read Us --001a113ce34c7a382f0551eb90ba Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable