From: Miklos Szegedi
Date: Thu, 22 Jun 2017 14:54:20 -0700
Subject: Re: How to monitor YARN application memory per container?
To: Jasson Chenwei
Cc: Shmuel Blitz, Naganarasimha Garla, Sunil G, Sidharth Kumar, common-user@hadoop.apache.org

Hello,

MAPREDUCE-6829 added counters that report the peak memory usage of a
MapReduce job. Here are some of the new counters:

[root@42e243b8cf16 hadoop]# bin/yarn jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-....jar pi 1 1000
Number of Maps  = 1
Samples per Map = 1000
...
Peak Map Physical memory (bytes)=274792448
Peak Map Virtual memory (bytes)=2112589824
Peak Reduce Physical memory (bytes)=167776256
Peak Reduce Virtual memory (bytes)=2117087232
...
Estimated value of Pi is 3.14800000000000000000

Thanks,
Miklos
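These are ordinary job counters, so once the job finishes they can also be
read programmatically, for example from the JobHistory Server REST API.
Below is a minimal sketch; the history-server address and job id are
placeholders, and the counter enum names are inferred from the display
names above, so verify them against your release:

#!/usr/bin/env python3
# Sketch: pull the MAPREDUCE-6829 peak-memory counters for a finished job
# from the JobHistory Server REST API.
import requests

JHS = "http://historyserver.example.com:19888"  # assumed JobHistory address
JOB_ID = "job_1498168481583_0001"               # placeholder job id

# Assumed enum names, inferred from the display names in the output above.
PEAK_COUNTERS = {
    "MAP_PHYSICAL_MEMORY_BYTES_MAX",
    "MAP_VIRTUAL_MEMORY_BYTES_MAX",
    "REDUCE_PHYSICAL_MEMORY_BYTES_MAX",
    "REDUCE_VIRTUAL_MEMORY_BYTES_MAX",
}

resp = requests.get(f"{JHS}/ws/v1/history/mapreduce/jobs/{JOB_ID}/counters")
resp.raise_for_status()

# The response nests counters as jobCounters -> counterGroup[] -> counter[].
for group in resp.json()["jobCounters"]["counterGroup"]:
    for counter in group["counter"]:
        if counter["name"] in PEAK_COUNTERS:
            print(counter["name"], counter["totalCounterValue"])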
On Thu, Jun 22, 2017 at 10:21 AM, Jasson Chenwei wrote:

> Hi,
>
> Please take a look at Timeline Service v2, which supports aggregating
> NodeManager-side info into HBase. This covers both node-level info
> (e.g., node memory and CPU usage) and container-level info (e.g.,
> container memory and CPU usage). I am currently setting it up and do
> find container-related info stored in HBase.
>
> Wei Chen
>
> On Thu, Jun 22, 2017 at 8:12 AM, Shmuel Blitz wrote:
>
>> Hi,
>>
>> Thanks for your response.
>>
>> We are using CDH, and our version doesn't support the solutions above.
>> Also, ATS is not relevant for us right now.
>>
>> We have decided to turn on JMX for all our jobs (Spark / Hadoop
>> MapReduce) and use jmap to collect the data and send it to Datadog.
>>
>> Shmuel
>>
>> On Thu, Jun 15, 2017 at 9:39 PM, Naganarasimha Garla wrote:
>>
>>> Container resource usage has been put into the ATS v2 metrics system.
>>> But if you do not want the heavy ATS v2 subsystem, then I am not aware
>>> of any current interface that exposes the actual resource usage of a
>>> container, which would solve your problem.
>>> Perhaps we could extend this feature in
>>> ContainerManagementProtocol.getContainerStatuses, so that at least the
>>> AM can be aware of the actual container resource usage.
>>> Thoughts?
>>>
>>> On Thu, Jun 15, 2017 at 7:29 PM, Sunil G wrote:
>>>
>>>> And adding to that, we have aggregated container usage per node. I
>>>> don't think you'll have per-container real memory usage recorded by
>>>> YARN. You'll have these 2 entries in ideal cases:
>>>>
>>>> Resource Utilization by Node :
>>>> Resource Utilization by Containers : PMem:0 MB, VMem:0 MB, VCores:0.0
>>>>
>>>> Thanks
>>>> Sunil
>>>>
>>>> On Thu, Jun 15, 2017 at 6:56 AM Sunil G wrote:
>>>>
>>>>> Hi Shmuel
>>>>>
>>>>> This feature is available in the Hadoop 2.8+ release lines, or the
>>>>> Hadoop 3 alphas.
>>>>>
>>>>> Thanks
>>>>> Sunil
>>>>>
>>>>> On Wed, Jun 14, 2017 at 6:31 AM Shmuel Blitz wrote:
>>>>>
>>>>>> Hi Sunil,
>>>>>>
>>>>>> Thanks for your response.
>>>>>>
>>>>>> Here is the response I get when running "yarn node -status {nodeId}":
>>>>>>
>>>>>> Node Report :
>>>>>>     Node-Id : myNode:4545
>>>>>>     Rack : /default
>>>>>>     Node-State : RUNNING
>>>>>>     Node-Http-Address : myNode:8042
>>>>>>     Last-Health-Update : Wed 14/Jun/17 08:25:43:261EST
>>>>>>     Health-Report :
>>>>>>     Containers : 7
>>>>>>     Memory-Used : 44032MB
>>>>>>     Memory-Capacity : 49152MB
>>>>>>     CPU-Used : 16 vcores
>>>>>>     CPU-Capacity : 48 vcores
>>>>>>     Node-Labels :
>>>>>>
>>>>>> However, this is information about the entire node, covering all of
>>>>>> its containers.
>>>>>>
>>>>>> I have no way of using this to see whether the value I give to
>>>>>> 'spark.executor.memory' makes sense or not.
>>>>>>
>>>>>> I'm looking for memory usage/allocation information *per container*.
>>>>>>
>>>>>> Shmuel
>>>>>>
>>>>>> On Wed, Jun 14, 2017 at 4:04 PM, Sunil G wrote:
>>>>>>
>>>>>>> Hi Shmuel
>>>>>>>
>>>>>>> In the Hadoop 2.8 release line, you could check the "yarn node
>>>>>>> -status {nodeId}" CLI command or the
>>>>>>> "http://<rm http address:port>/ws/v1/cluster/nodes/{nodeid}" REST
>>>>>>> end point to get the containers' actual resource usage per node.
>>>>>>> You could also check the same in any of the Hadoop 3.0 alpha
>>>>>>> releases.
>>>>>>>
>>>>>>> Thanks
>>>>>>> Sunil
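A minimal sketch of polling that REST end point to build a per-node time
series follows. The RM address and node id are placeholders, and the
resourceUtilization field names are assumptions based on the 2.8-line
ResourceManager REST documentation, so check them against your release:

#!/usr/bin/env python3
# Sketch: poll the ResourceManager REST API (Hadoop 2.8+) for a node's
# aggregated container resource utilization.
import time
import requests

RM = "http://resourcemanager.example.com:8088"  # assumed RM webapp address
NODE_ID = "myNode:4545"                         # node id from "yarn node -list"

while True:
    node = requests.get(f"{RM}/ws/v1/cluster/nodes/{NODE_ID}").json()["node"]
    # Field names below are assumptions; inspect the raw JSON on your cluster.
    util = node.get("resourceUtilization", {})
    print(time.strftime("%H:%M:%S"),
          "containers PMem(MB):", util.get("aggregatedContainersPhysicalMemoryMB"),
          "containers VMem(MB):", util.get("aggregatedContainersVirtualMemoryMB"),
          "containers CPU:", util.get("containersCPUUsage"))
    time.sleep(30)  # sample periodically to build a time series

Note that, as discussed above, this is still aggregated per node rather
than per container.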
>>>>>>>
>>>>>>> On Tue, Jun 13, 2017 at 11:29 PM, Shmuel Blitz wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> Thanks for your response.
>>>>>>>>
>>>>>>>> The /metrics API returns a blank page on our RM.
>>>>>>>>
>>>>>>>> The /jmx API has some metrics, but these are the same metrics we
>>>>>>>> are already loading into Datadog. It's not good enough, because
>>>>>>>> it doesn't break down the memory use by container.
>>>>>>>>
>>>>>>>> I need the by-container breakdown because resource allocation is
>>>>>>>> per container, and I would like to see if my job is really using
>>>>>>>> all of the allocated memory.
>>>>>>>>
>>>>>>>> Shmuel
>>>>>>>>
>>>>>>>> On Tue, Jun 13, 2017 at 6:05 PM, Sidharth Kumar wrote:
>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> I guess you can get it from
>>>>>>>>> http://<resourcemanager-host>:<rm-port>/jmx or /metrics
>>>>>>>>>
>>>>>>>>> Regards
>>>>>>>>> Sidharth
>>>>>>>>> LinkedIn: www.linkedin.com/in/sidharthkumar2792
>>>>>>>>>
>>>>>>>>> On 13-Jun-2017 6:26 PM, "Shmuel Blitz" wrote:
>>>>>>>>>
>>>>>>>>>> (This question has also been published on StackOverflow)
>>>>>>>>>>
>>>>>>>>>> I am looking for a way to monitor the memory usage of YARN
>>>>>>>>>> containers over time.
>>>>>>>>>>
>>>>>>>>>> Specifically - given a YARN application-id, how can you get a
>>>>>>>>>> graph showing the memory usage of each of its containers over
>>>>>>>>>> time?
>>>>>>>>>>
>>>>>>>>>> The main goal is to better fit the memory allocation
>>>>>>>>>> requirements of our YARN applications (Spark / MapReduce), to
>>>>>>>>>> avoid over-allocation and wasted cluster resources. A side
>>>>>>>>>> goal would be the ability to debug memory issues when
>>>>>>>>>> developing our jobs and to pick reasonable resource
>>>>>>>>>> allocations.
>>>>>>>>>>
>>>>>>>>>> We've tried using the Datadog integration, but it doesn't
>>>>>>>>>> break down the metrics by container.
>>>>>>>>>>
>>>>>>>>>> Another approach was to parse the hadoop-yarn logs. These logs
>>>>>>>>>> have messages like:
>>>>>>>>>>
>>>>>>>>>> Memory usage of ProcessTree 57251 for container-id
>>>>>>>>>> container_e116_1495951495692_35134_01_000001: 1.9 GB of 11 GB
>>>>>>>>>> physical memory used; 14.4 GB of 23.1 GB virtual memory used
>>>>>>>>>>
>>>>>>>>>> Parsing the logs correctly can yield data that can be used to
>>>>>>>>>> plot a graph of memory usage over time (see the parsing sketch
>>>>>>>>>> at the end of this thread).
>>>>>>>>>>
>>>>>>>>>> That's exactly what we want, but there are two downsides:
>>>>>>>>>>
>>>>>>>>>> 1. It involves reading human-readable log lines and parsing
>>>>>>>>>> them into numeric data. We'd love to avoid that.
>>>>>>>>>> 2. If this data can be consumed some other way, we're hoping
>>>>>>>>>> it'll have more of the information we might be interested in
>>>>>>>>>> later. We wouldn't want to put the time into parsing the logs
>>>>>>>>>> just to realize we need something else.
>>>>>>>>>>
>>>>>>>>>> Is there any other way to extract these metrics, either by
>>>>>>>>>> plugging in to an existing producer or by writing a simple
>>>>>>>>>> listener?
>>>>>>>>>>
>>>>>>>>>> Perhaps a whole other approach?
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Shmuel Blitz
>>>>>>>>>> Big Data Developer
>>>>>>>>>> www.similarweb.com
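For anyone who does go the log-parsing route described in the question
above, here is a minimal sketch that converts the NodeManager "Memory
usage of ProcessTree ..." lines into numeric per-container samples. The
regex is written against the single example line quoted in this thread,
so treat the exact format as an assumption that may vary across Hadoop
versions:

#!/usr/bin/env python3
# Sketch: parse NodeManager "Memory usage of ProcessTree ..." log lines
# (as quoted above) into numeric per-container samples.
import re
import sys

LINE_RE = re.compile(
    r"Memory usage of ProcessTree (?P<pid>\d+) for container-id "
    r"(?P<container>\S+): (?P<pmem>[\d.]+) (?P<pmem_unit>[KMGT]B) "
    r"of (?P<pmem_limit>[\d.]+) (?P<pmem_limit_unit>[KMGT]B) physical memory used; "
    r"(?P<vmem>[\d.]+) (?P<vmem_unit>[KMGT]B) "
    r"of (?P<vmem_limit>[\d.]+) (?P<vmem_limit_unit>[KMGT]B) virtual memory used"
)

UNIT = {"KB": 2 ** 10, "MB": 2 ** 20, "GB": 2 ** 30, "TB": 2 ** 40}

def to_bytes(value, unit):
    return int(float(value) * UNIT[unit])

# Read log lines on stdin; print container id plus physical memory
# used/limit in bytes. Extend with the vmem groups as needed.
for line in sys.stdin:
    m = LINE_RE.search(line)
    if m:
        print(m.group("container"),
              to_bytes(m.group("pmem"), m.group("pmem_unit")),
              to_bytes(m.group("pmem_limit"), m.group("pmem_limit_unit")))

Usage would be something like the following (the log file name is an
assumption and varies by install):

  grep "Memory usage of ProcessTree" yarn-*-nodemanager-*.log | python3 parse_nm_mem.py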
Hello,

MAPREDUCE-6829 was ab= out showing the peak memory usage for mapreduce.
Here are some of= the new counters:

[root@42e243b8cf16 hadoop]# bin= /yarn jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-....jar pi 1 1= 000

Number of Maps =C2=A0=3D 1

Samples per Map =3D 1000

...

<= /div>
Peak Map Physical memory (bytes)=3D274792448

=
Peak Map Virtual memory (bytes)=3D2112589824

= Peak Reduce Physical memory (bytes)=3D167776256

Pe= ak Reduce Virtual memory (bytes)=3D2117087232

...<= /div>

Estimated value of Pi is 3.14800000000000000000

Thanks,

Miklos
<= div class=3D"gmail_extra">
On Thu, Jun 22, 20= 17 at 10:21 AM, Jasson Chenwei <ynjassionchen@gmail.com> wrote:
hi,=C2=A0
=
Please take a look at Timeline Server 2 which supports aggre= gate nodemenager=C2=A0side info into HBase.=C2=A0
These infos=C2= =A0include both node level info(e.g., node memory usage, cpu=C2=A0usage) as= well as caontainer(e.g., container memory usage and container cpu=C2=A0usa= ge )=C2=A0level info.=C2=A0 I am currently trying to set it up and do find = container related infos=C2=A0stored in HBase.


=
Wei Chen

On Thu, Jun 22, 2017 = at 8:12 AM, Shmuel Blitz <shmuel.blitz@similarweb.com> wrote:
Hi,
Thanks for your response.

= We are using CDH, and our version doesn't support the solusions above.<= /div>
Also, ATS is not relevant for us now.

We= have decided to turn on JMX for all our jobs (spark/hadoop map-reduce) and= use jmap to collect the data and send it to datadog.

Sh= muel



On Thu, Jun 15, 2017 at= 9:39 PM, Naganarasimha Garla <naganarasimha_gr@apache.org&g= t; wrote:
Contain= er resource usage has been put into ATS v2 metrics system. But if you do no= t want heavy ATS v2 subsystem, then i am not sure any of the current interf= ace exposing the actual resource usage of the container which solves your p= roblem.
Probably i can think of extending this feature in ContainerM= anagementProtocol.getContainerStatuses, so that atleast AM can be = aware of the actual container resource usages.=C2=A0
Thoughts ?

On Thu, Jun 15, 2017 at 7:2= 9 PM, Sunil G <sunilg@apache.org> wrote:
And adding to that, we have aggregated cont= ainer usage per node. I dont think you ll have a per container real memory = usage recorded from YARN.
You ll have these 2 entries in ideal cases.

Resource Utilization by Node :=C2=A0
Resource Utilization by Containers : PMem:0 MB, VMem:0 MB, V= Cores:0.0

Thanks
Sunil
<= br>
On Thu, Jun 15, 2017 at 6:56= AM Sunil G <suni= lg@apache.org> wrote:
Hi Shmuel

This feature is available in Hadoop= 2.8=C2=A0+ release lines. Or Hadoop 3 alpha's.=C2=A0

Thanks
Sunil

On Wed, Jun 14, 2017 at 6:31 AM Shmue= l Blitz <shmuel.blitz@similarweb.com> wrote:
Hi Sunil,

Thanks for your res= ponse.

Here is the response I get when running=C2= =A0=C2=A0"yarn node -status {nodeId}"=C2=A0:

Node Report : =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0
=C2=A0 =C2=A0 =C2=A0 =C2=A0 Node-Id : myNode:4545 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0
=C2=A0 =C2=A0 =C2=A0 =C2=A0 Rack : /default =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0
=C2=A0 =C2=A0 =C2=A0 =C2=A0 Node-= State : RUNNING =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0=C2=A0
=C2=A0 =C2=A0 =C2=A0 =C2=A0= Node-Http-Address : muNode:8042 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0<= /div>
=C2=A0 =C2=A0 =C2=A0 =C2=A0 Last= -Health-Update : Wed 14/Jun/17 08:25:43:261EST =C2=A0 =C2=A0 =C2=A0=C2=A0
=C2=A0 =C2=A0 =C2=A0 =C2= =A0 Health-Report : =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0= =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0
=C2=A0 = =C2=A0 =C2=A0 =C2=A0 Containers : 7 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0=C2=A0
=C2=A0 =C2=A0 =C2=A0 =C2=A0 Memory-Used : 44032MB =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0
=C2=A0 =C2=A0 =C2=A0 =C2=A0 Memory-Capacity : 49152MB= =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0
=C2=A0 =C2=A0 =C2=A0 =C2=A0 CPU-Used : 16 vcores =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0=C2=A0
=C2=A0 =C2=A0 =C2=A0 =C2=A0 CPU-Capacity : 48 vcores= =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0=C2=A0
=C2=A0 =C2=A0 =C2=A0 =C2=A0 Node-Labels : =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0=C2=A0

However, this is information regarding the entire node, containing all c= ontainers.

I have no way of using this = to see the value I give to 'spark.exe= cutor.memory' makes sense or not.

I'm looking for memory u= sage/allocated information per-container.

Shmuel=C2=A0

On Wed, Jun 14, 2017 at 4:04 PM, S= unil G <sunilg@apache.org> wrote:
Hi Shmuel

In Hadoop 2.8 release= line, you could check "yarn node -status {nodeId}" CLI command o= r "http://<rm http address:port>/ws/v1/cluster/nodes/{nodei= d}" REST end point to get container's actual resource usage per no= de. You could also check the same in any of Hadoop 3.0 alpha releases as we= ll.

Thanks
Su= nil

On Tue,= Jun 13, 2017 at 11:29 PM Shmuel Blitz <shmuel.blitz@similarweb.com> wrote:=
Hi,

Thanks for your response.

The /me= trics API returns a blank page on our RM.

The /jmx= API has some metrics, but these are the same metrics we are already loadin= g into data-dog.
It's not good enough, because it doesn't= break down the memory use by container.

I need th= e by-container breakdown because resource allocation is per container and I= would like to se if my job is really using up all the allocated memory.

Shmuel

On Tue, Jun 13, 2017 at 6:0= 5 PM, Sidharth Kumar <sidharthkumar2707@gmail.com>= wrote:
Hi,

I guess you can get it from http://<re= sourcemanager-host>:<rm-port>/jmx or /metrics=C2=A0

Regards

On 13-Jun-2017 6:26 PM, "Shmuel Bli= tz" <shmuel.blitz@similarweb.com> wrote:
(This question has also bee= n published on StackOveflow)

I am looking for = a way to monitor memory usage of YARN containers over time.

<= /div>
Specifically - given a YARN application-id, how can you get a gra= ph, showing the memory usage of each of its containers over time?

The main goal is to better fit memory allocation requiremen= ts for our YARN applications (Spark / Map-Reduce), to avoid over allocation= and cluster resource waste. A side goal would be the ability to debug memo= ry issues when developing our jobs and attempting to pick reasonable resour= ce allocations.

We've tried using the Data-Dog= integration, But it doesn't break down the metrics by container.
=

Another approach was to parse the hadoop-yarn logs. The= se logs have messages like:

Memory usage of Proces= sTree 57251 for container-id container_e116_1495951495692_35134_01_000= 001: 1.9 GB of 11 GB physical memory used; 14.4 GB of 23.1 GB virtual memor= y used
Parsing the logs correctly can yield data that can be used= to plot a graph of memory usage over time.

That&#= 39;s exactly what we want, but there are two downsides:

It involves reading human-readable log lines and parsing them into nu= meric data. We'd love to avoid that.
If this data can be cons= umed otherwise, we're hoping it'll have more information that we mi= ght be interest in in the future. We wouldn't want to put the time into= parsing the logs just to realize we need something else.
Is ther= e any other way to extract these metrics, either by plugging in to an exist= ing producer or by writing a simple listener?

Perh= aps a whole other approach?

--
=
=
=
3D=
= =
= Shmuel Blitz =
Big Data Developer
= www.similarweb.com
Like Us
<= /td>
Follow Us
<= /tbody>
Watch Us
Read Us



--
=
<= tbody>=
=3D"Logo"<= /a>
= Shmuel Blitz =
Big Dat= a Developer
<= /td>
www.similarweb.com
Like Us
Follow Us=
= Watch = Us
Read Us



--
=
3D"Logo" <= /table>



--
=
=
= Shmuel Blitz =
Big Data Developer
www.similarweb.com
Li= ke Us
=
Follow Us
<= td style=3D"font-size:14px;color:#1b2543;font-family:'Open Sans',Ar= ial;vertical-align:middle"> Watch Us
Read Us
=
=
3D"Logo"
= <= /tr>
= Shmuel = Blitz
Big Data Developer
www.similar= web.com
Like Us <= /td>
Follow Us
=
= Watch Us
= Read Us


--001a11459a0828ef650552938834--