hadoop-user mailing list archives

From Miklos Szegedi <miklos.szeg...@cloudera.com>
Subject Re: How to monitor YARN application memory per container?
Date Thu, 22 Jun 2017 21:54:20 GMT
Hello,

MAPREDUCE-6829 added counters that show the peak memory usage of MapReduce
tasks. Here are some of the new counters:

[root@42e243b8cf16 hadoop]# bin/yarn jar
./share/hadoop/mapreduce/hadoop-mapreduce-examples-....jar pi 1 1000

Number of Maps  = 1

Samples per Map = 1000

...

Peak Map Physical memory (bytes)=274792448

Peak Map Virtual memory (bytes)=2112589824

Peak Reduce Physical memory (bytes)=167776256

Peak Reduce Virtual memory (bytes)=2117087232

...

Estimated value of Pi is 3.14800000000000000000
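
If you want to pull these peak-memory counters out of a finished job
programmatically rather than scraping the client output, one option is the
JobHistory Server REST API. Below is a minimal, untested Python sketch; the
history-server address and the job id are placeholders, and the exact counter
names in the response may differ between versions:

  import json
  import urllib.request

  HISTORY_SERVER = "http://historyserver.example.com:19888"  # placeholder JHS address
  JOB_ID = "job_1495951495692_0001"                          # placeholder job id

  url = "{}/ws/v1/history/mapreduce/jobs/{}/counters".format(HISTORY_SERVER, JOB_ID)
  with urllib.request.urlopen(url) as resp:
      counters = json.load(resp)

  # Print every counter whose name mentions memory; this should include the
  # peak physical/virtual memory counters shown above (exact names may vary).
  for group in counters.get("jobCounters", {}).get("counterGroup", []):
      for counter in group.get("counter", []):
          if "MEMORY" in counter.get("name", "").upper():
              print(counter["name"], counter.get("totalCounterValue"))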

Thanks,

Miklos

On Thu, Jun 22, 2017 at 10:21 AM, Jasson Chenwei <ynjassionchen@gmail.com>
wrote:

> hi,
>
> Please take a look at Timeline Service v2, which supports aggregating
> NodeManager-side information into HBase.
> This includes both node-level information (e.g., node memory usage and CPU
> usage) and container-level information (e.g., container memory usage and
> container CPU usage). I am currently trying to set it up and do find
> container-related information stored in HBase.
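>
> To give an idea of how that container-level data could be read back out, here
> is a rough, untested Python sketch against the timeline reader's REST API; the
> reader address, the YARN_CONTAINER entity type, and the JSON field names are
> assumptions based on the defaults, so adjust them to your setup:
>
>   import json
>   import urllib.request
>
>   READER = "http://timelinereader.example.com:8188"  # placeholder reader address
>   APP_ID = "application_1495951495692_35134"         # placeholder application id
>
>   # Fetch every container entity of the application, metrics included.
>   url = ("{}/ws/v2/timeline/apps/{}/entities/YARN_CONTAINER?fields=METRICS"
>          .format(READER, APP_ID))
>   with urllib.request.urlopen(url) as resp:
>       containers = json.load(resp)
>
>   # Each entity should carry its metrics (memory, CPU) as a time series; the
>   # exact metric ids depend on the Hadoop version, so just dump what is there.
>   for container in containers:
>       print(container.get("id"))
>       for metric in container.get("metrics", []):
>           print("  ", metric.get("id"), metric.get("values"))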
>
>
> Wei Chen
>
> On Thu, Jun 22, 2017 at 8:12 AM, Shmuel Blitz
> <shmuel.blitz@similarweb.com> wrote:
>
>> Hi,
>>
>> Thanks for your response.
>>
>> We are using CDH, and our version doesn't support the solutions above.
>> Also, ATS is not relevant for us now.
>>
>> We have decided to turn on JMX for all our jobs (Spark / Hadoop MapReduce)
>> and use jmap to collect the data and send it to Datadog.
>>
>> Shmuel
>>
>>
>>
>> On Thu, Jun 15, 2017 at 9:39 PM, Naganarasimha Garla <
>> naganarasimha_gr@apache.org> wrote:
>>
>>> Container resource usage has been put into the ATS v2 metrics system. But if
>>> you do not want the heavy ATS v2 subsystem, then I am not sure any of the
>>> current interfaces expose the actual resource usage of a container in a way
>>> that solves your problem.
>>> One option I can think of is extending *ContainerManagementProtocol.getContainerStatuses*,
>>> so that at least the AM can be aware of the actual container resource usage.
>>> Thoughts?
>>>
>>> On Thu, Jun 15, 2017 at 7:29 PM, Sunil G <sunilg@apache.org> wrote:
>>>
>>>> And adding to that, we have aggregated container usage per node. I don't
>>>> think you'll have per-container real memory usage recorded from YARN.
>>>> You'll have these 2 entries in ideal cases.
>>>>
>>>> Resource Utilization by Node :
>>>> Resource Utilization by Containers : PMem:0 MB, VMem:0 MB, VCores:0.0
>>>>
>>>> Thanks
>>>> Sunil
>>>>
>>>> On Thu, Jun 15, 2017 at 6:56 AM Sunil G <sunilg@apache.org> wrote:
>>>>
>>>>> Hi Shmuel
>>>>>
>>>>> This feature is available in the Hadoop 2.8+ release lines, or in the
>>>>> Hadoop 3 alphas.
>>>>>
>>>>> Thanks
>>>>> Sunil
>>>>>
>>>>> On Wed, Jun 14, 2017 at 6:31 AM Shmuel Blitz <
>>>>> shmuel.blitz@similarweb.com> wrote:
>>>>>
>>>>>> Hi Sunil,
>>>>>>
>>>>>> Thanks for your response.
>>>>>>
>>>>>> Here is the response I get when running "yarn node -status {nodeId}":
>>>>>>
>>>>>> Node Report :
>>>>>>         Node-Id : myNode:4545
>>>>>>         Rack : /default
>>>>>>         Node-State : RUNNING
>>>>>>         Node-Http-Address : myNode:8042
>>>>>>         Last-Health-Update : Wed 14/Jun/17 08:25:43:261EST
>>>>>>         Health-Report :
>>>>>>         Containers : 7
>>>>>>         Memory-Used : 44032MB
>>>>>>         Memory-Capacity : 49152MB
>>>>>>         CPU-Used : 16 vcores
>>>>>>         CPU-Capacity : 48 vcores
>>>>>>         Node-Labels :
>>>>>>
>>>>>> However, this is information about the entire node, aggregated over all
>>>>>> of its containers.
>>>>>>
>>>>>> I have no way of using this to see whether the value I give to
>>>>>> 'spark.executor.memory' makes sense or not.
>>>>>>
>>>>>> I'm looking for memory usage/allocated information *per-container*.
>>>>>>
>>>>>> Shmuel
>>>>>>
>>>>>> On Wed, Jun 14, 2017 at 4:04 PM, Sunil G <sunilg@apache.org> wrote:
>>>>>>
>>>>>>> Hi Shmuel
>>>>>>>
>>>>>>> In the Hadoop 2.8 release line, you could check the "yarn node -status
>>>>>>> {nodeId}" CLI command or the "http://<rm http
>>>>>>> address:port>/ws/v1/cluster/nodes/{nodeid}" REST endpoint to get the
>>>>>>> containers' actual resource usage per node. You could also check the same
>>>>>>> in any of the Hadoop 3.0 alpha releases.
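>>>>>>>
>>>>>>> For anyone who would rather poll that REST endpoint than the CLI, here is
>>>>>>> a minimal, untested Python sketch; the RM address and node id are
>>>>>>> placeholders, and the exact utilization field names can differ between
>>>>>>> the 2.8 and 3.0 lines:
>>>>>>>
>>>>>>>   import json
>>>>>>>   import urllib.request
>>>>>>>
>>>>>>>   RM = "http://resourcemanager.example.com:8088"  # placeholder RM web address
>>>>>>>   NODE_ID = "myNode:4545"                         # placeholder node id
>>>>>>>
>>>>>>>   url = "{}/ws/v1/cluster/nodes/{}".format(RM, NODE_ID)
>>>>>>>   with urllib.request.urlopen(url) as resp:
>>>>>>>       node = json.load(resp)["node"]
>>>>>>>
>>>>>>>   # Allocation shows up under usedMemoryMB; the actual usage aggregated over
>>>>>>>   # the node's containers should appear under resourceUtilization in 2.8+.
>>>>>>>   print("allocated MB :", node.get("usedMemoryMB"))
>>>>>>>   print("utilization  :", node.get("resourceUtilization"))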
>>>>>>>
>>>>>>> Thanks
>>>>>>> Sunil
>>>>>>>
>>>>>>> On Tue, Jun 13, 2017 at 11:29 PM Shmuel Blitz <
>>>>>>> shmuel.blitz@similarweb.com> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> Thanks for your response.
>>>>>>>>
>>>>>>>> The /metrics API returns a blank page on our RM.
>>>>>>>>
>>>>>>>> The /jmx API has some metrics, but these are the same metrics we are
>>>>>>>> already loading into Datadog.
>>>>>>>> It's not good enough, because it doesn't break down the memory use by
>>>>>>>> container.
>>>>>>>>
>>>>>>>> I need the by-container breakdown because resource allocation is per
>>>>>>>> container, and I would like to see if my job is really using up all the
>>>>>>>> allocated memory.
>>>>>>>>
>>>>>>>> Shmuel
>>>>>>>>
>>>>>>>> On Tue, Jun 13, 2017 at 6:05 PM, Sidharth Kumar <
>>>>>>>> sidharthkumar2707@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> I guess you can get it from http://<resourcemanager-host>:<rm-port>/jmx
>>>>>>>>> or /metrics
>>>>>>>>>
>>>>>>>>> Regards
>>>>>>>>> Sidharth
>>>>>>>>> LinkedIn: www.linkedin.com/in/sidharthkumar2792
>>>>>>>>>
>>>>>>>>> On 13-Jun-2017 6:26 PM, "Shmuel Blitz" <
>>>>>>>>> shmuel.blitz@similarweb.com> wrote:
>>>>>>>>>
>>>>>>>>>> (This question has also been published on Stack Overflow
>>>>>>>>>> <https://stackoverflow.com/q/44484940/416300>.)
>>>>>>>>>>
>>>>>>>>>> I am looking for a way to monitor the memory usage of YARN containers
>>>>>>>>>> over time.
>>>>>>>>>>
>>>>>>>>>> Specifically: given a YARN application id, how can you get a graph
>>>>>>>>>> showing the memory usage of each of its containers over time?
>>>>>>>>>>
>>>>>>>>>> The main goal is to better fit the memory allocation requirements of our
>>>>>>>>>> YARN applications (Spark / MapReduce), to avoid over-allocation and
>>>>>>>>>> cluster resource waste. A side goal would be the ability to debug memory
>>>>>>>>>> issues when developing our jobs and attempting to pick reasonable
>>>>>>>>>> resource allocations.
>>>>>>>>>>
>>>>>>>>>> We've tried using the Datadog integration, but it doesn't break down
>>>>>>>>>> the metrics by container.
>>>>>>>>>>
>>>>>>>>>> Another approach was to parse the hadoop-yarn logs. These logs have
>>>>>>>>>> messages like:
>>>>>>>>>>
>>>>>>>>>> Memory usage of ProcessTree 57251 for container-id
>>>>>>>>>> container_e116_1495951495692_35134_01_000001: 1.9 GB of 11 GB
>>>>>>>>>> physical memory used; 14.4 GB of 23.1 GB virtual memory used
>>>>>>>>>>
>>>>>>>>>> Parsing the logs correctly can yield data that can be used to plot a
>>>>>>>>>> graph of memory usage over time.
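>>>>>>>>>>
>>>>>>>>>> (For reference, a rough, untested Python sketch of that parsing, keyed
>>>>>>>>>> entirely off the log line format quoted above:)
>>>>>>>>>>
>>>>>>>>>>   import re
>>>>>>>>>>
>>>>>>>>>>   # Matches the log line quoted above and captures the container id plus
>>>>>>>>>>   # the physical/virtual memory figures, units included.
>>>>>>>>>>   PATTERN = re.compile(
>>>>>>>>>>       r"Memory usage of ProcessTree \d+ for container-id (\S+): "
>>>>>>>>>>       r"([\d.]+ \S+) of ([\d.]+ \S+) physical memory used; "
>>>>>>>>>>       r"([\d.]+ \S+) of ([\d.]+ \S+) virtual memory used")
>>>>>>>>>>
>>>>>>>>>>   def parse_line(line):
>>>>>>>>>>       """Return (container_id, pmem, pmem_limit, vmem, vmem_limit) or None."""
>>>>>>>>>>       m = PATTERN.search(line)
>>>>>>>>>>       return m.groups() if m else None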
>>>>>>>>>>
>>>>>>>>>> That's exactly what we want, but there are two downsides:
>>>>>>>>>>
>>>>>>>>>> 1. It involves reading human-readable log lines and parsing them into
>>>>>>>>>> numeric data. We'd love to avoid that.
>>>>>>>>>> 2. If this data can be consumed otherwise, we're hoping it'll have more
>>>>>>>>>> information that we might be interested in in the future. We wouldn't
>>>>>>>>>> want to put the time into parsing the logs just to realize we need
>>>>>>>>>> something else.
>>>>>>>>>> Is there any other way to extract these metrics, either by plugging in
>>>>>>>>>> to an existing producer or by writing a simple listener?
>>>>>>>>>>
>>>>>>>>>> Perhaps a whole other approach?
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>
>>
>>
>>
>
>
