hadoop-mapreduce-issues mailing list archives

From "Scott Chen (JIRA)" <j...@apache.org>
Subject [jira] Commented: (MAPREDUCE-220) Collecting cpu and memory usage for MapReduce tasks
Date Tue, 04 May 2010 23:22:10 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12864065#action_12864065 ]

Scott Chen commented on MAPREDUCE-220:

Hey guys, thanks for the help.

I am not familiar with the counters, but from Arun's and Vinod's comments I can see the benefits:
1. Reuse of the counter logging and transmitting
2. Easier to expose to end users
This is really good!

But as Dhruba mentioned, we want to use this information for scheduling.
So measuring it and sending it with the heartbeat ensures the scheduler gets the latest information.
One minute may be too slow for scheduling.

The other question I have is this:
Using counters, can we aggregate with another method (e.g. max) rather than just increment?

My original plan is to report this information in this issue and aggregate it into job-level
status in MAPREDUCE-1739.
I am planning to generate these fields after aggregation:
1. Total CPU cycles (# of giga-cycles)
2. Total Memory occupied time (GB-sec)
3. Maximum peak memory on one task (GB)
4. Maximum peak CPU on one task (GHz)
Is it possible to get these fields by using the counters?
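
As a rough sketch of the aggregation described above (not code from the attached patches, and not the Hadoop counter API), the four fields could be combined per job as follows. `TaskUsage` and `JobUsage` are hypothetical names introduced only for illustration, and the field units simply mirror the list above.

```java
// Hypothetical per-task resource report; not part of the Hadoop API.
final class TaskUsage {
    final double cpuGigaCycles; // total CPU cycles used by the task
    final double memoryGbSec;   // memory occupied time of the task
    final double peakMemoryGb;  // peak memory observed on the task
    final double peakCpuGhz;    // peak CPU rate observed on the task

    TaskUsage(double cpuGigaCycles, double memoryGbSec,
              double peakMemoryGb, double peakCpuGhz) {
        this.cpuGigaCycles = cpuGigaCycles;
        this.memoryGbSec = memoryGbSec;
        this.peakMemoryGb = peakMemoryGb;
        this.peakCpuGhz = peakCpuGhz;
    }
}

// Hypothetical job-level aggregate: totals are summed, peaks take the max.
final class JobUsage {
    double totalCpuGigaCycles;
    double totalMemoryGbSec;
    double maxPeakMemoryGb;
    double maxPeakCpuGhz;

    void add(TaskUsage t) {
        totalCpuGigaCycles += t.cpuGigaCycles;                        // sum
        totalMemoryGbSec += t.memoryGbSec;                            // sum
        maxPeakMemoryGb = Math.max(maxPeakMemoryGb, t.peakMemoryGb);  // max
        maxPeakCpuGhz = Math.max(maxPeakCpuGhz, t.peakCpuGhz);        // max
    }
}
```

The first two fields are plain sums, so they fit an increment-style aggregation. The last two need a max across tasks, which is the part the question above is about.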

I will read the relevant code and think more about it.
Maybe there's a way to obtain both benefits.

Vinod: I also feel that there is a lot of redundant creation/computation of processTree.
Maybe we should refactor the code and use one thread to compute it and expose the information
to others.
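
One possible shape for that refactoring, sketched under assumptions: a single sampling thread refreshes an immutable snapshot, and every consumer reads the latest snapshot instead of building its own processTree. `UsageProbe`, `UsageSnapshot`, and `SharedUsageSampler` are hypothetical names, and the probe is only a stand-in for the real process-tree computation.

```java
// Stand-in for the expensive process-tree computation (an assumption).
interface UsageProbe {
    long readMemoryBytes();
}

// Immutable snapshot, so readers never observe a half-updated value.
final class UsageSnapshot {
    final long sampleNumber;
    final long memoryBytes;

    UsageSnapshot(long sampleNumber, long memoryBytes) {
        this.sampleNumber = sampleNumber;
        this.memoryBytes = memoryBytes;
    }
}

// Hypothetical sampler: one writer thread, any number of readers.
final class SharedUsageSampler {
    private final UsageProbe probe;
    // volatile publishes each new snapshot safely to reader threads.
    private volatile UsageSnapshot latest = new UsageSnapshot(0, 0);

    SharedUsageSampler(UsageProbe probe) {
        this.probe = probe;
    }

    // Intended to be called only by the single sampling thread.
    void sampleOnce() {
        latest = new UsageSnapshot(latest.sampleNumber + 1, probe.readMemoryBytes());
    }

    // Called by any consumer that needs the most recent measurement.
    UsageSnapshot getLatest() {
        return latest;
    }

    // Starts the one sampling thread; it runs until interrupted.
    Thread start(long intervalMillis) {
        Thread t = new Thread(() -> {
            try {
                while (!Thread.currentThread().isInterrupted()) {
                    sampleOnce();
                    Thread.sleep(intervalMillis);
                }
            } catch (InterruptedException e) {
                // Interrupted while sleeping: stop sampling.
            }
        }, "usage-sampler");
        t.setDaemon(true);
        t.start();
        return t;
    }
}
```

With this layout the measurement is computed once per interval no matter how many components read it, and a heartbeat sender could attach `getLatest()` to each heartbeat without triggering a new computation.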

> Collecting cpu and memory usage for MapReduce tasks
> ---------------------------------------------------
>                 Key: MAPREDUCE-220
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-220
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>          Components: task, tasktracker
>            Reporter: Hong Tang
>            Assignee: Scott Chen
>             Fix For: 0.22.0
>         Attachments: MAPREDUCE-220-v1.txt, MAPREDUCE-220.txt
> It would be nice for TaskTracker to collect cpu and memory usage for individual Map or
> Reduce tasks over time.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
