hadoop-mapreduce-issues mailing list archives

From "Scott Chen (JIRA)" <j...@apache.org>
Subject [jira] Commented: (MAPREDUCE-220) Collecting cpu and memory usage for MapReduce tasks
Date Tue, 13 Jul 2010 01:40:01 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12887620#action_12887620 ]

Scott Chen commented on MAPREDUCE-220:

@Evan: This sounds like a good experiment.

The CPU and memory usage collected in this JIRA is obtained by parsing the /proc/ directory.
This works well because /proc/ is an in-memory filesystem, so the overhead is small.
However, there is no per-process IO or network information in /proc/.
And as you mentioned, using tools like tcpdump can be very expensive.
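
To illustrate the /proc/ parsing idea, here is a minimal Python sketch. It is not the code from the attached patch; the names ProcUsage, parse_proc_stat and read_usage are hypothetical. It assumes the Linux /proc/<pid>/stat layout described in proc(5), where utime and stime are fields 14 and 15 (in clock ticks), vsize is field 23 (bytes) and rss is field 24 (pages).

```python
from typing import NamedTuple


class ProcUsage(NamedTuple):
    """CPU and memory figures taken from one /proc/<pid>/stat sample."""
    utime_ticks: int   # user-mode CPU time, in clock ticks
    stime_ticks: int   # kernel-mode CPU time, in clock ticks
    vsize_bytes: int   # virtual memory size, in bytes
    rss_pages: int     # resident set size, in pages


def parse_proc_stat(line: str) -> ProcUsage:
    """Parse the contents of /proc/<pid>/stat (hypothetical helper).

    The command name (field 2) is wrapped in parentheses and may itself
    contain spaces or ')', so we split after the LAST ')'. The remaining
    tokens then start at field 3 (state), i.e. field N is at index N - 3.
    """
    rest = line[line.rindex(")") + 1:].split()
    return ProcUsage(
        utime_ticks=int(rest[11]),  # field 14
        stime_ticks=int(rest[12]),  # field 15
        vsize_bytes=int(rest[20]),  # field 23
        rss_pages=int(rest[21]),    # field 24
    )


def read_usage(pid: int) -> ProcUsage:
    """Read one sample for a live process. Linux only."""
    with open(f"/proc/{pid}/stat") as f:
        return parse_proc_stat(f.read())
```

Sampling read_usage(pid) periodically and taking the difference of the tick counters between samples would give CPU usage over time. Converting ticks to seconds and pages to bytes needs the system clock tick rate and page size, which this sketch leaves out.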

Another approach is to count the {non,rack,data}-local bytes fetched from HDFS
and the bytes fetched/served for map output.
This way we can estimate the IO and network traffic from these numbers.
The drawback of this approach is that it doesn't capture IO and network traffic that is not
introduced by the framework.
People can write user scripts that do lots of IO. That will not be captured by this.
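
The byte-counting idea could be sketched roughly as follows. This is only an illustration in Python, not anything from the patch or from Hadoop's API: Locality, classify and TrafficEstimate are hypothetical names, and the rule that only rack-local and non-local bytes count as network traffic is an assumption made for the example.

```python
from collections import Counter
from enum import Enum


class Locality(Enum):
    DATA_LOCAL = "data-local"   # source is on the same host as the task
    RACK_LOCAL = "rack-local"   # source is on another host in the same rack
    NON_LOCAL = "non-local"     # source is in a different rack


def classify(task_host: str, task_rack: str,
             source_host: str, source_rack: str) -> Locality:
    """Classify one read by where the bytes came from (hypothetical helper)."""
    if task_host == source_host:
        return Locality.DATA_LOCAL
    if task_rack == source_rack:
        return Locality.RACK_LOCAL
    return Locality.NON_LOCAL


class TrafficEstimate:
    """Accumulates framework-visible bytes per locality class."""

    def __init__(self) -> None:
        self.bytes_by_locality: Counter = Counter()

    def record(self, locality: Locality, nbytes: int) -> None:
        self.bytes_by_locality[locality] += nbytes

    def io_bytes(self) -> int:
        # Every recorded byte was read from some disk or buffer.
        return sum(self.bytes_by_locality.values())

    def network_bytes(self) -> int:
        # Assumption: data-local reads do not cross the network.
        return (self.bytes_by_locality[Locality.RACK_LOCAL]
                + self.bytes_by_locality[Locality.NON_LOCAL])
```

Anything a user script reads or writes on its own never passes through record(), which is exactly the blind spot described above.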

> Collecting cpu and memory usage for MapReduce tasks
> ---------------------------------------------------
>                 Key: MAPREDUCE-220
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-220
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>          Components: task, tasktracker
>            Reporter: Hong Tang
>            Assignee: Scott Chen
>             Fix For: 0.22.0
>         Attachments: MAPREDUCE-220-20100616.txt, MAPREDUCE-220-v1.txt, MAPREDUCE-220.txt
> It would be nice for TaskTracker to collect cpu and memory usage for individual Map or Reduce tasks over time.

