hadoop-mapreduce-issues mailing list archives

From "Hudson (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-2037) Capturing interim progress times, CPU usage, and memory usage, when tasks reach certain progress thresholds
Date Sun, 14 Aug 2011 13:01:34 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-2037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13084828#comment-13084828 ]

Hudson commented on MAPREDUCE-2037:
-----------------------------------

Integrated in Hadoop-Mapreduce-trunk #754 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/754/])
    MAPREDUCE-2037. Capture intermediate progress, CPU and memory usage for tasks. Contributed by Dick King.

acmurthy : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1157253
Files : 
* /hadoop/common/trunk/mapreduce/src/java/org/apache/hadoop/mapreduce/jobhistory/AvroArrayUtils.java
* /hadoop/common/trunk/mapreduce/src/tools/org/apache/hadoop/tools/rumen/MapTaskAttemptInfo.java
* /hadoop/common/trunk/mapreduce/src/java/mapred-default.xml
* /hadoop/common/trunk/mapreduce/src/java/org/apache/hadoop/mapred/TaskInProgress.java
* /hadoop/common/trunk/mapreduce/src/java/org/apache/hadoop/mapred/Counters.java
* /hadoop/common/trunk/mapreduce/src/java/org/apache/hadoop/mapreduce/jobhistory/Events.avpr
* /hadoop/common/trunk/mapreduce/src/java/org/apache/hadoop/mapreduce/jobhistory/TaskAttemptUnsuccessfulCompletionEvent.java
* /hadoop/common/trunk/mapreduce/src/java/org/apache/hadoop/mapred/StatePeriodicStats.java
* /hadoop/common/trunk/mapreduce/src/test/mapred/org/apache/hadoop/tools/rumen/TestRumenJobTraces.java
* /hadoop/common/trunk/mapreduce/src/java/org/apache/hadoop/mapreduce/jobhistory/ReduceAttemptFinishedEvent.java
* /hadoop/common/trunk/mapreduce/src/java/org/apache/hadoop/mapreduce/server/jobtracker/JTConfig.java
* /hadoop/common/trunk/mapreduce/src/test/mapred/org/apache/hadoop/mapred/TestTaskPerformanceSplits.java
* /hadoop/common/trunk/mapreduce/src/tools/org/apache/hadoop/tools/rumen/ZombieJob.java
* /hadoop/common/trunk/mapreduce/src/tools/org/apache/hadoop/tools/rumen/ReduceAttempt20LineHistoryEventEmitter.java
* /hadoop/common/trunk/mapreduce/src/tools/org/apache/hadoop/tools/rumen/TaskAttemptInfo.java
* /hadoop/common/trunk/mapreduce/src/test/mapred/org/apache/hadoop/mapreduce/jobhistory/TestJobHistoryEvents.java
* /hadoop/common/trunk/mapreduce/CHANGES.txt
* /hadoop/common/trunk/mapreduce/src/java/org/apache/hadoop/mapred/CumulativePeriodicStats.java
* /hadoop/common/trunk/mapreduce/src/tools/org/apache/hadoop/tools/rumen/ReduceTaskAttemptInfo.java
* /hadoop/common/trunk/mapreduce/src/tools/org/apache/hadoop/tools/rumen/TaskAttempt20LineEventEmitter.java
* /hadoop/common/trunk/mapreduce/src/java/org/apache/hadoop/mapred/JobInProgress.java
* /hadoop/common/trunk/mapreduce/src/java/org/apache/hadoop/mapred/PeriodicStatsAccumulator.java
* /hadoop/common/trunk/mapreduce/src/tools/org/apache/hadoop/tools/rumen/JobBuilder.java
* /hadoop/common/trunk/mapreduce/src/java/org/apache/hadoop/mapred/ProgressSplitsBlock.java
* /hadoop/common/trunk/mapreduce/src/tools/org/apache/hadoop/tools/rumen/LoggedTaskAttempt.java
* /hadoop/common/trunk/mapreduce/src/java/org/apache/hadoop/mapreduce/jobhistory/MapAttemptFinishedEvent.java
* /hadoop/common/trunk/mapreduce/src/tools/org/apache/hadoop/tools/rumen/MapAttempt20LineHistoryEventEmitter.java


> Capturing interim progress times, CPU usage, and memory usage, when tasks reach certain progress thresholds
> -----------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-2037
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2037
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>            Reporter: Dick King
>            Assignee: Dick King
>             Fix For: 0.23.0
>
>         Attachments: MAPREDUCE-2037.patch, MAPREDUCE-2037.patch
>
>
> We would like to capture the following information at certain progress thresholds as a task runs:
>    * Time taken so far
>    * CPU load [either at the time the data are taken, or exponentially smoothed]
>    * Memory load [also either at the time the data are taken, or exponentially smoothed]
> This would be taken at intervals that depend on the task progress plateaus.  For example, reducers have three progress ranges -- [0-1/3], (1/3-2/3], and (2/3-3/3] -- where fundamentally different activities happen.  Mappers have different boundaries, I understand, that are not symmetrically placed.  Data capture boundaries should coincide with activity boundaries.  For the state information capture [CPU and memory] we should average over the covered interval.
> This data would flow in with the heartbeats.  It would be placed in the job history as part of the task attempt completion event, so it could be processed by rumen or some similar tool and could drive a benchmark engine.
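
The capture scheme in the description above can be illustrated with a small sketch. It is not the code from the committed patch: the class name `ProgressSplitSketch` and all of its methods are hypothetical, the ranges are assumed to be equal-width (as in the reducer example), and load samples are combined with a plain mean rather than exponential smoothing.

```java
/**
 * Hypothetical sketch only: these names are NOT the classes added by the
 * MAPREDUCE-2037 patch. Load samples use a plain mean, not exponential smoothing.
 */
class ProgressSplitSketch {
    private final int bins;
    private final long[] elapsedMs;  // latest "time taken so far" seen in each range
    private final double[] loadSum;  // running sum of load samples per range
    private final int[] samples;     // number of load samples per range

    ProgressSplitSketch(int bins) {
        if (bins < 1) {
            throw new IllegalArgumentException("bins must be >= 1");
        }
        this.bins = bins;
        this.elapsedMs = new long[bins];
        this.loadSum = new double[bins];
        this.samples = new int[bins];
    }

    /** Maps progress in [0, 1] to a range index; range k covers (k/bins, (k+1)/bins]. */
    int binFor(double progress) {
        int idx = (int) Math.ceil(progress * bins) - 1;
        // Clamp so that progress 0.0 falls in the first range and 1.0 in the last.
        return Math.max(0, Math.min(idx, bins - 1));
    }

    /** Records one sample, e.g. one taken when a heartbeat is sent. */
    void record(double progress, long elapsedMillis, double load) {
        int idx = binFor(progress);
        elapsedMs[idx] = elapsedMillis;
        loadSum[idx] += load;
        samples[idx]++;
    }

    /** Elapsed time at the last sample seen in the given range (0 if none). */
    long elapsedMsAt(int idx) {
        return elapsedMs[idx];
    }

    /** Mean load over the samples in the given range, or NaN if there were none. */
    double averageLoad(int idx) {
        return samples[idx] == 0 ? Double.NaN : loadSum[idx] / samples[idx];
    }
}
```

With `new ProgressSplitSketch(3)` the ranges match the three reducer ranges quoted above. Mapper boundaries, which the description says are not symmetrically placed, would need an explicit boundary array in place of the equal-width mapping in `binFor`.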

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
