hadoop-mapreduce-issues mailing list archives

From "Eli Collins (JIRA)" <j...@apache.org>
Subject [jira] [Moved] (MAPREDUCE-2833) Job Tracker needs to collect more job/task execution stats and save them to DFS file
Date Thu, 11 Aug 2011 18:56:27 GMT

     [ https://issues.apache.org/jira/browse/MAPREDUCE-2833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eli Collins moved HADOOP-1950 to MAPREDUCE-2833:
------------------------------------------------

        Key: MAPREDUCE-2833  (was: HADOOP-1950)
    Project: Hadoop Map/Reduce  (was: Hadoop Common)

> Job Tracker needs to collect more job/task execution stats and save them to DFS file
> ------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-2833
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2833
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>            Reporter: Runping Qi
>              Labels: newbie
>
> In order to facilitate offline analysis of the dynamic behaviors and performance characteristics
> of map/reduce jobs, we need the job tracker to collect some data about jobs and save them to
> DFS files. Some data are in time series form, and some are not.
> Below is a preliminary list of desired data. Some of these are already available in the current
> job tracker; some are new.
> For each map/reduce job, we need the following non time series data (a sketch for computing
> the summary statistics follows this list):
>    1. Job id, job name, number of mappers, number of reducers, start time, end time, and end
>       of the mapper phase
>    2. Average (median, min, max) of successful mapper execution time and input/output records/bytes
>    3. Average (median, min, max) of unsuccessful mapper execution time and input/output records/bytes
>    4. Total mapper retries; max and average number of retries per mapper
>    5. The reasons for mapper task failures
>    6. Average (median, min, max) of successful reducer execution time and input/output records/bytes.
>       Execution time is the difference between the sort end time and the task end time.
>    7. Average (median, min, max) of successful copy time (from the mapper phase end time to
>       the sort start time)
>    8. Average (median, min, max) of sorting time for successful reducers
>    9. Average (median, min, max) of unsuccessful reducer execution time (from the end of the
>       mapper phase or the start of the task, whichever is later, to the end of the task)
>    10. Total reducer retries; max and average number of retries per reducer
>    11. The reasons for reducer task failures (user code error, lost tracker, failed to write
>        to DFS, etc.)
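> A minimal sketch of how the summary statistics above might be computed once the per-task values
> (e.g. execution times in milliseconds) have been gathered into an array; the class below is
> hypothetical, not an existing job tracker type:
>
>     import java.util.Arrays;
>
>     // Hypothetical holder for the average/median/min/max of one metric.
>     class StatsSummary {
>         final long min, max, median;
>         final double average;
>
>         StatsSummary(long[] values) { // assumes values.length > 0
>             long[] sorted = values.clone();
>             Arrays.sort(sorted);
>             min = sorted[0];
>             max = sorted[sorted.length - 1];
>             median = sorted[sorted.length / 2]; // upper middle value for an even count
>             long sum = 0;
>             for (long v : sorted) sum += v;
>             average = (double) sum / sorted.length;
>         }
>     }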
> For each map/reduce job, we collect the following time series data (with one minute interval):
>     1. Numbers of pending mappers and reducers
>     2. Numbers of running mappers and reducers
> For the job tracker, we need the following data:
>     1. Number of trackers
>     2. Start time
>     3. End time
>     4. The list of map/reduce jobs (their ids, start times, and end times)
>
> We also need the following time series data (with one minute interval; a sampling sketch
> follows this list):
>     1. The number of running jobs
>     2. The numbers of running mappers/reducers
>     3. The numbers of pending mappers/reducers
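> A minimal sketch of the one-minute sampling loop, applicable to both the per-job and the
> tracker-wide series above. Everything here is hypothetical: the TaskCounts interface stands in
> for whatever accessors the job tracker actually exposes.
>
>     import java.util.ArrayList;
>     import java.util.List;
>     import java.util.Timer;
>     import java.util.TimerTask;
>
>     // Hypothetical accessors for the job tracker's current task counts.
>     interface TaskCounts {
>         int pendingMaps();
>         int runningMaps();
>         int pendingReduces();
>         int runningReduces();
>     }
>
>     // Appends one sample per minute to an in-memory series.
>     class TaskCountSampler {
>         static class Sample {
>             final long timeMs;
>             final int pendingMaps, runningMaps, pendingReduces, runningReduces;
>             Sample(long t, int pm, int rm, int pr, int rr) {
>                 timeMs = t;
>                 pendingMaps = pm; runningMaps = rm;
>                 pendingReduces = pr; runningReduces = rr;
>             }
>         }
>
>         final List<Sample> series = new ArrayList<Sample>();
>         private final Timer timer = new Timer("stats-sampler", true); // daemon thread
>
>         void start(final TaskCounts counts) {
>             timer.scheduleAtFixedRate(new TimerTask() {
>                 public void run() {
>                     synchronized (series) {
>                         series.add(new Sample(System.currentTimeMillis(),
>                                 counts.pendingMaps(), counts.runningMaps(),
>                                 counts.pendingReduces(), counts.runningReduces()));
>                     }
>                 }
>             }, 0, 60 * 1000L); // one-minute interval, as proposed above
>         }
>     }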
> The data collection should be optional. That is, a job tracker should be able to turn such data
> collection off, and in that case it should not pay the cost.
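> For example, the switch could be read once at job tracker startup. The property name below is
> illustrative, not an existing Hadoop configuration key, and the sampler is the hypothetical one
> sketched above:
>
>     import org.apache.hadoop.conf.Configuration;
>
>     // Hypothetical startup hook; off by default so an unconfigured tracker pays nothing.
>     void maybeStartStatsCollection(Configuration conf, TaskCountSampler sampler, TaskCounts counts) {
>         if (conf.getBoolean("mapred.jobtracker.stats.collection", false)) {
>             sampler.start(counts); // only pay the cost when enabled
>         }
>     }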
> The job tracker should organize the in-memory version of the collected data in such a way that:
> 1. it does not consume an excessive amount of memory
> 2. the data are suitable for presentation through the web status pages.
> The data saved to DFS files should be in Hadoop record format.
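> As an illustration, the per-job summary might be described in Hadoop's record DDL (a .jr file
> compiled with rcc into serializable record classes); the module, class, and field names below
> are illustrative, covering only item 1 of the non time series list:
>
>     module org.apache.hadoop.mapred.jobstats {
>         class JobStatsRecord {
>             ustring jobId;
>             ustring jobName;
>             int numMappers;
>             int numReducers;
>             long startTime;
>             long endTime;
>             long mapPhaseEndTime;
>         };
>     }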

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
