From: "Runping Qi (JIRA)"
To: hadoop-dev@lucene.apache.org
Date: Wed, 26 Sep 2007 11:29:50 -0700 (PDT)
Subject: [jira] Created: (HADOOP-1950) Job Tracker needs to collect more job/task execution stats and save them to DFS file

Job Tracker needs to collect more job/task execution stats and save them to DFS file
-------------------------------------------------------------------------------------

                 Key: HADOOP-1950
                 URL: https://issues.apache.org/jira/browse/HADOOP-1950
             Project: Hadoop
          Issue Type: New Feature
            Reporter: Runping Qi


In order to facilitate offline analysis of the dynamic behavior and performance characteristics of map/reduce jobs, we need the job tracker to collect some data about jobs and save it to DFS files. Some of the data is in time series form, and some is not. Below is a preliminary list of the desired data. Some of it is already available in the current job tracker; some is new.

For each map/reduce job, we need the following non-time-series data (a sketch of how these per-task summaries could be computed follows the list):

1. Job id, job name, number of mappers, number of reducers, start time, end time, end of mapper phase
2. Average (median, min, max) of successful mapper execution time, input/output records/bytes
3. Average (median, min, max) of unsuccessful mapper execution time, input/output records/bytes
4. Total mapper retries; max and average number of retries per mapper
5. The reasons for mapper task failures
6. Average (median, min, max) of successful reducer execution time, input/output records/bytes. Execution time is the difference between the sort end time and the task end time.
7. Average (median, min, max) of successful copy time (from the mapper phase end time to the sort start time)
8. Average (median, min, max) of sorting time for successful reducers
9. Average (median, min, max) of unsuccessful reducer execution time (from the end of the mapper phase or the start of the task, whichever is later, to the end of the task)
10. Total reducer retries; max and average number of retries per reducer
11. The reasons for reducer task failures (user code error, lost tracker, failed to write to DFS, etc.)
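As an illustration of items 2 through 10, each of those summaries boils down to min/median/max/average over one observation per task attempt. Here is a minimal sketch in Java; all class and method names are hypothetical, invented for this note, and nothing below is existing JobTracker code:

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/**
 * Hypothetical aggregator for one per-job metric from the list above,
 * e.g. successful mapper execution time in milliseconds. Illustrative
 * only; none of these names exist in the JobTracker.
 */
public class TaskStatSummary {
  private final List<Long> samples = new ArrayList<Long>();

  /** Record one observation, e.g. one task attempt's execution time. */
  public void add(long value) {
    samples.add(value);
  }

  /** Callers should check this first; min/max/median need a sample. */
  public boolean isEmpty() {
    return samples.isEmpty();
  }

  public long min() {
    return Collections.min(samples);
  }

  public long max() {
    return Collections.max(samples);
  }

  public double average() {
    long sum = 0;
    for (long v : samples) {
      sum += v;
    }
    return (double) sum / samples.size();
  }

  /** Median; for an even count this returns the lower middle value. */
  public long median() {
    List<Long> sorted = new ArrayList<Long>(samples);
    Collections.sort(sorted);
    return sorted.get((sorted.size() - 1) / 2);
  }
}

One such aggregator per metric keeps the math trivial; whether the raw samples are worth holding in memory for the whole job is exactly the memory concern raised below.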
For each map/reduce job, we collect the following time series data (at one-minute intervals):

1. Numbers of pending mappers/reducers
2. Numbers of running mappers/reducers

For the job tracker, we need the following data:

1. Number of trackers
2. Start time
3. End time
4. The list of map/reduce jobs (their ids and start/end times)

and the following time series data (at one-minute intervals):

1. The number of running jobs
2. The numbers of running mappers/reducers
3. The numbers of pending mappers/reducers

The data collection should be optional. That is, a job tracker should be able to turn such data collection off, and in that case it should not pay the cost. The job tracker should organize the in-memory version of the collected data in such a way that:

1. it does not consume an excessive amount of memory, and
2. the data is suitable for presentation through the web status pages.

The data saved to DFS files should be in Hadoop record format. (A sketch of a bounded in-memory buffer for the time series follows.)
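To make point 1 concrete, a fixed-capacity ring buffer would keep the per-minute series in bounded memory no matter how long a job runs. A minimal sketch, again with purely hypothetical names rather than existing Hadoop code:

/**
 * Illustrative only: a bounded buffer for the one-minute counter
 * samples described above. When full, the oldest sample is dropped
 * (or could first be flushed to the DFS stats file).
 */
public class TimeSeriesBuffer {
  /** One per-minute sample of the job's counters. */
  public static class Sample {
    public final long timestamp;
    public final int pendingMaps, runningMaps;
    public final int pendingReduces, runningReduces;

    public Sample(long timestamp, int pendingMaps, int runningMaps,
                  int pendingReduces, int runningReduces) {
      this.timestamp = timestamp;
      this.pendingMaps = pendingMaps;
      this.runningMaps = runningMaps;
      this.pendingReduces = pendingReduces;
      this.runningReduces = runningReduces;
    }
  }

  private final Sample[] ring;
  private int next = 0;   // index of the next slot to overwrite
  private int count = 0;  // number of valid samples, up to ring.length

  public TimeSeriesBuffer(int capacity) {
    ring = new Sample[capacity];
  }

  /** Called once a minute by the collector thread. */
  public synchronized void add(Sample s) {
    ring[next] = s;
    next = (next + 1) % ring.length;
    if (count < ring.length) {
      count++;
    }
  }

  /** Snapshot in oldest-to-newest order, e.g. for the web status page. */
  public synchronized Sample[] snapshot() {
    Sample[] out = new Sample[count];
    int start = (count < ring.length) ? 0 : next;
    for (int i = 0; i < count; i++) {
      out[i] = ring[(start + i) % ring.length];
    }
    return out;
  }
}

With one Sample per minute, a capacity of 1440 would bound each job's in-memory series to one day of history regardless of job length, which also keeps the snapshot small enough to render on the status pages.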