From: "Runping Qi (JIRA)"
To: hadoop-dev@lucene.apache.org
Date: Wed, 26 Sep 2007 11:29:50 -0700 (PDT)
Subject: [jira] Created: (HADOOP-1950) Job Tracker needs to collect more job/task execution stats and save them to DFS file

Job Tracker needs to collect more job/task execution stats and save them to DFS file
-------------------------------------------------------------------------------------

                 Key: HADOOP-1950
                 URL: https://issues.apache.org/jira/browse/HADOOP-1950
             Project: Hadoop
          Issue Type: New Feature
            Reporter: Runping Qi


In order to facilitate offline analysis of the dynamic behavior and performance characteristics of map/reduce jobs, we need the job tracker to collect some data about jobs and save it to DFS files. Some of the data is in time series form, and some is not. Below is a preliminary list of the desired data. Some of it is already available in the current job tracker; some is new.

For each map/reduce job, we need the following non-time-series data (a sketch of how these per-task summaries could be computed follows the list):

1. Job id, job name, number of mappers, number of reducers, start time, end time, end of mapper phase
2. Average (median, min, max) of successful mapper execution time, input/output records/bytes
3. Average (median, min, max) of unsuccessful mapper execution time, input/output records/bytes
4. Total mapper retries; max and average number of retries per mapper
5. The reasons for mapper task failures
6. Average (median, min, max) of successful reducer execution time, input/output records/bytes. Execution time is the difference between the sort end time and the task end time.
7. Average (median, min, max) of successful copy time (from the mapper phase end time to the sort start time)
8. Average (median, min, max) of sorting time for successful reducers
9. Average (median, min, max) of unsuccessful reducer execution time (from the end of the mapper phase or the start of the task, whichever is later, to the end of the task)
10. Total reducer retries; max and average number of retries per reducer
11. The reasons for reducer task failures (user code error, lost tracker, failed to write to DFS, etc.)
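As an illustration of items 2 through 10, each of those summaries boils down to min/median/max/average over one observation per task attempt. Here is a minimal sketch in Java; all class and method names are hypothetical, invented for this note, and nothing below is existing JobTracker code:

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/**
 * Hypothetical aggregator for one per-job metric from the list above,
 * e.g. successful mapper execution time in milliseconds. Illustrative
 * only; none of these names exist in the JobTracker.
 */
public class TaskStatSummary {
  private final List<Long> samples = new ArrayList<Long>();

  /** Record one observation, e.g. one task attempt's execution time. */
  public void add(long value) {
    samples.add(value);
  }

  /** Callers should check this first; min/max/median need a sample. */
  public boolean isEmpty() {
    return samples.isEmpty();
  }

  public long min() {
    return Collections.min(samples);
  }

  public long max() {
    return Collections.max(samples);
  }

  public double average() {
    long sum = 0;
    for (long v : samples) {
      sum += v;
    }
    return (double) sum / samples.size();
  }

  /** Median; for an even count this returns the lower middle value. */
  public long median() {
    List<Long> sorted = new ArrayList<Long>(samples);
    Collections.sort(sorted);
    return sorted.get((sorted.size() - 1) / 2);
  }
}

One such aggregator per metric keeps the math trivial; whether the raw samples are worth holding in memory for the whole job is exactly the memory concern raised below.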
For each map/reduce job, we collect the following time series data (at one-minute intervals):

1. Numbers of pending mappers/reducers
2. Numbers of running mappers/reducers

For the job tracker, we need the following data:

1. Number of trackers
2. Start time
3. End time
4. The list of map/reduce jobs (their ids and start/end times)

and the following time series data (at one-minute intervals):

1. The number of running jobs
2. The numbers of running mappers/reducers
3. The numbers of pending mappers/reducers

The data collection should be optional. That is, a job tracker should be able to turn such data collection off, and in that case it should not pay the cost. The job tracker should organize the in-memory version of the collected data in such a way that:

1. it does not consume an excessive amount of memory, and
2. the data is suitable for presentation through the web status pages.

The data saved to DFS files should be in Hadoop record format. (A sketch of a bounded in-memory buffer for the time series follows.)
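To make point 1 concrete, a fixed-capacity ring buffer would keep the per-minute series in bounded memory no matter how long a job runs. A minimal sketch, again with purely hypothetical names rather than existing Hadoop code:

/**
 * Illustrative only: a bounded buffer for the one-minute counter
 * samples described above. When full, the oldest sample is dropped
 * (or could first be flushed to the DFS stats file).
 */
public class TimeSeriesBuffer {
  /** One per-minute sample of the job's counters. */
  public static class Sample {
    public final long timestamp;
    public final int pendingMaps, runningMaps;
    public final int pendingReduces, runningReduces;

    public Sample(long timestamp, int pendingMaps, int runningMaps,
                  int pendingReduces, int runningReduces) {
      this.timestamp = timestamp;
      this.pendingMaps = pendingMaps;
      this.runningMaps = runningMaps;
      this.pendingReduces = pendingReduces;
      this.runningReduces = runningReduces;
    }
  }

  private final Sample[] ring;
  private int next = 0;   // index of the next slot to overwrite
  private int count = 0;  // number of valid samples, up to ring.length

  public TimeSeriesBuffer(int capacity) {
    ring = new Sample[capacity];
  }

  /** Called once a minute by the collector thread. */
  public synchronized void add(Sample s) {
    ring[next] = s;
    next = (next + 1) % ring.length;
    if (count < ring.length) {
      count++;
    }
  }

  /** Snapshot in oldest-to-newest order, e.g. for the web status page. */
  public synchronized Sample[] snapshot() {
    Sample[] out = new Sample[count];
    int start = (count < ring.length) ? 0 : next;
    for (int i = 0; i < count; i++) {
      out[i] = ring[(start + i) % ring.length];
    }
    return out;
  }
}

With one Sample per minute, a capacity of 1440 would bound each job's in-memory series to one day of history regardless of job length, which also keeps the snapshot small enough to render on the status pages.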