From: patrick sang
Date: Tue, 4 Oct 2011 23:37:26 -0700
To: common-user@hadoop.apache.org
Subject: Re: Monitoring Slow job.

Hi all,

Thanks, Vitthal; your reply led me to the solution. Here is what I did.
Hopefully it counts as a contribution back to the community.

1. ./hadoop job -list all | awk '{ if ($2 == 1) print $1 }'
   --> gets the list of running job IDs
2. ./hadoop job -status JobID
   --> gives the job file: hdfs://xxx.xx.xxx/zz/aa/job.xml
3. ./hadoop fs -cat hdfs://xxx.xx.xxx/zz/aa/job.xml
   --> here we get mapred.output.dir
4. ./hadoop job -history on that output dir
   --> gives "Launched At:"
5. Get the time difference between "Launched At" and now.

It would be much easier to just scrape
http://jobtracker:50030/jobdetails.jsp?jobid=jobId, and there might be
some more elegant way of getting this; it's just how I did it this time.

Cheers,
P

On Mon, Oct 3, 2011 at 5:41 PM, Vitthal "Suhas" Gogate
<gogate@hortonworks.com> wrote:

> I am not sure there is an easy way to get what you want on the command
> line. One option is to use the following command, which gives you a
> verbose job history where you can find the Submit, Launch, and Finish
> times (including the duration on the FinishTime line). I am using the
> hadoop-0.20.205.0 branch, so check whether the version of Hadoop you
> are using has some such option.
>
> I am pasting sample output for my wordcount program:
>
> bin/hadoop job -history
>
> ======================================
>
> horton-mac:hadoop-0.20.205.0 vgogate$ bin/hadoop job -history output
> Warning: $HADOOP_HOME is deprecated.
>
> Hadoop job: 0001_1317688277686_vgogate
> =====================================
> Job tracker host name: job
> job tracker start time: Sun May 16 08:53:51 PDT 1976
> User: vgogate
> JobName: word count
> JobConf: hdfs://horton-mac.local:54310/tmp/mapred/staging/vgogate/.staging/job_201110031726_0001/job.xml
> Submitted At: 3-Oct-2011 17:31:17
> Launched At: 3-Oct-2011 17:31:17 (0sec)
> Finished At: 3-Oct-2011 17:31:50 (32sec)
> Status: SUCCESS
> Counters:
>
> |Group Name                  |Counter name                  |Map Value|Reduce Value|Total Value|
> ---------------------------------------------------------------------------------------
> |Job Counters                |Launched reduce tasks         |0        |0           |1          |
> |Job Counters                |SLOTS_MILLIS_MAPS             |0        |0           |12,257     |
> |Job Counters                |Total time spent by all reduces waiting after reserving slots (ms) |0 |0 |0 |
> |Job Counters                |Total time spent by all maps waiting after reserving slots (ms) |0 |0 |0 |
> |Job Counters                |Launched map tasks            |0        |0           |1          |
> |Job Counters                |Data-local map tasks          |0        |0           |1          |
> |Job Counters                |SLOTS_MILLIS_REDUCES          |0        |0           |10,082     |
> |File Output Format Counters |Bytes Written                 |0        |61,192      |61,192     |
> |FileSystemCounters          |FILE_BYTES_READ               |0        |70,766      |70,766     |
> |FileSystemCounters          |HDFS_BYTES_READ               |112,056  |0           |112,056    |
> |FileSystemCounters          |FILE_BYTES_WRITTEN            |92,325   |92,294      |184,619    |
> |FileSystemCounters          |HDFS_BYTES_WRITTEN            |0        |61,192      |61,192     |
> |File Input Format Counters  |Bytes Read                    |111,933  |0           |111,933    |
> |Map-Reduce Framework        |Reduce input groups           |0        |2,411       |2,411      |
> |Map-Reduce Framework        |Map output materialized bytes |70,766   |0           |70,766     |
> |Map-Reduce Framework        |Combine output records        |2,411    |0           |2,411      |
> |Map-Reduce Framework        |Map input records             |2,643    |0           |2,643      |
> |Map-Reduce Framework        |Reduce shuffle bytes          |0        |0           |0          |
> |Map-Reduce Framework        |Reduce output records         |0        |2,411       |2,411      |
> |Map-Reduce Framework        |Spilled Records               |2,411    |2,411       |4,822      |
> |Map-Reduce Framework        |Map output bytes              |120,995  |0           |120,995    |
> |Map-Reduce Framework        |Combine input records         |5,849    |0           |5,849      |
> |Map-Reduce Framework        |Map output records            |5,849    |0           |5,849      |
> |Map-Reduce Framework        |SPLIT_RAW_BYTES               |123      |0           |123        |
> |Map-Reduce Framework        |Reduce input records          |0        |2,411       |2,411      |
> =====================================
>
> Task Summary
> ============================
> Kind     Total  Successful  Failed  Killed  StartTime            FinishTime
>
> Setup    1      1           0       0       3-Oct-2011 17:31:20  3-Oct-2011 17:31:24 (4sec)
> Map      1      1           0       0       3-Oct-2011 17:31:26  3-Oct-2011 17:31:30 (4sec)
> Reduce   1      1           0       0       3-Oct-2011 17:31:32  3-Oct-2011 17:31:42 (10sec)
> Cleanup  1      1           0       0       3-Oct-2011 17:31:44  3-Oct-2011 17:31:48 (4sec)
> ============================
>
>
> Analysis
> =========
>
> Time taken by best performing map task task_201110031726_0001_m_000000: 4sec
> Average time taken by map tasks: 4sec
> Worse performing map tasks:
> TaskId                           Timetaken
> task_201110031726_0001_m_000000  4sec
> The last map task task_201110031726_0001_m_000000 finished at
> (relative to the Job launch time): 3-Oct-2011 17:31:30 (12sec)
>
> Time taken by best performing shuffle task task_201110031726_0001_r_000000: 7sec
> Average time taken by shuffle tasks: 7sec
> Worse performing shuffle tasks:
> TaskId                           Timetaken
> task_201110031726_0001_r_000000  7sec
> The last shuffle task task_201110031726_0001_r_000000 finished at
> (relative to the Job launch time): 3-Oct-2011 17:31:39 (21sec)
>
> Time taken by best performing reduce task task_201110031726_0001_r_000000: 2sec
> Average time taken by reduce tasks: 2sec
> Worse performing reduce tasks:
> TaskId                           Timetaken
> task_201110031726_0001_r_000000  2sec
> The last reduce task task_201110031726_0001_r_000000 finished at
> (relative to the Job launch time): 3-Oct-2011 17:31:42 (24sec)
> =========
>
> FAILED SETUP task list for 0001_1317688277686_vgogate
> TaskId  StartTime  FinishTime  Error
> ====================================================
>
> KILLED SETUP task list for 0001_1317688277686_vgogate
> TaskId  StartTime  FinishTime  Error
> ====================================================
>
> FAILED MAP task list for 0001_1317688277686_vgogate
> TaskId  StartTime  FinishTime  Error  InputSplits
> ====================================================
>
> KILLED MAP task list for 0001_1317688277686_vgogate
> TaskId  StartTime  FinishTime  Error  InputSplits
> ====================================================
>
> FAILED REDUCE task list for 0001_1317688277686_vgogate
> TaskId  StartTime  FinishTime  Error
> ====================================================
>
> KILLED REDUCE task list for 0001_1317688277686_vgogate
> TaskId  StartTime  FinishTime  Error
> ====================================================
>
> FAILED CLEANUP task list for 0001_1317688277686_vgogate
> TaskId  StartTime  FinishTime  Error
> ====================================================
>
> KILLED CLEANUP task list for 0001_1317688277686_vgogate
> TaskId  StartTime  FinishTime  Error
> ====================================================
>
> ======================================
>
> On Mon, Oct 3, 2011 at 1:53 PM, patrick sang wrote:
>
> > Hi Hadoopers,
> >
> > I am writing a script to detect whether any running job has been
> > running longer than X hours.
> >
> > So far, I use
> > ./hadoop job -jt jobtracker:port -list all | awk '{ if ($2 == 1) print $1 }'
> > --> to get the list of running job IDs
> >
> > I am looking for a way to get how long a job has been running from
> > its job ID.
> >
> > In the web admin page (http://job:50030), we can easily see the
> > duration of each job from "Started at:" and "Running for:" for each
> > running job.
> >
> > How do we get such information from the command line?
> >
> > Hope this makes sense.
> > Thank you in advance,
> >
> > -P
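The five CLI steps above, plus the "longer than X hours" check from the original question, can be stitched into one watchdog script. This is a rough sketch under stated assumptions, not a tested tool: it assumes GNU date (for `date -d`), a `hadoop` binary on PATH, a hypothetical MAX_HOURS threshold, and 0.20-era `hadoop job` output columns as shown in this thread; verify the field positions and the job.xml layout against your release.

```shell
#!/bin/sh
# Watchdog sketch for the five steps above (assumptions: GNU date,
# `hadoop` on PATH; column positions may differ between versions).

MAX_HOURS=${MAX_HOURS:-6}   # hypothetical alert threshold, adjust to taste

# Convert a "Launched At: 3-Oct-2011 17:31:17 (0sec)" history line
# into seconds since the epoch. Hyphens become spaces so GNU date
# reliably parses the "3 Oct 2011 17:31:17" form.
launch_epoch() {
    stamp=$(printf '%s\n' "$1" | sed 's/^Launched At: //; s/ ([^)]*)$//' | tr '-' ' ')
    date -d "$stamp" +%s
}

now=$(date +%s)

# Step 1: running jobs carry state "1" in the second column of -list all.
for jobid in $(hadoop job -list all 2>/dev/null | awk '$2 == 1 { print $1 }'); do
    # Step 2: the job file (job.xml on HDFS) from -status.
    jobfile=$(hadoop job -status "$jobid" | awk '/^file:/ { print $2 }')
    # Step 3: pull mapred.output.dir out of job.xml (assumes the
    # <name>/<value> pair sits on one or two adjacent lines).
    outdir=$(hadoop fs -cat "$jobfile" \
        | grep -A1 '<name>mapred.output.dir</name>' \
        | sed -n 's:.*<value>\(.*\)</value>.*:\1:p' | head -1)
    # Step 4: the "Launched At:" line from the job history.
    launched=$(hadoop job -history "$outdir" | grep '^Launched At:')
    # Step 5: compare the launch time against now.
    hours=$(( (now - $(launch_epoch "$launched")) / 3600 ))
    [ "$hours" -ge "$MAX_HOURS" ] && echo "$jobid has been running for ${hours}h"
done
```

Run it from cron every few minutes; anything it prints is a job over the threshold. Scraping jobdetails.jsp, as noted above, would avoid the three extra `hadoop` invocations per job.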