hadoop-common-user mailing list archives

From patrick sang <silvianhad...@gmail.com>
Subject Re: Monitoring Slow job.
Date Wed, 05 Oct 2011 06:37:26 GMT
Hi all,
Thanks, Vitthal; your reply led me to the solution.
Here is what I did; hopefully it counts as a contribution back to the
community.

1. ./hadoop job -list all |awk '{ if($2==1) print $1 }'
--> to get the list of running JobIDs

2. ./hadoop job -status JobID
--> file: hdfs://xxx.xx.xxx/zz/aa/job.xml

3. ./hadoop fs -cat hdfs://xxx.xx.xxx/zz/aa/job.xml
--> here we get mapred.output.dir

4. ./hadoop job -history <mapred.output.dir>
--> Launched At:

5. Get the time difference between "Launched At" and now.
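
Strung together, the five steps above can be sketched as a small shell
script. This is only a sketch: it assumes the 0.20-era "hadoop job" CLI and
GNU date, and the threshold value and job.xml parsing details are
illustrative, not verified.

```shell
#!/bin/sh
# Sketch of steps 1-5: flag jobs running longer than a threshold.
# Assumes the 0.20-era "hadoop job" CLI and GNU date; the threshold
# and the job.xml parsing are illustrative assumptions.
THRESHOLD_HOURS=4

# Step 1: list running JobIDs (state 1 = RUNNING)
for jobid in $(hadoop job -list all 2>/dev/null | awk '{ if ($2 == 1) print $1 }'); do
    # Step 2: -status prints the job.xml location on its "file:" line
    jobxml=$(hadoop job -status "$jobid" | awk '/file:/ { print $NF }')
    # Step 3: extract mapred.output.dir from job.xml
    outdir=$(hadoop fs -cat "$jobxml" \
        | grep -A1 '<name>mapred.output.dir</name>' \
        | sed -n 's:.*<value>\(.*\)</value>.*:\1:p')
    # Step 4: job history reports "Launched At: 3-Oct-2011 17:31:17 (0sec)"
    launched=$(hadoop job -history "$outdir" \
        | sed -n 's/^Launched At: \(.*\) (.*$/\1/p')
    # Step 5: difference between launch time and now, in hours
    start=$(date -d "$(echo "$launched" | tr '-' ' ')" +%s)
    now=$(date +%s)
    hours=$(( (now - start) / 3600 ))
    [ "$hours" -ge "$THRESHOLD_HOURS" ] && echo "$jobid has been running ${hours}h"
done
```

The `tr '-' ' '` turns the dd-Mon-yyyy timestamp into "3 Oct 2011 17:31:17",
a form GNU date parses reliably into epoch seconds with `-d ... +%s`.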

It would be much easier to just scrape
http://jobtracker:50030/jobdetails.jsp?jobid=jobId
and there may well be a more elegant way of getting this; it's just how I did
it this time.
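
For the scraping route, a rough sketch might look like the following. The
URL is the one above, and the "Running for:" label comes from the web UI
fields mentioned in the original question; the surrounding HTML structure is
an assumption that would need checking against the real page source.

```shell
#!/bin/sh
# Hypothetical sketch of the scraping approach: read "Running for:"
# straight off the JobTracker job details page. The host:port and the
# HTML structure are assumptions; adjust to the real page.
JOBTRACKER=jobtracker:50030

for jobid in $(hadoop job -list all 2>/dev/null | awk '{ if ($2 == 1) print $1 }'); do
    # Pull the duration text out of the page; [^<]* stops at the next tag
    curl -s "http://$JOBTRACKER/jobdetails.jsp?jobid=$jobid" \
        | grep -o 'Running for:[^<]*'
done
```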

Cheers,
P

On Mon, Oct 3, 2011 at 5:41 PM, Vitthal "Suhas" Gogate <gogate@hortonworks.com> wrote:

> I am not sure there is an easy way to get what you want on the command
> line. One option is to use the following command, which gives you a
> verbose job history where you can find the Submit, Launch, and Finish
> times (including the duration on the FinishTime line). I am using the
> hadoop-0.20.205.0 branch, so check whether such an option exists for the
> version of Hadoop you are using.
>
> I am pasting sample output from my wordcount program:
>
> bin/hadoop job -history <job_output_directory_on_hdfs>
>
> ======================================
>
> horton-mac:hadoop-0.20.205.0 vgogate$ bin/hadoop job -history output
> Warning: $HADOOP_HOME is deprecated.
>
>
> Hadoop job: 0001_1317688277686_vgogate
> =====================================
> Job tracker host name: job
> job tracker start time: Sun May 16 08:53:51 PDT 1976
> User: vgogate
> JobName: word count
> JobConf: hdfs://horton-mac.local:54310/tmp/mapred/staging/vgogate/.staging/job_201110031726_0001/job.xml
> Submitted At: 3-Oct-2011 17:31:17
> Launched At: 3-Oct-2011 17:31:17 (0sec)
> Finished At: 3-Oct-2011 17:31:50 (32sec)
> Status: SUCCESS
> Counters:
>
> |Group Name                   |Counter name                                                      |Map Value |Reduce Value |Total Value |
> ----------------------------------------------------------------------------------------------------------------------------------------
> |Job Counters                 |Launched reduce tasks                                             |0         |0            |1           |
> |Job Counters                 |SLOTS_MILLIS_MAPS                                                 |0         |0            |12,257      |
> |Job Counters                 |Total time spent by all reduces waiting after reserving slots (ms)|0         |0            |0           |
> |Job Counters                 |Total time spent by all maps waiting after reserving slots (ms)   |0         |0            |0           |
> |Job Counters                 |Launched map tasks                                                |0         |0            |1           |
> |Job Counters                 |Data-local map tasks                                              |0         |0            |1           |
> |Job Counters                 |SLOTS_MILLIS_REDUCES                                              |0         |0            |10,082      |
> |File Output Format Counters  |Bytes Written                                                     |0         |61,192       |61,192      |
> |FileSystemCounters           |FILE_BYTES_READ                                                   |0         |70,766       |70,766      |
> |FileSystemCounters           |HDFS_BYTES_READ                                                   |112,056   |0            |112,056     |
> |FileSystemCounters           |FILE_BYTES_WRITTEN                                                |92,325    |92,294       |184,619     |
> |FileSystemCounters           |HDFS_BYTES_WRITTEN                                                |0         |61,192       |61,192      |
> |File Input Format Counters   |Bytes Read                                                        |111,933   |0            |111,933     |
> |Map-Reduce Framework         |Reduce input groups                                               |0         |2,411        |2,411       |
> |Map-Reduce Framework         |Map output materialized bytes                                     |70,766    |0            |70,766      |
> |Map-Reduce Framework         |Combine output records                                            |2,411     |0            |2,411       |
> |Map-Reduce Framework         |Map input records                                                 |2,643     |0            |2,643       |
> |Map-Reduce Framework         |Reduce shuffle bytes                                              |0         |0            |0           |
> |Map-Reduce Framework         |Reduce output records                                             |0         |2,411        |2,411       |
> |Map-Reduce Framework         |Spilled Records                                                   |2,411     |2,411        |4,822       |
> |Map-Reduce Framework         |Map output bytes                                                  |120,995   |0            |120,995     |
> |Map-Reduce Framework         |Combine input records                                             |5,849     |0            |5,849       |
> |Map-Reduce Framework         |Map output records                                                |5,849     |0            |5,849       |
> |Map-Reduce Framework         |SPLIT_RAW_BYTES                                                   |123       |0            |123         |
> |Map-Reduce Framework         |Reduce input records                                              |0         |2,411        |2,411       |
> =====================================
>
> Task Summary
> ============================
> Kind     Total  Successful  Failed  Killed  StartTime            FinishTime
>
> Setup    1      1           0       0       3-Oct-2011 17:31:20  3-Oct-2011 17:31:24 (4sec)
> Map      1      1           0       0       3-Oct-2011 17:31:26  3-Oct-2011 17:31:30 (4sec)
> Reduce   1      1           0       0       3-Oct-2011 17:31:32  3-Oct-2011 17:31:42 (10sec)
> Cleanup  1      1           0       0       3-Oct-2011 17:31:44  3-Oct-2011 17:31:48 (4sec)
> ============================
>
>
> Analysis
> =========
>
> Time taken by best performing map task task_201110031726_0001_m_000000:
> 4sec
> Average time taken by map tasks: 4sec
> Worse performing map tasks:
> TaskId        Timetaken
> task_201110031726_0001_m_000000 4sec
> The last map task task_201110031726_0001_m_000000 finished at (relative to
> the Job launch time): 3-Oct-2011 17:31:30 (12sec)
>
> Time taken by best performing shuffle task task_201110031726_0001_r_000000:
> 7sec
> Average time taken by shuffle tasks: 7sec
> Worse performing shuffle tasks:
> TaskId        Timetaken
> task_201110031726_0001_r_000000 7sec
> The last shuffle task task_201110031726_0001_r_000000 finished at (relative
> to the Job launch time): 3-Oct-2011 17:31:39 (21sec)
>
> Time taken by best performing reduce task task_201110031726_0001_r_000000:
> 2sec
> Average time taken by reduce tasks: 2sec
> Worse performing reduce tasks:
> TaskId        Timetaken
> task_201110031726_0001_r_000000 2sec
> The last reduce task task_201110031726_0001_r_000000 finished at (relative
> to the Job launch time): 3-Oct-2011 17:31:42 (24sec)
> =========
>
> FAILED SETUP task list for 0001_1317688277686_vgogate
> TaskId        StartTime    FinishTime    Error
> ====================================================
>
> KILLED SETUP task list for 0001_1317688277686_vgogate
> TaskId        StartTime    FinishTime    Error
> ====================================================
>
> FAILED MAP task list for 0001_1317688277686_vgogate
> TaskId        StartTime    FinishTime    Error    InputSplits
> ====================================================
>
> KILLED MAP task list for 0001_1317688277686_vgogate
> TaskId        StartTime    FinishTime    Error    InputSplits
> ====================================================
>
> FAILED REDUCE task list for 0001_1317688277686_vgogate
> TaskId        StartTime    FinishTime    Error
> ====================================================
>
> KILLED REDUCE task list for 0001_1317688277686_vgogate
> TaskId        StartTime    FinishTime    Error
> ====================================================
>
> FAILED CLEANUP task list for 0001_1317688277686_vgogate
> TaskId        StartTime    FinishTime    Error
> ====================================================
>
> KILLED CLEANUP task list for 0001_1317688277686_vgogate
> TaskId        StartTime    FinishTime    Error
> ====================================================
>
> ======================================
>
> On Mon, Oct 3, 2011 at 1:53 PM, patrick sang <silvianhadoop@gmail.com> wrote:
>
> > Hi Hadoopers,
> >
> > I am writing a script to detect whether any running job has been
> > running longer than X hours.
> >
> > So far,  I use
> > ./hadoop job -jt jobtracker:port -list all |awk '{ if($2==1) print $1 }'
> > ---> to get the list of running JobID
> >
> > I am looking for a way to tell how long a job has been running from its JobID.
> >
> > In the web admin page (http://job:50030), we can easily see the duration
> > of each running job from its *Started at:* and *Running for:* fields.
> >
> > How do we get such information from the command line?
> >
> > Hope this makes sense.
> > Thank you in advance,
> >
> > -P
> >
>
