hadoop-user mailing list archives

From Sonal Goyal <sonalgoy...@gmail.com>
Subject Re: Info required regarding JobTracker Job Details/Metrics
Date Thu, 23 Aug 2012 13:20:56 GMT
Don't the completed job metrics in the JobTracker UI, or bin/hadoop job
-history, provide the information you seek?
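For the shuffle timing specifically, the per-task records in the job history are the place to look. A rough sketch, assuming the Hadoop 0.20 history-file format, where ReduceAttempt records carry START_TIME and SHUFFLE_FINISHED millisecond timestamps (verify the field names against your own history files; the sample records below are made up):

```python
import re

# Key="value" pairs as they appear in 0.20 job history lines.
FIELD_RE = re.compile(r'(\w+)="([^"]*)"')

def total_shuffle_ms(history_lines):
    """Sum shuffle-phase time (ms) over all reduce attempts.

    Start and finish fields arrive on separate ReduceAttempt records,
    so join them by TASK_ATTEMPT_ID.
    """
    starts, finished = {}, {}
    for line in history_lines:
        if not line.startswith("ReduceAttempt"):
            continue
        fields = dict(FIELD_RE.findall(line))
        attempt = fields.get("TASK_ATTEMPT_ID")
        if "START_TIME" in fields:
            starts[attempt] = int(fields["START_TIME"])
        if "SHUFFLE_FINISHED" in fields:
            finished[attempt] = int(fields["SHUFFLE_FINISHED"])
    return sum(finished[a] - starts[a] for a in finished if a in starts)

# Illustrative records (attempt IDs and timestamps invented):
lines = [
    'ReduceAttempt TASK_ATTEMPT_ID="attempt_0001_r_000000_0" START_TIME="1000" .',
    'ReduceAttempt TASK_ATTEMPT_ID="attempt_0001_r_000000_0" SHUFFLE_FINISHED="4000" .',
    'ReduceAttempt TASK_ATTEMPT_ID="attempt_0001_r_000001_0" START_TIME="1200" .',
    'ReduceAttempt TASK_ATTEMPT_ID="attempt_0001_r_000001_0" SHUFFLE_FINISHED="3200" .',
]
print(total_shuffle_ms(lines))  # 3000 + 2000 = 5000
```

Summing per-attempt durations gives total shuffle work across the cluster; for wall-clock shuffle time you would instead take the max SHUFFLE_FINISHED minus the min START_TIME.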

Best Regards,
Sonal
Crux: Reporting for HBase <https://github.com/sonalgoyal/crux>
Nube Technologies <http://www.nubetech.co>

<http://in.linkedin.com/in/sonalgoyal>





On Thu, Aug 23, 2012 at 5:36 PM, Gaurav Dasgupta <gdsayshi@gmail.com> wrote:

> Hi,
>
> Thanks for your replies.
> Any idea how I can calculate the "total shuffle time"?
> I can get and calculate the total time taken by all the Mappers and all
> the Reducers separately, but the intermediate shuffle/sort time is absent.
> Any clue?
>
> Thanks,
> Gaurav Dasgupta
>
>
> On Thu, Aug 23, 2012 at 5:26 PM, Sonal Goyal <sonalgoyal4@gmail.com> wrote:
>
>> Gaurav,
>>
>> You can also refer to Tom White's Hadoop: The Definitive Guide, Chapter 8,
>> which has a reference to each of the job counters. I believe the Apache
>> site also had a page detailing the counters, but I can't seem to locate it.
>>
>> Best Regards,
>> Sonal
>> Crux: Reporting for HBase <https://github.com/sonalgoyal/crux>
>> Nube Technologies <http://www.nubetech.co/>
>>
>> <http://in.linkedin.com/in/sonalgoyal>
>>
>>
>>
>>
>>
>>
>> On Thu, Aug 23, 2012 at 5:20 PM, Bejoy Ks <bejoy.hadoop@gmail.com> wrote:
>>
>>> Hi Gaurav
>>>
>>> If it is just a simple word count example:
>>> Map input size = HDFS_BYTES_READ
>>> Reduce Output Size = HDFS_BYTES_WRITTEN
>>> Reduce Input Size should be Map output bytes
>>>
>>> File Bytes Written is what the job writes to the local file system.
>>> AFAIK it is the map tasks' intermediate output written to the LFS.
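In code form, that mapping is just a lookup over the counter names as the 0.20 JobClient prints them (a trivial sketch; note that Map output bytes is the pre-combiner figure, so with a combiner the bytes actually reaching the reducers are better read from Reduce shuffle bytes, which is post-combiner and possibly compressed):

```python
def job_sizes(counters):
    """Map the 0.20 counter names onto the sizes asked about in this thread."""
    return {
        "map_input_size": counters["HDFS_BYTES_READ"],
        # Raw map output, before the combiner runs:
        "reduce_input_size": counters["Map output bytes"],
        # Bytes actually copied to reducers during the shuffle:
        "shuffled_bytes": counters["Reduce shuffle bytes"],
        "reduce_output_size": counters["HDFS_BYTES_WRITTEN"],
    }

# Counter values taken from the wordcount log quoted in this thread:
print(job_sizes({
    "HDFS_BYTES_READ": 394195953674,
    "Map output bytes": 468042247685,
    "Reduce shuffle bytes": 53567298,
    "HDFS_BYTES_WRITTEN": 28095,
}))
```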
>>>
>>>
>>> Regards
>>> Bejoy KS
>>>
>>>
>>> On Thu, Aug 23, 2012 at 4:54 PM, Gaurav Dasgupta <gdsayshi@gmail.com> wrote:
>>>
>>>> Sorry, the correct outcomes for the single wordcount job are:
>>>>
>>>> 12/08/23 04:31:22 INFO mapred.JobClient: Job complete: job_201208230144_0002
>>>> 12/08/23 04:31:22 INFO mapred.JobClient: Counters: 26
>>>> 12/08/23 04:31:22 INFO mapred.JobClient:   Job Counters
>>>> 12/08/23 04:31:22 INFO mapred.JobClient:     Launched reduce tasks=64
>>>> 12/08/23 04:31:22 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=103718235
>>>> 12/08/23 04:31:22 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
>>>> 12/08/23 04:31:22 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
>>>> 12/08/23 04:31:22 INFO mapred.JobClient:     Launched map tasks=3060
>>>> 12/08/23 04:31:22 INFO mapred.JobClient:     Data-local map tasks=3060
>>>> 12/08/23 04:31:22 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=9208855
>>>> 12/08/23 04:31:22 INFO mapred.JobClient:   FileSystemCounters
>>>> 12/08/23 04:31:22 INFO mapred.JobClient:     FILE_BYTES_READ=58263069209
>>>> 12/08/23 04:31:22 INFO mapred.JobClient:     HDFS_BYTES_READ=394195953674
>>>> 12/08/23 04:31:22 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=2046757548
>>>> 12/08/23 04:31:22 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=28095
>>>> 12/08/23 04:31:22 INFO mapred.JobClient:   Map-Reduce Framework
>>>> 12/08/23 04:31:22 INFO mapred.JobClient:     Map input records=586006142
>>>> 12/08/23 04:31:22 INFO mapred.JobClient:     Reduce shuffle bytes=53567298
>>>> 12/08/23 04:31:22 INFO mapred.JobClient:     Spilled Records=108996063
>>>> 12/08/23 04:31:22 INFO mapred.JobClient:     Map output bytes=468042247685
>>>> 12/08/23 04:31:22 INFO mapred.JobClient:     CPU time spent (ms)=91162220
>>>> 12/08/23 04:31:22 INFO mapred.JobClient:     Total committed heap usage (bytes)=981605744640
>>>> 12/08/23 04:31:22 INFO mapred.JobClient:     Combine input records=32046224559
>>>> 12/08/23 04:31:22 INFO mapred.JobClient:     SPLIT_RAW_BYTES=382500
>>>> 12/08/23 04:31:22 INFO mapred.JobClient:     Reduce input records=96063
>>>> 12/08/23 04:31:22 INFO mapred.JobClient:     Reduce input groups=1000
>>>> 12/08/23 04:31:22 INFO mapred.JobClient:     Combine output records=108902950
>>>> 12/08/23 04:31:22 INFO mapred.JobClient:     Physical memory (bytes) snapshot=1147705057280
>>>> 12/08/23 04:31:22 INFO mapred.JobClient:     Reduce output records=1000
>>>> 12/08/23 04:31:22 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=3221902118912
>>>> 12/08/23 04:31:22 INFO mapred.JobClient:     Map output records=31937417672
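As an aside, counter dumps like the one above can be scraped back into a dictionary with a single regex. A small sketch against the mapred.JobClient log format shown (numeric counters only; any wrapped lines in a pasted log would need re-joining first):

```python
import re

# Matches trailing "name=value" on a JobClient counter line, e.g.
#   12/08/23 04:31:22 INFO mapred.JobClient:     FILE_BYTES_READ=58263069209
COUNTER_RE = re.compile(r"INFO mapred\.JobClient:\s+(.+?)=(\d+)\s*$")

def parse_counters(log_text):
    """Return {counter name: int value} from JobClient log output."""
    counters = {}
    for line in log_text.splitlines():
        m = COUNTER_RE.search(line)
        if m:
            counters[m.group(1).strip()] = int(m.group(2))
    return counters

sample = """\
12/08/23 04:31:22 INFO mapred.JobClient:     Launched reduce tasks=64
12/08/23 04:31:22 INFO mapred.JobClient:     FILE_BYTES_READ=58263069209
12/08/23 04:31:22 INFO mapred.JobClient:     Reduce input records=96063
"""
print(parse_counters(sample)["FILE_BYTES_READ"])  # 58263069209
```

Section headers such as "Job Counters" carry no "=" and are skipped automatically.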
>>>>
>>>>
>>>> Thanks,
>>>> Gaurav Dasgupta
>>>> On Thu, Aug 23, 2012 at 4:28 PM, Gaurav Dasgupta <gdsayshi@gmail.com> wrote:
>>>>
>>>>> Hi Users,
>>>>>
>>>>> I have run a wordcount job on a Hadoop 0.20 cluster, and the JobTracker
>>>>> Web UI gave me the following information after the successful completion
>>>>> of the job:
>>>>>
>>>>> *Job Counters*
>>>>> SLOTS_MILLIS_MAPS=5739
>>>>> Total time spent by all reduces waiting after reserving slots (ms)=0
>>>>> Total time spent by all maps waiting after reserving slots (ms)=0
>>>>> Launched map tasks=2
>>>>> SLOTS_MILLIS_REDUCES=0
>>>>> *FileSystemCounters*
>>>>> HDFS_BYTES_READ=158
>>>>> FILE_BYTES_WRITTEN=97422
>>>>> HDFS_BYTES_WRITTEN=10000
>>>>> *Map-Reduce Framework*
>>>>> Map input records=586006142
>>>>> Reduce shuffle bytes=53567298
>>>>> Spilled Records=108996063
>>>>> Map output bytes=468042247685
>>>>> CPU time spent (ms)=91162220
>>>>> Total committed heap usage (bytes)=981605744640
>>>>> Combine input records=32046224559
>>>>> SPLIT_RAW_BYTES=382500
>>>>> Reduce input records=96063
>>>>> Reduce input groups=1000
>>>>> Combine output records=108902950
>>>>> Physical memory (bytes) snapshot=1147705057280
>>>>> Reduce output records=1000
>>>>> Virtual memory (bytes) snapshot=3221902118912
>>>>> Map output records=31937417672
>>>>>
>>>>> Can someone explain all these metrics to me? I mainly want to
>>>>> know the "total shuffled bytes" of the job. Is it "Reduce shuffle bytes"?
>>>>> Also, how can I calculate the "total shuffle time taken"?
>>>>> Also, which of the above are the "Map Input Size", "Reduce Input
>>>>> Size" and "Reduce Output Size"?
>>>>> I also want to know the difference between FILE_BYTES_WRITTEN
>>>>> and HDFS_BYTES_WRITTEN. What is the job writing outside HDFS that is
>>>>> bigger than what it writes to HDFS?
>>>>>
>>>>> Regards,
>>>>> Gaurav Dasgupta
>>>>>
>>>>
>>>>
>>>
>>
>
