hadoop-user mailing list archives

From: Gaurav Dasgupta <gdsay...@gmail.com>
Subject: Re: Info required regarding JobTracker Job Details/Metrics
Date: Thu, 23 Aug 2012 12:06:10 GMT
Hi,

Thanks for your replies.
Any idea how I can calculate the "total shuffle time"?
I can get and calculate the total time taken by all the Mappers and all the
Reducers separately, but the intermediate shuffle/sort time is absent. Any
clue?
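
The only approach I can think of is to mine the job history file, which (in
0.20, if I remember the format right) logs every ReduceAttempt once at launch
with a START_TIME and again at completion with SHUFFLE_FINISHED and
SORT_FINISHED timestamps. Summing SHUFFLE_FINISHED minus START_TIME over all
reduce attempts would then give a rough total shuffle time. A minimal sketch,
with the record and field names being my assumption about that format:

import java.io.BufferedReader;
import java.io.FileReader;
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sums (SHUFFLE_FINISHED - START_TIME) over all ReduceAttempt records of a
// 0.20-style job history file. The record layout is an assumption; adjust
// the patterns to whatever your history file actually contains.
public class TotalShuffleTime {
  private static final Pattern ATTEMPT = Pattern.compile("TASK_ATTEMPT_ID=\"([^\"]+)\"");
  private static final Pattern START = Pattern.compile("START_TIME=\"(\\d+)\"");
  private static final Pattern SHUFFLE = Pattern.compile("SHUFFLE_FINISHED=\"(\\d+)\"");

  public static void main(String[] args) throws Exception {
    Map<String, Long> starts = new HashMap<String, Long>(); // attempt id -> start time
    long totalShuffleMs = 0;
    BufferedReader in = new BufferedReader(new FileReader(args[0]));
    try {
      String line;
      while ((line = in.readLine()) != null) {
        if (!line.startsWith("ReduceAttempt")) continue;    // only reduce attempts matter
        Matcher id = ATTEMPT.matcher(line);
        if (!id.find()) continue;
        Matcher s = START.matcher(line);
        if (s.find()) starts.put(id.group(1), Long.parseLong(s.group(1)));
        Matcher f = SHUFFLE.matcher(line);
        Long startedAt = starts.get(id.group(1));
        if (f.find() && startedAt != null) {
          totalShuffleMs += Long.parseLong(f.group(1)) - startedAt; // per-attempt shuffle time
        }
      }
    } finally {
      in.close();
    }
    System.out.println("Total shuffle time across reduce attempts (ms) = " + totalShuffleMs);
  }
}

The same loop, summing SORT_FINISHED minus SHUFFLE_FINISHED instead, should
give the total sort time. Please correct me if the history format is different.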

Thanks,
Gaurav Dasgupta


On Thu, Aug 23, 2012 at 5:26 PM, Sonal Goyal <sonalgoyal4@gmail.com> wrote:

> Gaurav,
>
> You can also refer to Chapter 8 of Tom White's Hadoop: The Definitive Guide,
> which has a reference to each of the job counters. I believe the Apache
> site also had a page detailing the counters, but I can't seem to locate it.
>
> Best Regards,
> Sonal
> Crux: Reporting for HBase <https://github.com/sonalgoyal/crux>
> Nube Technologies <http://www.nubetech.co/>
>
> <http://in.linkedin.com/in/sonalgoyal>
>
>
> On Thu, Aug 23, 2012 at 5:20 PM, Bejoy Ks <bejoy.hadoop@gmail.com> wrote:
>
>> Hi Gaurav
>>
>> If it is just a simple word count example:
>> Map input size = HDFS_BYTES_READ
>> Reduce Output Size = HDFS_BYTES_WRITTEN
>> Reduce Input Size should be Map output bytes
>>
>> File Bytes Written is what the job writes to the local file system.
>> AFAIK it is the map tasks' intermediate output written to the LFS.
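>>
>> If you want to pull these sizes programmatically rather than from the web
>> UI, something along the lines of the rough sketch below should work against
>> the old mapred API. The group and counter names are my guesses based on the
>> 0.20 counter dump, so verify them against your own job output:
>>
>> import org.apache.hadoop.mapred.Counters;
>> import org.apache.hadoop.mapred.JobClient;
>> import org.apache.hadoop.mapred.JobConf;
>> import org.apache.hadoop.mapred.JobID;
>> import org.apache.hadoop.mapred.RunningJob;
>>
>> // Prints the size-related counters of a job, looked up by job id.
>> // Group/counter names are guesses based on the 0.20 counter dump;
>> // verify them against your own job output before relying on this.
>> public class JobSizeCounters {
>>   public static void main(String[] args) throws Exception {
>>     JobClient client = new JobClient(new JobConf());
>>     RunningJob job = client.getJob(JobID.forName(args[0])); // e.g. job_201208230144_0002
>>     Counters c = job.getCounters();
>>
>>     String fs = "FileSystemCounters";
>>     String fw = "org.apache.hadoop.mapred.Task$Counter";    // the "Map-Reduce Framework" group
>>
>>     System.out.println("Map input        = " + c.findCounter(fs, "HDFS_BYTES_READ").getCounter());
>>     System.out.println("Reduce output    = " + c.findCounter(fs, "HDFS_BYTES_WRITTEN").getCounter());
>>     System.out.println("Map output       = " + c.findCounter(fw, "MAP_OUTPUT_BYTES").getCounter());
>>     System.out.println("Shuffle bytes    = " + c.findCounter(fw, "REDUCE_SHUFFLE_BYTES").getCounter());
>>     System.out.println("Local FS written = " + c.findCounter(fs, "FILE_BYTES_WRITTEN").getCounter());
>>   }
>> }
>>
>> (I think JobClient.getJob only works while the JobTracker still has the job
>> in memory; for retired jobs you would have to fall back on the job history
>> files.)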
>>
>>
>> Regards
>> Bejoy KS
>>
>>
>> On Thu, Aug 23, 2012 at 4:54 PM, Gaurav Dasgupta <gdsayshi@gmail.com> wrote:
>>
>>> Sorry, the correct outcomes for the single wordcount job are:
>>>
>>> 12/08/23 04:31:22 INFO mapred.JobClient: Job complete: job_201208230144_0002
>>> 12/08/23 04:31:22 INFO mapred.JobClient: Counters: 26
>>> 12/08/23 04:31:22 INFO mapred.JobClient:   Job Counters
>>> 12/08/23 04:31:22 INFO mapred.JobClient:     Launched reduce tasks=64
>>> 12/08/23 04:31:22 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=103718235
>>> 12/08/23 04:31:22 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
>>> 12/08/23 04:31:22 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
>>> 12/08/23 04:31:22 INFO mapred.JobClient:     Launched map tasks=3060
>>> 12/08/23 04:31:22 INFO mapred.JobClient:     Data-local map tasks=3060
>>> 12/08/23 04:31:22 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=9208855
>>> 12/08/23 04:31:22 INFO mapred.JobClient:   FileSystemCounters
>>> 12/08/23 04:31:22 INFO mapred.JobClient:     FILE_BYTES_READ=58263069209
>>> 12/08/23 04:31:22 INFO mapred.JobClient:     HDFS_BYTES_READ=394195953674
>>> 12/08/23 04:31:22 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=2046757548
>>> 12/08/23 04:31:22 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=28095
>>> 12/08/23 04:31:22 INFO mapred.JobClient:   Map-Reduce Framework
>>> 12/08/23 04:31:22 INFO mapred.JobClient:     Map input records=586006142
>>> 12/08/23 04:31:22 INFO mapred.JobClient:     Reduce shuffle bytes=53567298
>>> 12/08/23 04:31:22 INFO mapred.JobClient:     Spilled Records=108996063
>>> 12/08/23 04:31:22 INFO mapred.JobClient:     Map output bytes=468042247685
>>> 12/08/23 04:31:22 INFO mapred.JobClient:     CPU time spent (ms)=91162220
>>> 12/08/23 04:31:22 INFO mapred.JobClient:     Total committed heap usage (bytes)=981605744640
>>> 12/08/23 04:31:22 INFO mapred.JobClient:     Combine input records=32046224559
>>> 12/08/23 04:31:22 INFO mapred.JobClient:     SPLIT_RAW_BYTES=382500
>>> 12/08/23 04:31:22 INFO mapred.JobClient:     Reduce input records=96063
>>> 12/08/23 04:31:22 INFO mapred.JobClient:     Reduce input groups=1000
>>> 12/08/23 04:31:22 INFO mapred.JobClient:     Combine output records=108902950
>>> 12/08/23 04:31:22 INFO mapred.JobClient:     Physical memory (bytes) snapshot=1147705057280
>>> 12/08/23 04:31:22 INFO mapred.JobClient:     Reduce output records=1000
>>> 12/08/23 04:31:22 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=3221902118912
>>> 12/08/23 04:31:22 INFO mapred.JobClient:     Map output records=31937417672
>>>
>>>
>>> Thanks,
>>> Gaurav Dasgupta
>>> On Thu, Aug 23, 2012 at 4:28 PM, Gaurav Dasgupta <gdsayshi@gmail.com> wrote:
>>>
>>>> Hi Users,
>>>>
>>>> I have run a wordcount job on a Hadoop 0.20 cluster and the JobTracker
>>>> Web UI gave me the following information after the successful completion
>>>> of the job:
>>>>
>>>> *Job Counters*
>>>> SLOTS_MILLIS_MAPS=5739
>>>> Total time spent by all reduces waiting after reserving slots (ms)=0
>>>> Total time spent by all maps waiting after reserving slots (ms)=0
>>>> Launched map tasks=2
>>>> SLOTS_MILLIS_REDUCES=0
>>>> *FileSystemCounters*
>>>> HDFS_BYTES_READ=158
>>>> FILE_BYTES_WRITTEN=97422
>>>> HDFS_BYTES_WRITTEN=10000
>>>> *Map-Reduce Framework*
>>>> Map input records=586006142
>>>> Reduce shuffle bytes=53567298
>>>> Spilled Records=108996063
>>>> Map output bytes=468042247685
>>>> CPU time spent (ms)=91162220
>>>> Total committed heap usage (bytes)=981605744640
>>>> Combine input records=32046224559
>>>> SPLIT_RAW_BYTES=382500
>>>> Reduce input records=96063
>>>> Reduce input groups=1000
>>>> Combine output records=108902950
>>>> Physical memory (bytes) snapshot=1147705057280
>>>> Reduce output records=1000
>>>> Virtual memory (bytes) snapshot=3221902118912
>>>> Map output records=31937417672
>>>>
>>>> Can someone explain all these metrics to me? I mainly want to
>>>> know the "total shuffled bytes" of the jobs. Is it "Reduce shuffle bytes"?
>>>> Also, how can I calculate the "total shuffle time taken"?
>>>> Also, which of the above are the "Map Input Size", "Reduce Input
>>>> Size" and "Reduce Output Size"?
>>>> I also want to know the difference between "FILE_BYTES_WRITTEN" and
>>>> "HDFS_BYTES_WRITTEN". What is the job writing outside HDFS that is bigger
>>>> than what it writes to HDFS?
>>>>
>>>> Regards,
>>>> Gaurav Dasgupta
>>>>
>>>
>>>
>>
>
