hadoop-mapreduce-user mailing list archives

From Bejoy Ks <bejoy.had...@gmail.com>
Subject Re: Info required regarding JobTracker Job Details/Metrics
Date Thu, 23 Aug 2012 11:50:34 GMT
Hi Gaurav

If it is just a simple word count example:
Map input size = HDFS_BYTES_READ
Reduce output size = HDFS_BYTES_WRITTEN
Reduce input size should be Map output bytes
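
For reference, here is a minimal sketch of pulling those three counters with the old mapred API. The job ID is copied from Gaurav's log below, and the counter group/name strings ("FileSystemCounters", "org.apache.hadoop.mapred.Task$Counter", "MAP_OUTPUT_BYTES") are my assumption of how 0.20 names them internally, matching what shows up in the counter dump:

import org.apache.hadoop.mapred.Counters;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.JobID;
import org.apache.hadoop.mapred.RunningJob;

public class WordCountSizes {
  public static void main(String[] args) throws Exception {
    JobClient client = new JobClient(new JobConf());
    // Job ID copied from the log below; substitute your own.
    RunningJob job = client.getJob(JobID.forName("job_201208230144_0002"));
    Counters counters = job.getCounters();
    // Map input size: bytes the map tasks read from HDFS.
    long mapInput = counters.findCounter(
        "FileSystemCounters", "HDFS_BYTES_READ").getCounter();
    // Reduce input size: bytes the maps emitted (pre-shuffle).
    long reduceInput = counters.findCounter(
        "org.apache.hadoop.mapred.Task$Counter", "MAP_OUTPUT_BYTES").getCounter();
    // Reduce output size: bytes the reduces wrote back to HDFS.
    long reduceOutput = counters.findCounter(
        "FileSystemCounters", "HDFS_BYTES_WRITTEN").getCounter();
    System.out.println("Map input size     = " + mapInput);
    System.out.println("Reduce input size  = " + reduceInput);
    System.out.println("Reduce output size = " + reduceOutput);
  }
}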

FILE_BYTES_WRITTEN is what the job writes to the local file system. AFAIK
it is the map tasks' intermediate output written to the LFS.
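
Continuing the same sketch (again assuming the 0.20 group/name strings), the local-vs-HDFS split is visible in the counters themselves; the intermediate data itself lands in the directories configured by mapred.local.dir on each TaskTracker:

    // Inside the same main() as above:
    // Local file system writes: intermediate map output, spills etc.
    long localWritten = counters.findCounter(
        "FileSystemCounters", "FILE_BYTES_WRITTEN").getCounter();
    // HDFS writes: only the final reduce output.
    long hdfsWritten = counters.findCounter(
        "FileSystemCounters", "HDFS_BYTES_WRITTEN").getCounter();
    System.out.println("Local FS bytes written = " + localWritten);
    System.out.println("HDFS bytes written     = " + hdfsWritten);

In Gaurav's small run that split is 97422 local bytes against 10000 HDFS bytes, which is exactly why FILE_BYTES_WRITTEN can exceed HDFS_BYTES_WRITTEN.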


Regards
Bejoy KS

On Thu, Aug 23, 2012 at 4:54 PM, Gaurav Dasgupta <gdsayshi@gmail.com> wrote:

> Sorry, the correct outcomes for the single wordcount job are:
>
> 12/08/23 04:31:22 INFO mapred.JobClient: Job complete:
> job_201208230144_0002
> 12/08/23 04:31:22 INFO mapred.JobClient: Counters: 26
> 12/08/23 04:31:22 INFO mapred.JobClient:   Job Counters
> 12/08/23 04:31:22 INFO mapred.JobClient:     Launched reduce tasks=64
> 12/08/23 04:31:22 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=103718235
> 12/08/23 04:31:22 INFO mapred.JobClient:     Total time spent by all
> reduces waiting after reserving slots (ms)=0
> 12/08/23 04:31:22 INFO mapred.JobClient:     Total time spent by all maps
> waiting after reserving slots (ms)=0
> 12/08/23 04:31:22 INFO mapred.JobClient:     Launched map tasks=3060
> 12/08/23 04:31:22 INFO mapred.JobClient:     Data-local map tasks=3060
> 12/08/23 04:31:22 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=9208855
> 12/08/23 04:31:22 INFO mapred.JobClient:   FileSystemCounters
> 12/08/23 04:31:22 INFO mapred.JobClient:     FILE_BYTES_READ=58263069209
> 12/08/23 04:31:22 INFO mapred.JobClient:     HDFS_BYTES_READ=394195953674
> 12/08/23 04:31:22 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=2046757548
> 12/08/23 04:31:22 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=28095
> 12/08/23 04:31:22 INFO mapred.JobClient:   Map-Reduce Framework
> 12/08/23 04:31:22 INFO mapred.JobClient:     Map input records=586006142
> 12/08/23 04:31:22 INFO mapred.JobClient:     Reduce shuffle bytes=53567298
> 12/08/23 04:31:22 INFO mapred.JobClient:     Spilled Records=108996063
> 12/08/23 04:31:22 INFO mapred.JobClient:     Map output bytes=468042247685
> 12/08/23 04:31:22 INFO mapred.JobClient:     CPU time spent (ms)=91162220
> 12/08/23 04:31:22 INFO mapred.JobClient:     Total committed heap usage
> (bytes)=981605744640
> 12/08/23 04:31:22 INFO mapred.JobClient:     Combine input
> records=32046224559
> 12/08/23 04:31:22 INFO mapred.JobClient:     SPLIT_RAW_BYTES=382500
> 12/08/23 04:31:22 INFO mapred.JobClient:     Reduce input records=96063
> 12/08/23 04:31:22 INFO mapred.JobClient:     Reduce input groups=1000
> 12/08/23 04:31:22 INFO mapred.JobClient:     Combine output
> records=108902950
> 12/08/23 04:31:22 INFO mapred.JobClient:     Physical memory (bytes)
> snapshot=1147705057280
> 12/08/23 04:31:22 INFO mapred.JobClient:     Reduce output records=1000
> 12/08/23 04:31:22 INFO mapred.JobClient:     Virtual memory (bytes)
> snapshot=3221902118912
> 12/08/23 04:31:22 INFO mapred.JobClient:     Map output records=31937417672
>
>
> Thanks,
> Gaurav Dasgupta
> On Thu, Aug 23, 2012 at 4:28 PM, Gaurav Dasgupta <gdsayshi@gmail.com> wrote:
>
>> Hi Users,
>>
>> I have run a wordcount job on a Hadoop 0.20 cluster and the JobTracker Web
>> UI gave me the following information after the successful completion of the
>> job:
>>
>> *Job Counters*
>> SLOTS_MILLIS_MAPS=5739
>> Total time spent by all reduces waiting after reserving slots (ms)=0
>> Total time spent by all maps waiting after reserving slots (ms)=0
>> Launched map tasks=2
>> SLOTS_MILLIS_REDUCES=0
>>
>> *FileSystemCounters*
>> HDFS_BYTES_READ=158
>> FILE_BYTES_WRITTEN=97422
>> HDFS_BYTES_WRITTEN=10000
>> *Map-Reduce Framework*
>> Map input records=586006142
>> Reduce shuffle bytes=53567298
>> Spilled Records=108996063
>> Map output bytes=468042247685
>> CPU time spent (ms)=91162220
>> Total committed heap usage (bytes)=981605744640
>> Combine input records=32046224559
>> SPLIT_RAW_BYTES=382500
>> Reduce input records=96063
>> Reduce input groups=1000
>> Combine output records=108902950
>> Physical memory (bytes) snapshot=1147705057280
>> Reduce output records=1000
>> Virtual memory (bytes) snapshot=3221902118912
>> Map output records=31937417672
>>
>> Can someone explain all of the above metrics to me? I mainly want to
>> know the "total shuffled bytes" for the job. Is it "Reduce shuffle bytes"?
>> Also, how can I calculate the "total shuffle time taken"?
>> And which of the above are the "Map Input Size", "Reduce Input
>> Size" and "Reduce Output Size"?
>> I also want to know the difference between "FILE_BYTES_WRITTEN"
>> and "HDFS_BYTES_WRITTEN". What is the job writing outside HDFS that is
>> bigger than what it writes to HDFS?
>>
>> Regards,
>> Gaurav Dasgupta
>>
>
>
