hadoop-hdfs-user mailing list archives

From "hailong.yang1115" <hailong.yang1...@gmail.com>
Subject Re: Re: Fw: Problems about the job counters
Date Wed, 29 Jun 2011 07:19:55 GMT
Hi Denny,

Thank you very much for your reply. I think you explained the problem quite clearly. I also
read your blog, and the articles about Hadoop's mechanisms are very insightful.


Cheers!

Hailong

2011-06-29 



***********************************************
* Hailong Yang, PhD. Candidate
* Sino-German Joint Software Institute,
* School of Computer Science & Engineering, Beihang University
* Phone: (86-010)82315908
* Email: hailong.yang1115@gmail.com
* Address: G413, New Main Building in Beihang University,
*              No.37 XueYuan Road, HaiDian District,
*              Beijing, P.R. China, 100191
***********************************************



From: Denny Ye
Sent: 2011-06-29 15:03:36
To: hailong.yang1115; hdfs-user
Cc:
Subject: Re: Fw: Problems about the job counters
 
Hi Hailong,
  
       An important phase between the map and reduce tasks is the 'shuffle'. On the map
side, every output record is first written to an in-memory buffer and then spilled to disk
as a local spill file (a temporary file); if the map task produces a large amount of output,
it creates several spill files. Those spill files must then be merged into a single target
file on the map side. To do that, the map task reads the spill files back from local disk
into memory and writes the merged records out to disk again. So FILE_BYTES_READ on the map
side reflects this merge phase (reading spill files back from local disk), and
FILE_BYTES_WRITTEN is the total number of bytes spilled to local disk.
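
       In case it is useful, here is a minimal sketch of the map-side settings that control
how much spilling and merging happens. It assumes the Hadoop 0.20/1.x property names
io.sort.mb, io.sort.spill.percent, and io.sort.factor; the values are only illustrative,
not recommendations:

    import org.apache.hadoop.conf.Configuration;

    public class SpillTuning {
        public static void main(String[] args) {
            Configuration conf = new Configuration();
            // Size (MB) of the map-side in-memory output buffer; a larger buffer
            // means fewer spill files and less local-disk I/O during the merge.
            conf.setInt("io.sort.mb", 200);
            // Spilling starts once the buffer reaches this fraction of io.sort.mb.
            conf.setFloat("io.sort.spill.percent", 0.80f);
            // Maximum number of spill files merged at once into the map output file.
            conf.setInt("io.sort.factor", 10);
        }
    }

Fewer spill files means fewer merge passes, which shows up directly as lower FILE_BYTES_READ
and FILE_BYTES_WRITTEN on the map side.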


      HDFS_BYTES_READ represents only the map input bytes read from HDFS.
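
      For reference, a small sketch of how you could read these counters programmatically
once a job has finished. It assumes the new mapreduce API and the "FileSystemCounters"
group name as it appears in your job output; the class and method names here are just for
illustration:

    import org.apache.hadoop.mapreduce.Counters;
    import org.apache.hadoop.mapreduce.Job;

    public class PrintFsCounters {
        // 'job' is assumed to be a handle to an already-completed job.
        public static void print(Job job) throws Exception {
            Counters counters = job.getCounters();
            // Bytes read from local disk (spill merges) vs. from HDFS (map input).
            long fileRead = counters.getGroup("FileSystemCounters")
                                    .findCounter("FILE_BYTES_READ").getValue();
            long hdfsRead = counters.getGroup("FileSystemCounters")
                                    .findCounter("HDFS_BYTES_READ").getValue();
            System.out.println("FILE_BYTES_READ = " + fileRead);
            System.out.println("HDFS_BYTES_READ = " + hdfsRead);
        }
    }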


     My blog post below explains the 'shuffle' phase in more detail (in Chinese):
     http://langyu.iteye.com/blog/992916


--Regards
Denny Ye


2011/6/15 hailong.yang1115 <hailong.yang1115@gmail.com>


Sorry for sending this email again, but I got no answers to the first one. Could anyone
please help, or forward it to a mailing list that could?

2011-06-15






From: hailong.yang1115
Sent: 2011-06-10 13:28:46
To: general
Cc:
Subject: Problems about the job counters

Dear all,

I am trying the built-in wordcount example with nearly 15 GB of input. When the Hadoop job
finished, I got the following counters.


Counter                     Map              Reduce           Total
Job Counters
  Launched reduce tasks     0                0                1
  Rack-local map tasks      0                0                35
  Launched map tasks        0                0                2,318
  Data-local map tasks      0                0                2,283
FileSystemCounters
  FILE_BYTES_READ           22,863,580,656   17,654,943,341   40,518,523,997
  HDFS_BYTES_READ           154,400,997,459  0                154,400,997,459
  FILE_BYTES_WRITTEN        33,490,829,403   17,654,943,341   51,145,772,744
  HDFS_BYTES_WRITTEN        0                2,747,356,704    2,747,356,704


My question is: what does the FILE_BYTES_READ counter mean, and what is the difference
between FILE_BYTES_READ and HDFS_BYTES_READ? As far as I know, all the input is located in
HDFS, so where does FILE_BYTES_READ come from during the map phase?


Any help will be appreciated!

Hailong

2011-06-10 


