hadoop-common-user mailing list archives

From R V <cattiv...@yahoo.com>
Subject Re: File System Counters.
Date Thu, 28 Jul 2011 21:49:20 GMT
Harsh
 
If that is the case, there is something I don't understand. If FILE_BYTES_READ is non-zero
for a map task, the only thing I can assume is that it came from a spill during the sort
phase.
 
I have a 10-node cluster, and I ran TeraSort on 100,000 bytes of input (1,000 records).
 
My io.sort.mb is 300 and io.sort.factor is 10. My mapred.child.java.opts is set to -Xmx512m.
 
Since everything fits into memory, I expected no FILE_BYTES_READ on the map side and no
FILE_BYTES_WRITTEN on the reduce side. Instead, I find that FILE_BYTES_READ on the map side
is 188,604 (HDFS_BYTES_READ is 149,686) and, inexplicably, SPILLED_RECORDS is 1000 for both
map and reduce.
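
As a sanity check on the sizes involved, the arithmetic can be sketched as below. This is
only a rough sketch: the class and method names are made up for illustration, the figures
are the ones quoted in this message, and the comparison ignores settings such as
io.sort.spill.percent and the per-record metadata overhead, so the real spill threshold is
lower than the full io.sort.mb value.

```java
// Rough size check using the figures from this message.
// SortBufferCheck and its methods are illustrative names, not Hadoop APIs.
class SortBufferCheck {
    static final long BYTES_PER_MB = 1024L * 1024L;

    /** Total input size for a given record count and record size. */
    static long inputBytes(long records, long bytesPerRecord) {
        return records * bytesPerRecord;
    }

    /** True if the data is smaller than the configured buffer (io.sort.mb, in MB). */
    static boolean fitsInSortBuffer(long dataBytes, long ioSortMb) {
        return dataBytes < ioSortMb * BYTES_PER_MB;
    }

    /** Difference between the local-file and HDFS read counters. */
    static long counterGap(long fileBytesRead, long hdfsBytesRead) {
        return fileBytesRead - hdfsBytesRead;
    }

    public static void main(String[] args) {
        long input = inputBytes(1000, 100);   // 1,000 records of 100 bytes each
        System.out.println("input bytes        = " + input);
        System.out.println("sort buffer bytes  = " + 300 * BYTES_PER_MB);
        System.out.println("fits in buffer     = " + fitsInSortBuffer(input, 300));
        System.out.println("FILE - HDFS (read) = " + counterGap(188_604, 149_686));
    }
}
```

The check only confirms that the input is several orders of magnitude smaller than the sort
buffer; it says nothing about why the counters are non-zero.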
 
So I now have two questions.
1. Why is SPILLED_RECORDS 1000, given that io.sort.factor is 10, io.sort.mb is 300 MB,
and each task has 512 MB of heap?
2. Where are the numbers for FILE_BYTES_READ/WRITTEN coming from?
 
TIA
 
Raj
From: Harsh J <harsh@cloudera.com>
To: common-user@hadoop.apache.org; R V <cattivo23@yahoo.com>
Sent: Thursday, July 28, 2011 12:03 AM
Subject: Re: File System Counters.

Raj,

There is no overlap. Data read from HDFS FileSystem instances goes to
HDFS_BYTES_READ, and data read from Local FileSystem instances goes to
FILE_BYTES_READ. These are two different FileSystems, and have no
overlap at all.
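
A toy model may make the separation concrete. This is not Hadoop source code; the class and
its methods are hypothetical, the byte values are made up, and the assumption that the
counter name is derived from the file system's URI scheme ("hdfs", "file") is my reading of
how the counters are named rather than something stated in this thread.

```java
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Toy model (not Hadoop code): each read is tallied under a counter named
// after the URI scheme of the file system that served it.
class FsCounterModel {
    private final Map<String, Long> counters = new TreeMap<>();

    /** Counter name for a scheme, e.g. "hdfs" -> "HDFS_BYTES_READ". */
    static String readCounterName(String scheme) {
        return scheme.toUpperCase(Locale.ROOT) + "_BYTES_READ";
    }

    /** Adds bytes to the read counter of the given scheme only. */
    void recordRead(String scheme, long bytes) {
        counters.merge(readCounterName(scheme), bytes, Long::sum);
    }

    /** Current value of a counter, or 0 if nothing was recorded. */
    long get(String counterName) {
        return counters.getOrDefault(counterName, 0L);
    }

    public static void main(String[] args) {
        FsCounterModel task = new FsCounterModel();
        task.recordRead("hdfs", 4096);  // hypothetical read through HDFS
        task.recordRead("file", 1024);  // hypothetical read from local disk
        System.out.println("HDFS_BYTES_READ = " + task.get("HDFS_BYTES_READ"));
        System.out.println("FILE_BYTES_READ = " + task.get("FILE_BYTES_READ"));
    }
}
```

In this model a read is attributed to exactly one counter, so a byte read through HDFS never
shows up under FILE_BYTES_READ, and vice versa.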

On Thu, Jul 28, 2011 at 5:56 AM, R V <cattivo23@yahoo.com> wrote:
> Hello
>
> I don't know if this question has been answered. I am trying to understand the overlap
> between FILE_BYTES_READ and HDFS_BYTES_READ. What are the various components that
> contribute to these counters? For example, when I see FILE_BYTES_READ for a specific task
> (map or reduce), is it purely due to the spill during the sort phase? If an HDFS read
> happens on a non-local node, does the counter increase on the node where the data block
> resides? What happens when the data is local? Does the counter increase for both
> HDFS_BYTES_READ and FILE_BYTES_READ? From the values I am seeing, this looks to be the
> case, but I am not sure.
>
> I am not very fluent in Java, and hence I don't fully understand the source. :-(
>
> Raj



-- 
Harsh J
