hadoop-mapreduce-dev mailing list archives

From Harsh J <qwertyman...@gmail.com>
Subject Re: "Map input bytes" vs HDFS_BYTES_READ
Date Wed, 02 Feb 2011 03:31:30 GMT
HDFS_BYTES_READ is a FileSystem-level counter: it measures the raw bytes
read from the filesystem (the lower-level I/O). "Map input bytes" is the
number of bytes of record data the RecordReader has delivered to the
mapper from the input stream.

For plain text files, I believe both counters should report roughly the
same value, since entire records are read with no transformation applied
to each line. But when you throw in a compressed file, you'll notice
that HDFS_BYTES_READ is far less than "Map input bytes": fewer bytes
were read from disk, but the total content, measured in record terms, is
still the same as it would be for an uncompressed file.
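As a quick sanity check, here is the arithmetic on the counter values Ted quotes below (a sketch only; it assumes the job's input was in fact compressed, which the wide gap between the two counters suggests):

```python
# Counter values quoted from Ted's job output below.
hdfs_bytes_read = 203_446_204_073   # raw bytes read from HDFS
map_input_bytes = 965_922_136_488   # uncompressed record bytes seen by the mappers

# If the input is compressed, the RecordReader decompresses the stream,
# so it hands the mapper far more bytes than were read from disk.
ratio = map_input_bytes / hdfs_bytes_read
print(f"effective compression ratio: {ratio:.2f}x")  # prints: effective compression ratio: 4.75x
```

So the mappers processed roughly 4.75 bytes of record data per byte actually read from HDFS, which is consistent with compressed input rather than any counter bug.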

Hope this clears it up.

On Wed, Feb 2, 2011 at 8:06 AM, Ted Yu <yuzhihong@gmail.com> wrote:
> In hadoop 0.20.2, what's the relationship between "Map input bytes" and
> <counter group="FileSystemCounters" name="HDFS_BYTES_READ">203446204073</counter>
> <counter group="FileSystemCounters" name="HDFS_BYTES_WRITTEN">23413127561</counter>
> <counter group="Map-Reduce Framework" name="Map input records">163502600</counter>
> <counter group="Map-Reduce Framework" name="Spilled Records">0</counter>
> <counter group="Map-Reduce Framework" name="Map input bytes">965922136488</counter>
> <counter group="Map-Reduce Framework" name="Map output records">296754600</counter>
> Thanks

Harsh J
