hadoop-mapreduce-dev mailing list archives

From Ravi Gummadi <gr...@yahoo-inc.com>
Subject Re: "Map input bytes" vs HDFS_BYTES_READ
Date Fri, 04 Feb 2011 04:14:18 GMT
Ted Yu wrote:
> From my limited experiment, I think "Map input bytes" reflects the number of
> bytes of local data file(s) when LocalJobRunner is used.
>
> Correct me if I am wrong.
>
This is correct only if there is a single spill (and not multiple
spills), i.e., all the map output fits in io.sort.mb.

-Ravi
> On Tue, Feb 1, 2011 at 7:52 PM, Harsh J <qwertymaniac@gmail.com> wrote:
>
>   
>> Each task counts independently of its attempt/other tasks, thereby
>> making the aggregates easier to control. Final counters are aggregated
>> only from successfully committed tasks. During the job's run, however,
>> counters are shown aggregated from the most successful attempts of a
>> task thus far.
>>
>> On Wed, Feb 2, 2011 at 9:09 AM, Ted Yu <yuzhihong@gmail.com> wrote:
>>
>>> If map task(s) were retried (mapred.map.max.attempts times), how would
>>> these two counters be affected?
>>>
>>> Thanks
>>>
>>> On Tue, Feb 1, 2011 at 7:31 PM, Harsh J <qwertymaniac@gmail.com> wrote:
>>>
>>>       
>>>> HDFS_BYTES_READ is a FileSystem interface counter. It directly
>>>> reflects the bytes read from the FS (the lower level). Map input
>>>> bytes is the number of bytes of records the RecordReader has
>>>> processed from the input stream.
>>>>
>>>> For plain text files, I believe both counters should report about
>>>> the same value, since entire records are read with no transformation
>>>> performed on each line. But when you throw in a compressed file,
>>>> you'll notice that HDFS_BYTES_READ is far less than Map input bytes:
>>>> the disk read was small, but the total content in record terms is
>>>> still the same as it would be for an uncompressed file.
>>>>
>>>> Hope this clears it.
>>>>
>>>> On Wed, Feb 2, 2011 at 8:06 AM, Ted Yu <yuzhihong@gmail.com> wrote:
>>>>
>>>>> In Hadoop 0.20.2, what's the relationship between "Map input bytes"
>>>>> and HDFS_BYTES_READ?
>>>>>
>>>>> <counter group="FileSystemCounters"
>>>>> name="HDFS_BYTES_READ">203446204073</counter>
>>>>> <counter group="FileSystemCounters"
>>>>> name="HDFS_BYTES_WRITTEN">23413127561</counter>
>>>>> <counter group="Map-Reduce Framework" name="Map input
>>>>> records">163502600</counter>
>>>>> <counter group="Map-Reduce Framework" name="Spilled
>>>>> Records">0</counter>
>>>>> <counter group="Map-Reduce Framework" name="Map input
>>>>> bytes">965922136488</counter>
>>>>> <counter group="Map-Reduce Framework" name="Map output
>>>>> records">296754600</counter>
>>>>>
>>>>> Thanks
>>>>>
>>>>>           
>>>>
>>>> --
>>>> Harsh J
>>>> www.harshj.com
>>>>
>>>>         
>>
>> --
>> Harsh J
>> www.harshj.com
>>
>>     
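
The compression effect Harsh describes can be sketched outside Hadoop. The following is a minimal Python analogy, not Hadoop code: the gzip'd on-disk size plays the role of HDFS_BYTES_READ, and the decompressed record bytes play the role of "Map input bytes". In the counters Ted posted, the ratio 965922136488 / 203446204073 is roughly 4.7, which would be consistent with compressed input.

```python
import gzip

# Some repetitive text records, as a typical log-style input would be.
records = [f"record-{i},some,repetitive,fields\n" for i in range(10_000)]
raw = "".join(records).encode("utf-8")

# Compressed on-disk representation of the same input.
compressed = gzip.compress(raw)

bytes_read_from_storage = len(compressed)  # analogue of HDFS_BYTES_READ
map_input_bytes = len(raw)                 # analogue of "Map input bytes"

# Fewer bytes cross the FS layer than the RecordReader ultimately yields.
print(bytes_read_from_storage, map_input_bytes)
print(map_input_bytes / bytes_read_from_storage)  # ratio > 1 for compressible input
```

For uncompressed plain text the two figures would be about equal, as Harsh notes; compression only shrinks the storage-side number.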

