hadoop-mapreduce-issues mailing list archives

From "Amar Kamat (JIRA)" <j...@apache.org>
Subject [jira] Commented: (MAPREDUCE-2135) Need a counter for map task output file size
Date Tue, 26 Oct 2010 07:27:21 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-2135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12924873#action_12924873 ]

Amar Kamat commented on MAPREDUCE-2135:
---------------------------------------

Ravi,
Shouldn't the _'Combine output records'_ counter value correspond to the _HDFS_BYTES_WRITTEN_
counter value (when num-reducers == 0) or the _FILE_BYTES_WRITTEN_ counter value (when
num-reducers > 0)? In your example, that would be 210 bytes written via 8 records. Could you
check whether the spill logic uses _File_ objects directly instead of going through
_LocalFileSystem_? If the spill logic uses _File_ objects, then multiple spills shouldn't
affect the final bytes written, right?
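
As a rough sanity check on the numbers in the quoted report, the ratios can be computed directly. This is only an illustrative sketch: the counter values are copied from the report below, the helper name `ratio` is hypothetical, and the script says nothing about how Hadoop itself computes these counters.

```python
# Counter values copied from the quoted task report for
# attempt_201010141448_0001_m_000000_0.
counters = {
    "FILE_BYTES_WRITTEN": 210,
    "Combine output records": 8,
    "Map output bytes": 19_200_000,
    "Map output records": 2_400_000,
    "Spilled Records": 16,
}


def ratio(numerator_key, denominator_key):
    """Hypothetical helper: divide one counter by another."""
    return counters[numerator_key] / counters[denominator_key]


# Average size of a map output record before the combiner runs.
bytes_per_map_record = ratio("Map output bytes", "Map output records")

# Average bytes on local disk per record after the combiner runs.
bytes_per_combined_record = ratio("FILE_BYTES_WRITTEN", "Combine output records")

# How many times each combined record was counted as spilled.
spills_per_combined_record = ratio("Spilled Records", "Combine output records")

print(f"bytes per map output record:      {bytes_per_map_record}")
print(f"bytes per combined record on disk: {bytes_per_combined_record}")
print(f"spilled / combine output records:  {spills_per_combined_record}")
```

The per-record size on disk after combining comes out larger than the per-record size before combining, which would be plausible if the written file carries some fixed overhead; that is an assumption, not something the counters confirm.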

> Need a counter for map task output file size
> --------------------------------------------
>
>                 Key: MAPREDUCE-2135
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2135
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: task
>            Reporter: Ravi Gummadi
>
> With MapReduce trunk, the FileSystem counter FILE_BYTES_WRITTEN is a lot less than the
> "Map output bytes" counter, even when map output compression is OFF. I think
> FILE_BYTES_WRITTEN signifies the bytes written to the local file system, so it should be
> more than the map output bytes (in the counters shown below, 210 vs. 19,200,000). Right?
> Here are some counters from a map task of the wordcount example:
> Counters for attempt_201010141448_0001_m_000000_0
> FileInputFormatCounters
> 	BYTES_READ 	9,600,000
> FileSystemCounters
> 	FILE_BYTES_READ 	92
> 	FILE_BYTES_WRITTEN 	210
> 	HDFS_BYTES_READ 	9,600,107
> Map-Reduce Framework
> 	Combine input records 	2,400,000
> 	Combine output records 	8
> 	CPU_MILLISECONDS 	4,810
> 	Failed Shuffles 	0
> 	GC time elapsed (ms) 	73
> 	Map input records 	600,000
> 	Map output bytes 	19,200,000
> 	Map output records 	2,400,000
> 	Merged Map outputs 	0
> 	PHYSICAL_MEMORY_BYTES 	131,518,464
> 	Spilled Records 	16
> 	SPLIT_RAW_BYTES 	107
> 	VIRTUAL_MEMORY_BYTES 	581,021,696

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
