hadoop-mapreduce-dev mailing list archives

From "Milind Bhandarkar (JIRA)" <j...@apache.org>
Subject [jira] Created: (MAPREDUCE-1922) Counters for data-local and rack-local tasks should be replaced by bytes-read-local and bytes-read-rack
Date Wed, 07 Jul 2010 02:49:52 GMT
Counters for data-local and rack-local tasks should be replaced by bytes-read-local and bytes-read-rack
-------------------------------------------------------------------------------------------------------

                 Key: MAPREDUCE-1922
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1922
             Project: Hadoop Map/Reduce
          Issue Type: Improvement
         Environment: All
            Reporter: Milind Bhandarkar
            Assignee: Arun C Murthy


As more and more applications use the combine file input format (to reduce the number of mappers),
formats with column groups implemented as separate HDFS files (Zebra, HBase), and composite
input formats (map-side joins), the data-local and rack-local task counters lose their meaning.
(A map task that reads only one column group, say 20% of its input, locally and the other 80%
remotely still gets flagged as a data-local map.)

So my suggestion is to drop these counters and replace them with HDFS_LOCAL_BYTES_READ,
HDFS_RACK_BYTES_READ, and HDFS_TOTAL_BYTES_READ. Because these counters measure bytes rather
than tasks, they will make it easier to reason about read performance for maps.
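
To illustrate how byte-level counters could expose what the per-task flag hides, here is a
minimal sketch in plain Java. It does not use any Hadoop API. Only the three counter names come
from this issue; the class LocalityCounters, the Locality enum, and the methods recordRead and
localFraction are hypothetical. The sketch also assumes that HDFS_RACK_BYTES_READ excludes
node-local bytes, which the issue does not specify.

```java
/** Hypothetical sketch: byte-level locality counters for a single map task. */
final class LocalityCounters {
    /** Counter names as proposed in this issue. */
    enum Counter { HDFS_LOCAL_BYTES_READ, HDFS_RACK_BYTES_READ, HDFS_TOTAL_BYTES_READ }

    /** Where the bytes of one read came from (assumed classification, not a Hadoop type). */
    enum Locality { NODE_LOCAL, RACK_LOCAL, OFF_RACK }

    // One slot per counter, indexed by the enum ordinal.
    private final long[] values = new long[Counter.values().length];

    /** Adds one read to the total and, if applicable, to the local or rack counter. */
    void recordRead(Locality locality, long bytes) {
        if (bytes < 0) {
            throw new IllegalArgumentException("bytes must be non-negative");
        }
        values[Counter.HDFS_TOTAL_BYTES_READ.ordinal()] += bytes;
        if (locality == Locality.NODE_LOCAL) {
            values[Counter.HDFS_LOCAL_BYTES_READ.ordinal()] += bytes;
        } else if (locality == Locality.RACK_LOCAL) {
            // Assumption: rack bytes are counted separately from node-local bytes.
            values[Counter.HDFS_RACK_BYTES_READ.ordinal()] += bytes;
        }
    }

    long get(Counter counter) {
        return values[counter.ordinal()];
    }

    /** Fraction of all bytes read from the local node; 0.0 if nothing was read. */
    double localFraction() {
        long total = get(Counter.HDFS_TOTAL_BYTES_READ);
        return total == 0 ? 0.0 : (double) get(Counter.HDFS_LOCAL_BYTES_READ) / total;
    }

    public static void main(String[] args) {
        // The example from this issue: 20% of the input is local, 80% is remote.
        LocalityCounters counters = new LocalityCounters();
        counters.recordRead(Locality.NODE_LOCAL, 20L * 1024 * 1024);
        counters.recordRead(Locality.OFF_RACK, 80L * 1024 * 1024);
        System.out.println("local fraction: " + counters.localFraction());
    }
}
```

In this sketch, the map task from the example above reports that only a fifth of its bytes
were read locally, whereas the existing counter would simply count it as one data-local map.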


