hadoop-common-user mailing list archives

From Rares Vernica <rvern...@gmail.com>
Subject possible bug in updating Counter.DATA_LOCAL_MAPS
Date Tue, 10 Aug 2010 21:58:08 GMT

I set "mapred.task.cache.levels" to 1 so that I have only
data-local-map tasks. Still, judging by the data-local-maps
counter, it seems not all map tasks are local. I checked each map task
to see where it ran and what split had been assigned to it, and all the
maps were actually processing only local data. (BTW, replication was
set to 1.)

I looked into the JobClient to see what information is there for each
split. For each file, the first n-1 splits have an IP address as
location while the n-th split has a host name as location. The reason
for this is that there is a different code path in deciding the
location for the first n-1 splits versus the n-th split. The maps that
processed the splits where the location was a host name were counted
as data-local-maps while the others were not.

So the job works fine whether the JobClient gives IP addresses or host
names for splits. The problem is that the data-local-maps counter does
not take the difference into account.
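
To illustrate the kind of mismatch I mean, here is a minimal sketch in
plain Java. It is not the actual JobTracker code. The class
LocalitySketch, its methods, the address 10.0.0.5 and the host name
node5.example.com are all made up for the example, and a lookup table
stands in for real name resolution. It only shows that a plain string
comparison treats an IP location as non-local, while comparing after
normalizing both sides to a host name does not.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch, not Hadoop code: shows why comparing a split
// location with a tracker host as raw strings can miss a local task.
final class LocalitySketch {

    // Plain string comparison: an IP never equals a host name.
    static boolean naiveIsDataLocal(String splitLocation, String trackerHost) {
        return splitLocation.equals(trackerHost);
    }

    // Map an IP to its host name when known (the table is an assumed
    // stand-in for DNS), then lower-case so the comparison is uniform.
    static String normalize(String location, Map<String, String> ipToHost) {
        String host = ipToHost.getOrDefault(location, location);
        return host.toLowerCase(Locale.ROOT);
    }

    // Compare both sides only after normalizing them the same way.
    static boolean isDataLocal(String splitLocation, String trackerHost,
                               Map<String, String> ipToHost) {
        return normalize(splitLocation, ipToHost)
                .equals(normalize(trackerHost, ipToHost));
    }

    public static void main(String[] args) {
        Map<String, String> ipToHost = new HashMap<>();
        ipToHost.put("10.0.0.5", "node5.example.com");

        String tracker = "node5.example.com";

        // Split reported by IP: the naive check misses it.
        System.out.println(naiveIsDataLocal("10.0.0.5", tracker));          // false
        // Split reported by host name: the naive check counts it.
        System.out.println(naiveIsDataLocal("node5.example.com", tracker)); // true
        // After normalization both forms count as local.
        System.out.println(isDataLocal("10.0.0.5", tracker, ipToHost));     // true
        System.out.println(isDataLocal("node5.example.com", tracker, ipToHost)); // true
    }
}
```

Whether the right fix is to normalize in the counter update or to make
the JobClient emit one form consistently is a separate question; the
sketch only shows the comparison side.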

