hadoop-common-user mailing list archives

From Arun C Murthy <...@yahoo-inc.com>
Subject Re: possible bug in updating Counter.DATA_LOCAL_MAPS
Date Wed, 11 Aug 2010 15:41:47 GMT
Rares,

  This sounds like a good bug to fix; can you please open a JIRA?

thanks,
Arun

On Aug 10, 2010, at 2:58 PM, Rares Vernica wrote:

> Hello,
>
> I set "mapred.task.cache.levels" to 1 so that I have only
> data-local map tasks. Still, judging by the data-local-maps
> counter, it seems not all map tasks are local. I checked each map task
> to see where it ran and which split had been assigned to it, and all the
> maps were actually processing only local data. (BTW, replication was
> set to 1.)
>
> I looked into the JobClient to see what information is there for each
> split. For each file, the first n-1 splits have an IP address as
> location while the n-th split has a host name as location. The reason
> for this is that there is a different code path in deciding the
> location for the first n-1 splits versus the n-th split. The maps that
> processed the splits where the location was a host name were counted
> as data-local-maps while the others were not.
>
> So, whether the JobClient reports IP addresses or host names for the
> splits, the job works fine. The problem is that the data-local-maps
> counter does not take this difference into account.
>
> Cheers,
> Rares
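The mismatch described above can be modeled in a few lines. This is a hypothetical sketch, not the JobTracker's actual counter logic: the class name, host names, IP addresses and the static lookup table (standing in for real DNS resolution) are all invented for illustration. It shows how a locality test that compares location strings verbatim counts only the split reported by host name as data-local, while canonicalizing both sides first counts every split on that node.

```java
import java.util.Map;

// Hypothetical sketch: NOT Hadoop's JobTracker code. It models how a locality
// check that compares location strings verbatim can miscount a map as
// non-local when one side is an IP address and the other is a host name.
public class LocalityCheck {

    // Hypothetical, static IP-to-hostname table standing in for real DNS
    // resolution so the example is deterministic.
    private static final Map<String, String> IP_TO_HOST = Map.of(
            "10.0.0.11", "node11.example.com",
            "10.0.0.12", "node12.example.com");

    // Naive check: exact string equality between the split's location and
    // the host the map task ran on.
    static boolean isDataLocalNaive(String splitLocation, String taskHost) {
        return splitLocation.equals(taskHost);
    }

    // Map an IP address to its host name if the table knows it; otherwise
    // return the location unchanged.
    static String canonicalHost(String location) {
        return IP_TO_HOST.getOrDefault(location, location);
    }

    // Normalized check: canonicalize both sides before comparing.
    static boolean isDataLocalNormalized(String splitLocation, String taskHost) {
        return canonicalHost(splitLocation).equals(canonicalHost(taskHost));
    }

    public static void main(String[] args) {
        String taskHost = "node11.example.com";
        // As in the thread: splits 1..n-1 carry an IP, split n carries a host name.
        String[] splitLocations = {"10.0.0.11", "10.0.0.11", "node11.example.com"};

        int naive = 0;
        int normalized = 0;
        for (String loc : splitLocations) {
            if (isDataLocalNaive(loc, taskHost)) naive++;
            if (isDataLocalNormalized(loc, taskHost)) normalized++;
        }
        System.out.println("naive DATA_LOCAL_MAPS = " + naive);
        System.out.println("normalized DATA_LOCAL_MAPS = " + normalized);
    }
}
```

Running `main` prints a naive count of 1 and a normalized count of 3, even though all three maps read local data. A real fix would need actual name resolution (for example via `java.net.InetAddress`) or consistent location reporting in the JobClient, rather than a hard-coded table.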

