hadoop-common-user mailing list archives

From Robert Evans <ev...@yahoo-inc.com>
Subject Re: mapred.map.tasks getting set, but not sure where
Date Fri, 04 Nov 2011 15:29:48 GMT
In 0.20.2, the JobClient updates mapred.map.tasks to equal the number of splits returned
by the InputFormat.  The InputFormat usually takes mapred.map.tasks as a recommendation
when deciding what splits to make.  That is the only place in the code I could find
that sets the value and could have any impact on the number of mappers launched.  It
could be that someone changed the number of files being read in as input, or that
the block size of those files is now different.  It could also be that someone
started compressing the input files, so they can no longer be split.  If the number of
mappers is different, it probably means the input is different somehow.
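To make the "recommendation" part concrete, here is a minimal sketch (not Hadoop's actual source; the class name and the input sizes are illustrative) of the arithmetic 0.20.2's FileInputFormat uses per file: it turns the mapred.map.tasks hint into a goal size, then clamps it between the configured minimum split size and the block size, which is why the hint only influences, never dictates, the final split count:

```java
// Illustrative sketch of FileInputFormat-style split sizing in 0.20.2:
// goalSize = totalSize / numSplitsHint, then
// splitSize = max(minSize, min(goalSize, blockSize)).
public class SplitSizeSketch {
    static long splitSize(long totalSize, int numSplitsHint,
                          long minSize, long blockSize) {
        // The mapred.map.tasks hint only sets the goal; block size
        // and the minimum split size bound the actual answer.
        long goalSize = totalSize / (numSplitsHint == 0 ? 1 : numSplitsHint);
        return Math.max(minSize, Math.min(goalSize, blockSize));
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        // 256 GB of input, a hint of 4000 maps, 64 MB blocks, 1-byte minimum:
        // the goal (~65.5 MB) exceeds the block size, so blocks win.
        System.out.println(splitSize(256L * 1024L * mb, 4000, 1L, 64L * mb));
    }
}
```

Note that this arithmetic is skipped entirely for inputs the format considers unsplittable (e.g. gzip-compressed files), which each become a single split regardless of the hint.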

--Bobby Evans

On 11/4/11 10:12 AM, "Brendan W." <bw8408@gmail.com> wrote:

Same version as before, no change there: 0.20.2.

Other people do have access to this system to change things like conf
files, but nobody's owning up and I have to figure this out.  I have
verified that the mapred.map.tasks property is not getting set in the
mapred-site.xml files on the cluster or in the job.  Just out of other
ideas about where it might be getting set...



On Fri, Nov 4, 2011 at 11:04 AM, Robert Evans <evans@yahoo-inc.com> wrote:

> What versions of Hadoop were you running with previously, and what version
> are you running with now?
> --Bobby Evans
> On 11/4/11 9:33 AM, "Brendan W." <bw8408@gmail.com> wrote:
> Hi,
> In the jobs running on my cluster of 20 machines, I used to run jobs (via
> "hadoop jar ...") that would spawn around 4000 map tasks.  Now when I run
> the same jobs, that number is 20; and I notice that in the job
> configuration, the parameter mapred.map.tasks is set to 20, whereas it
> never used to be present at all in the configuration file.
> Changing the input split size in the job doesn't affect this--I get the
> split size I ask for, but the *number* of input splits is still capped at
> 20--i.e., the job isn't reading all of my data.
> The mystery to me is where this parameter could be getting set.  It is not
> present in the mapred-site.xml file in <hadoop home>/conf on any machine in
> the cluster, and it is not being set in the job (I'm running out of the
> same jar I always did; no updates).
> Is there *anywhere* else this parameter could possibly be getting set?
> I've stopped and restarted map-reduce on the cluster with no effect...it's
> getting re-read in from somewhere, but I can't figure out where.
> Thanks a lot,
> Brendan
