hadoop-common-user mailing list archives

From "Andy Li" <annndy....@gmail.com>
Subject Re: Map and Reduce numbers are not restricted by setNumMapTasks and setNumReduceTasks, JobConf related?
Date Tue, 07 Oct 2008 05:12:46 GMT
Thanks Samuel.  I have looked for answers and done some trial and error in
my own program.

The only way I know of so far to control the Mapper count is to assign each
file to one Mapper with your own customized InputFormat and RecordReader,
overriding isSplitable() to always return false.

Someone posted about this in another mail thread.  I found this link:
http://www.nabble.com/1-file-per-record-td19644985.html
which shows how to prevent a file from being split.
FAQ 10 in the Hadoop Wiki also shows several ways to assign one file to one
Mapper.
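A minimal sketch of that approach, assuming the 0.18-era org.apache.hadoop.mapred API (the class name WholeFileTextInputFormat is just an example, not from the thread):

```java
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.TextInputFormat;

// Example InputFormat that refuses to split its input files, so each
// file becomes exactly one InputSplit and is handled by one Mapper.
public class WholeFileTextInputFormat extends TextInputFormat {
    @Override
    protected boolean isSplitable(FileSystem fs, Path file) {
        // Never split: one file -> one InputSplit -> one Mapper.
        return false;
    }
}
```

With jconf.setInputFormat(WholeFileTextInputFormat.class), the number of map tasks should equal the number of input files, regardless of what setNumMapTasks asks for.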

I think there should be some documentation indicating that the values passed
to "setNumMapTasks" and "setNumReduceTasks"
can be overridden by input splitting.  It was misleading the first time I
used them.  I expected that the files would be
split into blocks/chunks, that each block/chunk would be assigned to a
Mapper, and that the maximum Mapper count would be capped
by the number specified in "setNumMapTasks" and "setNumReduceTasks".

Unfortunately, that is not the actual behavior, despite what the method
names suggest.  =(
Does anyone know if this is the correct answer to the problem, or are they
actually two different things?
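For the split-size knob Samuel mentions below, the setting would go in hadoop-site.xml alongside the other overrides; the 128 MB value here is only illustrative:

```xml
<property>
  <name>mapred.min.split.size</name>
  <!-- Illustrative value: a minimum split of 134217728 bytes (128 MB)
       merges small blocks into fewer splits, reducing map task count. -->
  <value>134217728</value>
</property>
```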

Thanks,
-Andy

On Mon, Oct 6, 2008 at 7:02 PM, Samuel Guo <guosijie@gmail.com> wrote:

> The number of Mappers depends on your InputFormat.
> The default InputFormat tries to treat every file block of a file as an
> InputSplit,
> and you get the same number of Mappers as the number of your
> InputSplits.
> Try configuring "mapred.min.split.size" to reduce the number of your
> Mappers
> if you want to.
>
> And I don't know why your job gets just one reducer.  Does anyone know?
>
> On Tue, Oct 7, 2008 at 9:06 AM, Andy Li <annndy.lee@gmail.com> wrote:
>
> > Dears,
> >
> > Sorry, I did not mean to cross-post.  The previous article was
> > accidentally posted to the HBase user list.  I would like to bring it
> back
> > to the Hadoop user list since it is confusing me a lot and it is mainly
> > MapReduce
> > related.
> >
> > Currently running version hadoop-0.18.1 on 25 nodes.  Map and Reduce Task
> > Capacity is 92.  When I do this in my MapReduce program:
> >
> > ============= SAMPLE CODE =============
> >        JobConf jconf = new JobConf(conf, TestTask.class);
> >        jconf.setJobName("my.test.TestTask");
> >        jconf.setOutputKeyClass(Text.class);
> >        jconf.setOutputValueClass(Text.class);
> >        jconf.setOutputFormat(TextOutputFormat.class);
> >        jconf.setMapperClass(MyMapper.class);
> >        jconf.setCombinerClass(MyReducer.class);
> >        jconf.setReducerClass(MyReducer.class);
> >        jconf.setInputFormat(TextInputFormat.class);
> >        try {
> >            jconf.setNumMapTasks(5);
> >            jconf.setNumReduceTasks(3);
> >            JobClient.runJob(jconf);
> >        } catch (Exception e) {
> >            e.printStackTrace();
> >        }
> > ============= ============= =============
> >
> > When I run the job, I always get 300 mappers and 1 reducer from
> the
> > JobTracker web page running on the default port 50030.
> > No matter what numbers I pass to "setNumMapTasks" and
> > "setNumReduceTasks", I get the same result.
> > Does anyone know why this is happening?
> > Am I missing or misunderstanding something in the picture?  =(
> >
> > Here's a reference to the parameters we have overridden in
> "hadoop-site.xml".
> > ===============
> > <property>
> >  <name>mapred.tasktracker.map.tasks.maximum</name>
> >  <value>4</value>
> > </property>
> >
> > <property>
> >  <name>mapred.tasktracker.reduce.tasks.maximum</name>
> >  <value>4</value>
> > </property>
> > ================
> > Other parameters are the defaults from hadoop-default.xml.
> >
> > Any idea what is causing this?
> >
> > Any input is appreciated.
> >
> > Thanks,
> > -Andy
> >
>
