hadoop-mapreduce-user mailing list archives

From Teryl Taylor <teryl.tay...@gmail.com>
Subject Re: Question about setting the number of mappers.
Date Tue, 19 Jan 2010 20:59:47 GMT
Hi Allen,

I have set them both to 10 in the mapred-site.xml file.

Cheers,

Teryl


On Tue, Jan 19, 2010 at 4:32 PM, Allen Wittenauer <awittenauer@linkedin.com> wrote:

> What is the value of:
>
> mapred.tasktracker.map.tasks.maximum
> mapred.tasktracker.reduce.tasks.maximum
>
>
> On 1/19/10 10:23 AM, "Teryl Taylor" <teryl.taylor@gmail.com> wrote:
>
> > Hi guys,
> >
> > Thanks for the answers.  Michael, yes, you are right, that is what I'm
> > looking for: how to reduce the number of mappers running simultaneously.
> > The system is running really slowly, and I think it might be due to
> > constant thread context switching because so many Mappers are running
> > concurrently.  Is there a way to tell how many Mappers are running at the
> > same time?  My concern is that even though I set mapred.map.tasks to 10,
> > the job configuration file (i.e. the job.xml file that is generated in the
> > logs directory) always says mapred.map.tasks is 1083, which makes me
> > believe it is completely ignoring my setting.  This is confirmed by the
> > snippet of code from the JobClient.java file, in which the client asks for
> > the number of input splits and automatically sets mapred.map.tasks to that
> > number, totally ignoring my setting.
> >
> > Cheers,
> >
> > Teryl
> >
> >
> >
> > On Tue, Jan 19, 2010 at 12:14 PM, Clements, Michael <Michael.Clements@disney.com> wrote:
> >
> >> Do you want to change the total # of mappers, or the # that run at any
> >> given time? For example, Hadoop may split your job into 1083 mapper tasks,
> >> only 10 of which it allows to run at a time.
> >>
> >> The setting in mapred-site.xml caps how many mappers can run simultaneously
> >> per machine. It does not affect how many mappers make up the job in total.
> >>
> >> The total # of tasks in a job and the # that can run simultaneously are two
> >> separate settings, and both are important for tuning performance. The
> >> InputFormat controls the first; the tasktracker maxima control the second.
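To make the two settings concrete: with the 1083 total map tasks and the cap of 10 concurrent mappers discussed in this thread, the job runs in 109 scheduling "waves". A minimal sketch of that arithmetic (the numbers come from the thread; the class and method names are made up for illustration):

```java
// Sketch: how many scheduling "waves" a job needs, given the total
// number of map tasks and the number of map slots available at once.
// The last wave may be only partially full, hence ceiling division.
public class MapWaves {
    static int waves(int totalMapTasks, int concurrentSlots) {
        return (totalMapTasks + concurrentSlots - 1) / concurrentSlots;
    }

    public static void main(String[] args) {
        // 1083 splits, 10 slots on one pseudo-distributed node -> 109 waves
        System.out.println(waves(1083, 10));
    }
}
```

Capping slots only bounds concurrency; cutting the total task count (via the InputFormat) is what reduces per-task startup overhead.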
> >>
> >>
> >>
> >>
> >>
> >> *From:* mapreduce-user-return-292-Michael.Clements=disney.com@hadoop.apache.org *On Behalf Of* Jeff Zhang
> >> *Sent:* Monday, January 18, 2010 4:54 PM
> >> *To:* mapreduce-user@hadoop.apache.org
> >> *Subject:* Re: Question about setting the number of mappers.
> >>
> >>
> >>
> >>
> >> Hi Teryl,
> >>
> >> The number of mappers is determined by the InputFormat you use. In your
> >> case, one option is to merge the files into larger files beforehand;
> >> another is to use CombineFileInputFormat as your InputFormat.
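For reference, a rough outline of the CombineFileInputFormat route, assuming a Hadoop release that ships org.apache.hadoop.mapreduce.lib.input.CombineFileInputFormat for the new API. The per-file reader (WholeFileRecordReader here) is hypothetical and left unimplemented, so treat this as a sketch, not a drop-in class:

```java
// Sketch only: pack many small files into fewer splits so fewer mappers run.
import java.io.IOException;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.input.CombineFileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.CombineFileRecordReader;
import org.apache.hadoop.mapreduce.lib.input.CombineFileSplit;

public class CombinedBinaryInputFormat
        extends CombineFileInputFormat<LongWritable, BytesWritable> {

    public CombinedBinaryInputFormat() {
        // Cap each combined split at ~128 MB so ~1083 small files collapse
        // into a handful of splits (and therefore a handful of mappers).
        setMaxSplitSize(128L * 1024 * 1024);
    }

    @Override
    protected boolean isSplitable(JobContext context, Path file) {
        return false; // files are chunk-compressed; never split one file
    }

    @Override
    public RecordReader<LongWritable, BytesWritable> createRecordReader(
            InputSplit split, TaskAttemptContext context) throws IOException {
        // WholeFileRecordReader is a hypothetical reader you would supply;
        // CombineFileRecordReader delegates to it once per packed file.
        return new CombineFileRecordReader<LongWritable, BytesWritable>(
                (CombineFileSplit) split, context, WholeFileRecordReader.class);
    }
}
```

With something like this, the number of map tasks falls out of the split packing rather than any mapred.map.tasks setting.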
> >>
> >>
> >>
> >> On Mon, Jan 18, 2010 at 1:05 PM, Teryl Taylor <teryl.taylor@gmail.com> wrote:
> >>
> >> Hi everyone,
> >>
> >> I'm playing around with the Hadoop MapReduce library and I'm getting some
> >> odd behaviour.  The system is set up on one machine using the
> >> pseudo-distributed configuration.  I use KFS as my DFS.  I have written a
> >> MapReduce program to process a bunch of binary files.  The files are
> >> compressed in chunks, so I do not split them.  There are 1083 files that
> >> I am loading into the map reducer.
> >>
> >> Every time I run the map reducer:
> >>
> >> ~/hadoop/bin/hadoop jar /home/hadoop/lib/test.jar
> >> org.apache.hadoop.mapreduce.apps.Test /input/2007/*/*/ /output
> >>
> >> it always creates 1083 mapper tasks to do the processing, which is
> >> extremely slow, so I wanted to try lowering the number to 10 and see how
> >> the performance is.  I set the following in mapred-site.xml:
> >>
> >> <configuration>
> >>  <property>
> >>    <name>mapred.job.tracker</name>
> >>    <value>localhost:9001</value>
> >>  </property>
> >>  <property>
> >>   <name>mapred.tasktracker.map.tasks.maximum</name>
> >>   <value>10</value>
> >>  </property>
> >>  <property>
> >>   <name>mapred.tasktracker.reduce.tasks.maximum</name>
> >>   <value>10</value>
> >>  </property>
> >>  <property>
> >>    <name>mapred.map.tasks</name>
> >>    <value>10</value>
> >>  </property>
> >> </configuration>
> >>
> >> I have recycled the jobtracker and tasktracker, and I still always get
> >> 1083 mappers.  Other than this, the map reducer is working as expected.
> >> I'm using the new API, and the main function in my class looks like:
> >>
> >>  public static void main(String[] args) throws Exception {
> >>     Configuration conf = new Configuration();
> >>     String[] otherArgs = new GenericOptionsParser(conf,
> >> args).getRemainingArgs();
> >>     if (otherArgs.length != 2) {
> >>       System.err.println("Usage: wordcount <in> <out>");
> >>       System.exit(2);
> >>     }
> >> //    conf.setInt("mapred.map.tasks", 10);
> >>     Job job = new Job(conf, "word count");
> >>     conf.setNumMapTasks(10);
> >>     job.setJarByClass(Test.class);
> >>     job.setMapperClass(TestMapper.class);
> >>     job.setCombinerClass(IntSumReducer.class);
> >>     job.setReducerClass(IntSumReducer.class);
> >>     job.setOutputKeyClass(Text.class);
> >>     job.setOutputValueClass(IntWritable.class);
> >>     job.setInputFormatClass(CustomInputFormat.class);
> >>     job.setOutputFormatClass(TextOutputFormat.class);
> >>     FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
> >>     FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
> >>     System.exit(job.waitForCompletion(true) ? 0 : 1);
> >>   }
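An aside on why there are exactly 1083 mappers: the thread never shows CustomInputFormat, but for chunk-compressed files it presumably returns false from isSplitable, so FileInputFormat's getSplits() emits one split per input file. A hypothetical sketch of what such a class might look like (reader implementation omitted):

```java
// Hypothetical sketch of a CustomInputFormat for chunk-compressed binary
// files: returning false from isSplitable yields exactly one split per
// input file, which is why 1083 files produce 1083 map tasks.
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

public class CustomInputFormat
        extends FileInputFormat<LongWritable, BytesWritable> {

    @Override
    protected boolean isSplitable(JobContext context, Path file) {
        return false; // each compressed file must be read whole
    }

    @Override
    public RecordReader<LongWritable, BytesWritable> createRecordReader(
            InputSplit split, TaskAttemptContext context) {
        // A real reader would decompress the file chunk by chunk and emit
        // records; elided here because the thread does not show it.
        throw new UnsupportedOperationException("sketch only");
    }
}
```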
> >>
> >> I have been digging around in the Hadoop source code, and it looks like
> >> the JobClient actually sets the number of mappers to the number of splits
> >> (hard-coded).  Snippet from the JobClient class:
> >>
> >>
> >> *****************************************************************************
> >> /**
> >>    * Internal method for submitting jobs to the system.
> >>    * @param job the configuration to submit
> >>    * @return a proxy object for the running job
> >>    * @throws FileNotFoundException
> >>    * @throws ClassNotFoundException
> >>    * @throws InterruptedException
> >>    * @throws IOException
> >>    */
> >>   public
> >>   RunningJob submitJobInternal(JobConf job
> >>                                ) throws FileNotFoundException,
> >>                                         ClassNotFoundException,
> >>                                         InterruptedException,
> >>                                         IOException {
> >>     /*
> >>      * configure the command line options correctly on the submitting
> dfs
> >>      */
> >>
> >>     JobID jobId = jobSubmitClient.getNewJobId();
> >>     Path submitJobDir = new Path(getSystemDir(), jobId.toString());
> >>     Path submitJarFile = new Path(submitJobDir, "job.jar");
> >>     Path submitSplitFile = new Path(submitJobDir, "job.split");
> >>     configureCommandLineOptions(job, submitJobDir, submitJarFile);
> >>     Path submitJobFile = new Path(submitJobDir, "job.xml");
> >>     int reduces = job.getNumReduceTasks();
> >>     JobContext context = new JobContext(job, jobId);
> >>
> >>     // Check the output specification
> >>     if (reduces == 0 ? job.getUseNewMapper() : job.getUseNewReducer()) {
> >>       org.apache.hadoop.mapreduce.OutputFormat<?,?> output =
> >>         ReflectionUtils.newInstance(context.getOutputFormatClass(),
> job);
> >>       output.checkOutputSpecs(context);
> >>     } else {
> >>       job.getOutputFormat().checkOutputSpecs(fs, job);
> >>     }
> >>
> >>     // Create the splits for the job
> >>     LOG.debug("Creating splits at " +
> fs.makeQualified(submitSplitFile));
> >>     int maps;
> >>     if (job.getUseNewMapper()) {
> >>       maps = writeNewSplits(context, submitSplitFile);
> >>     } else {
> >>       maps = writeOldSplits(job, submitSplitFile);
> >>     }
> >>     job.set("mapred.job.split.file", submitSplitFile.toString());
> >>     job.setNumMapTasks(maps);
> >>
> >>     // Write job file to JobTracker's fs
> >>     FSDataOutputStream out =
> >>       FileSystem.create(fs, submitJobFile,
> >>                         new FsPermission(JOB_FILE_PERMISSION));
> >>
> >>     try {
> >>       job.writeXml(out);
> >>     } finally {
> >>       out.close();
> >>    .....
> >>
> >> }
> >>
> >>
> >>
> >> *****************************************************************************
> >>
> >> Is there anything I can do to get the number of mappers to be more
> >> flexible?
> >>
> >>
> >> Cheers,
> >>
> >> Teryl
> >>
> >>
> >>
> >>
> >> --
> >> Best Regards
> >>
> >> Jeff Zhang
> >>
>
>
