hadoop-common-user mailing list archives

From "Chandra Mohan, Ananda Vel Murugan" <Ananda.Muru...@honeywell.com>
Subject RE: Running map reduce programmatically is unusually slow
Date Tue, 05 Nov 2013 05:58:07 GMT
Hi,

This morning I noticed one more odd thing. When I run the map reduce job using this utility,
it does not show up in the JobTracker web UI. Does anyone have any clue? Please help. Thanks.
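
Could it be that the job is running with the local job runner? In Hadoop 1.x, mapred.job.tracker
defaults to "local" when no mapred-site.xml is loaded, and a job submitted that way runs inside
the client JVM and never appears in the JobTracker web UI. Something like this just before the
submitJob call should confirm it (just a diagnostic sketch on top of the code below):

       // "local" means LocalJobRunner: the job runs in-process and is invisible to the
       // JobTracker UI; a real JobTracker address looks like <hostname>:<port>
       System.out.println("mapred.job.tracker = " + jobConf.get("mapred.job.tracker"));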

Regards,
Anand.C

From: Chandra Mohan, Ananda Vel Murugan [mailto:Ananda.Murugan@honeywell.com]
Sent: Monday, November 04, 2013 7:32 PM
To: user@hadoop.apache.org
Subject: Running map reduce programmatically is unusually slow

Hi,

I have written a small utility to run a map reduce job programmatically. My aim is to run my
map reduce job without using the hadoop shell script. I am planning to call this utility from
another application.

The following code runs the map reduce job. I have bundled this java class into a
jar (remotemr.jar). The actual map reduce job is bundled inside another jar (mapreduce.jar).

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.SequenceFileInputFormat;
import org.apache.hadoop.mapred.TextOutputFormat;
import org.apache.hadoop.mapred.jobcontrol.Job;
import org.apache.hadoop.mapred.jobcontrol.JobControl;


public class RemoteMapreduce {

    public static void main(String[] args) throws IOException, InterruptedException, ClassNotFoundException {

        String inputPath = args[0];
        String outputPath = args[1];
        String specFilePath = args[2];

        // Load the cluster configuration explicitly, since this runs outside the hadoop script
        Configuration config = new Configuration();
        config.addResource(new Path("/opt/hadoop-1.0.2/bin/core-site.xml"));
        config.addResource(new Path("/opt/hadoop-1.0.2/bin/hdfs-site.xml"));

        JobConf jobConf = new JobConf(config);
        jobConf.set("hadoop.tmp.dir", "/tmp/hadoop-ananda/");
        // The actual map reduce job classes live in a separate jar
        jobConf.setJar("/home/ananda/mapreduce.jar");
        jobConf.setMapperClass(Myjob.MapClass.class);
        SequenceFileInputFormat.setInputPaths(jobConf, new Path(inputPath));
        TextOutputFormat.setOutputPath(jobConf, new Path(outputPath));
        jobConf.setMapOutputKeyClass(Text.class);
        jobConf.setMapOutputValueClass(Text.class);
        jobConf.setInputFormat(SequenceFileInputFormat.class);
        jobConf.setOutputFormat(TextOutputFormat.class);
        jobConf.setOutputKeyClass(Text.class);
        jobConf.setOutputValueClass(Text.class);
        jobConf.set("specPath", specFilePath);
        jobConf.setUser("ananda");

        // job1 is only used by the commented-out JobControl variant below
        Job job1 = new Job(jobConf);
        JobClient jc = new JobClient(jobConf);
        jc.submitJob(jobConf);
        /* JobControl ctrl = new JobControl("dar");
        ctrl.addJob(job1);
        ctrl.run(); */

        System.out.println("Job launched!");
    }
}
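
One thing I notice in the code above: I only add core-site.xml and hdfs-site.xml to the
Configuration, not mapred-site.xml. If mapred-site.xml also has to be loaded for the JobConf
to pick up the JobTracker address, it would be one more line like this (the conf/ location is
a guess, the file may sit elsewhere on my machine):

       // Without this resource, mapred.job.tracker keeps its default value "local"
       config.addResource(new Path("/opt/hadoop-1.0.2/conf/mapred-site.xml"));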


I am running it as follows:

java -cp <all hadoop jars needed for the job>:/home/ananda/mapreduce.jar:/home/ananda/remotemr.jar RemoteMapreduce <inputpath> <outputpath> <specpath>

It runs without any error, but it takes longer than when I run the job using the hadoop
shell script. One more thing: all three paths need to be fully qualified HDFS paths,
i.e. hdfs://<hostname>:<port>/<path>. If I give relative paths as I do with the hadoop
shell script, I get input path not found errors. Am I doing anything wrong?
Please help. Thanks.
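
For comparison, with the hadoop script I run the same job roughly as
"hadoop jar /home/ananda/mapreduce.jar Myjob <inputpath> <outputpath> <specpath>"
(Myjob standing in for the actual main class). There the config files on the script's
classpath supply fs.default.name, so relative paths resolve against HDFS. If my utility
does not pick that value up, its default is file:///, which would explain why only fully
qualified hdfs:// paths work. A quick check against the same Configuration object:

       // expected: hdfs://<hostname>:<port>; if this prints file:///,
       // core-site.xml was not actually loaded from the path given above
       System.out.println("fs.default.name = " + config.get("fs.default.name"));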

Regards,
Anand.C
