hadoop-mapreduce-user mailing list archives

From Henning Blohm <henning.bl...@zfabrik.de>
Subject RE: Starting a Hadoop job programmatically
Date Tue, 23 Nov 2010 08:23:47 GMT
Hi Praveen,

  in order to submit the job to the cluster, you just need a
core-site.xml on your classpath (or loaded explicitly into your
Configuration object) that contains at least the following:

<configuration>
	<property>
		<name>fs.default.name</name>
		<value>hdfs://${name:port of namenode}</value>
	</property>

	<property>
		<name>mapred.job.tracker</name>
		<value>${name:port of jobtracker}</value>
	</property> 
</configuration>
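
If the file is not on the classpath, you can also set things up in
code. A minimal sketch (the path, host names, and ports below are
placeholders, not real values):

  //----
  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.Path;

  Configuration config = new Configuration();
  // either load an explicit copy of core-site.xml ...
  config.addResource(new Path("/etc/hadoop/conf/core-site.xml"));
  // ... or set the two properties directly
  config.set("fs.default.name", "hdfs://namenode-host:9000");
  config.set("mapred.job.tracker", "jobtracker-host:9001");
  //----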

If you want to wait for each job's completion, you can use
job.waitForCompletion(true) rather than job.submit().
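
For a driver that chains several jobs, that might look like this (a
sketch only; the job names and the omitted setup are illustrative, and
each job is configured as in the flow quoted below):

  //----
  Job first = new Job(config, "first job");
  // ... set input/output formats, mapper, reducer, jar ...
  if (!first.waitForCompletion(true)) {   // true = report progress to the console
      System.exit(1);                     // stop the chain if this job fails
  }

  Job second = new Job(config, "second job");
  // ... configure the second job ...
  System.exit(second.waitForCompletion(true) ? 0 : 1);
  //----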

Good luck,
  henning


On Mon, 2010-11-22 at 23:40 +0100, praveen.peddi@nokia.com wrote:
> Hi, thanks for your reply. In my case I have a driver that calls
> multiple jobs one after the other. I am using the following code to
> submit each job, but it uses the local Hadoop jar files that are in
> the classpath; it is not submitting the job to the Hadoop cluster. I
> thought I would need to specify where the master Hadoop is located on
> the remote machine. An example command I use from the command line is
> as follows, but I need to do it from my Java program.
>  
> $ hadoop-0.20.2/bin/hadoop \
>     jar /home/ppeddi/dev/Merchandising/RelevancyEngine/relevancy-core/dist/Relevancy4.jar \
>     -i raw-downloads-input-10K -o reco-patterns-output-10K-1S -k 100 -method mapreduce -g 500 \
>     -regex '[\ ]' -s 5
> 
> 
> I hope I made the question clear now.
>  
> Praveen
> 
> 
> ______________________________________________________________________
> From: ext Henning Blohm [mailto:henning.blohm@zfabrik.de] 
> Sent: Monday, November 22, 2010 5:07 PM
> To: mapreduce-user@hadoop.apache.org
> Subject: Re: Starting a Hadoop job programmatically
> 
> Hi Praveen,
> 
>   we do. We are using the "new" org.apache.hadoop.mapreduce.* API in
> Hadoop 0.20.2.
> 
>   Essentially the flow is:
> 
>   //----
>   // assuming all config is on the class path
>   Configuration config = new Configuration(); 
>   Job job = new Job(config, "some job name");
> 
>   // set in/out types
>   job.setInputFormatClass(...);
>   job.setOutputFormatClass(...);
>   job.setMapOutputKeyClass(...);
>   job.setMapOutputValueClass(...);
>   job.setOutputKeyClass(...);
>   job.setOutputValueClass(...);
> 
>   // set implementations as required
>   job.setMapperClass(<your mapper implementation class object>);
>   job.setCombinerClass(<your combiner implementation class object>);
>   job.setReducerClass(<your reducer implementation class object>);
> 
>   // set the jar... this is often the tricky part!
>   job.setJarByClass(<some class that is in the job jar and not
> elsewhere higher up on the class path>);
> 
>   job.submit();
>   //----
> 
> Hope I didn't forget anything.  
> 
> Note: You need to give Hadoop something it can launch in a JVM that
> has nothing more than the Hadoop jars and whatever else you
> configured statically in your hadoop-env.sh script.
> 
> Can you describe your scenario in more detail?
> 
> Henning
> 
> 
> On Monday, 22.11.2010 at 22:39 +0100, praveen.peddi@nokia.com
> wrote: 
> 
> > Hi all, 
> > I am trying to figure out how I can start a Hadoop job
> > programmatically from my Java application running in an app server.
> > I was able to run my map reduce job using the hadoop command from
> > the Hadoop master machine, but my goal is to run the same job from
> > my Java program (running on a different machine than the master). I
> > googled and could not find a solution for this. All the examples I
> > have seen so far use hadoop from the command line to start a job. 
> > 1. Has anyone invoked a Hadoop job from a Java application? 
> > 2. If so, could someone provide some sample code? 
> > Thanks 
> > Praveen 
> > 
> 
> Henning Blohm
> 
> ZFabrik Software KG
> 
> henning.blohm@zfabrik.de
> www.z2-environment.eu
> 