hadoop-mapreduce-user mailing list archives

From <praveen.pe...@nokia.com>
Subject RE: Starting a Hadoop job programtically
Date Tue, 23 Nov 2010 23:10:45 GMT
Hi Henning,
Putting core-site.xml on the classpath worked. Thanks for the help. I still need to figure out how to submit
a job as a different user than the one Hadoop is configured for.

I have one more question related to job submission. Has anyone had problems running a job that involves
multiple jar files? I am running a MapReduce job that references classes from several jar files. When
I run the job, I always get a ClassNotFoundException for a class that is not in the same jar file
as the job class.

I am starting the jobs from a Java application. The exception is:

java.lang.RuntimeException: java.lang.ClassNotFoundException: com.nokia.relevancy.util.hadoop.ValueOnlyTextOutputFormat
        at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:809)
        at org.apache.hadoop.mapreduce.JobContext.getOutputFormatClass(JobContext.java:193)
        at org.apache.hadoop.mapred.Task.initialize(Task.java:413)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:288)
        at org.apache.hadoop.mapred.Child.main(Child.java:170)
Caused by: java.lang.ClassNotFoundException: com.nokia.relevancy.util.hadoop.ValueOnlyTextOutputFormat
        at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:247)
        at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:762)
        at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:807)
        ... 4 more
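
A note on the trace above: the task JVM only receives the job jar, so a class that lives in a second jar is not visible to it. One common workaround, sketched below, is to repackage the job jar so that the dependency jars sit inside it under lib/. As far as I know, Hadoop 0.20 unpacks the job jar on the task node and adds the jars in its lib/ folder to the task classpath, but treat that as an assumption to verify on your cluster. The class name JobJarBundler and its method are hypothetical and use only the JDK.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Enumeration;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;
import java.util.jar.JarOutputStream;

/** Hypothetical helper: copies a job jar and adds dependency jars under lib/. */
final class JobJarBundler {

    /** Writes a new jar to 'target'; 'target' must be a different file than 'jobJar'. */
    static void bundle(Path jobJar, List<Path> dependencyJars, Path target) throws IOException {
        Set<String> written = new HashSet<>();
        try (JarFile source = new JarFile(jobJar.toFile());
             JarOutputStream out = new JarOutputStream(Files.newOutputStream(target))) {

            // 1. Copy every entry of the original job jar in its original order.
            Enumeration<JarEntry> entries = source.entries();
            while (entries.hasMoreElements()) {
                JarEntry entry = entries.nextElement();
                if (!written.add(entry.getName())) {
                    continue; // a duplicate name would make putNextEntry fail
                }
                out.putNextEntry(new JarEntry(entry.getName()));
                if (!entry.isDirectory()) {
                    try (InputStream in = source.getInputStream(entry)) {
                        byte[] buffer = new byte[8192];
                        int read;
                        while ((read = in.read(buffer)) != -1) {
                            out.write(buffer, 0, read);
                        }
                    }
                }
                out.closeEntry();
            }

            // 2. Store each dependency jar as lib/<file name>.
            for (Path dependency : dependencyJars) {
                String name = "lib/" + dependency.getFileName();
                if (!written.add(name)) {
                    continue;
                }
                out.putNextEntry(new JarEntry(name));
                Files.copy(dependency, out);
                out.closeEntry();
            }
        }
    }
}
```

The bundled jar would then be the one the job points at, for example through the mapred.jar property that JobConf.setJar sets in 0.20. The -libjars option of GenericOptionsParser is another route if the job is started through ToolRunner.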

Praveen

________________________________
From: ext Henning Blohm [mailto:henning.blohm@zfabrik.de]
Sent: Tuesday, November 23, 2010 11:37 AM
To: mapreduce-user@hadoop.apache.org
Subject: RE: Starting a Hadoop job programtically

Hi Praveen,

On Tue, 2010-11-23 at 17:18 +0100, praveen.peddi@nokia.com wrote:
Hi Henning,
Adding Hadoop's conf folder didn't fix the issue, but when I added the two properties below,
I was able to access the file system, although I cannot write anything because my app runs as a
different user. I have the following questions based on my experiments.

Exactly. I didn't mean to add the whole folder, just the one file with those properties.

1. How can I access HDFS or submit jobs as a different user than the one my Java app is running as? For
example, the Hadoop cluster is set up for the "hadoop" user and my Java app is running as a different
user. In order to run the job correctly, I have to submit it as the "hadoop" user, correct? How
can I achieve that programmatically?

We always run everything with the same user (now that you mention it). Didn't know that we
would have a problem otherwise. I would have suspected that the submitting user doesn't matter
(setting the corresponding system property would probably override that one anyway).

2. A few of the jobs I am calling are provided by a library, which means I cannot add these
two config properties myself. Is there any way around this other than copying the job
submission code from the library into my own code?

Yes, I think creating a core-site.xml file as below, putting it into <folder> (any folder
you like will do) and adding <folder> to your classpath when submitting should do the
trick (as I tried to explain before and if I am not mistaken).
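
For an application where the JVM classpath cannot easily be changed (an app server, for instance), the same effect can be approximated with a class loader that includes the folder. The sketch below uses only the JDK; the class name ConfigFolderClasspath and its methods are hypothetical. It assumes that Hadoop's Configuration looks up core-site.xml as a classpath resource through the thread context class loader, which matches my reading of 0.20 but should be verified.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical helper: makes a config folder visible as classpath resources. */
final class ConfigFolderClasspath {

    /** Returns a loader that sees the files in 'folder' plus whatever 'parent' sees. */
    static URLClassLoader loaderWithFolder(Path folder, ClassLoader parent)
            throws MalformedURLException {
        // For an existing directory the URI ends with '/', which URLClassLoader
        // needs in order to treat the URL as a directory rather than a jar.
        URL url = folder.toUri().toURL();
        return new URLClassLoader(new URL[] { url }, parent);
    }

    /** True if the named resource can be found through the given loader. */
    static boolean isVisible(ClassLoader loader, String resource) {
        return loader.getResource(resource) != null;
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            System.out.println("usage: ConfigFolderClasspath <folder containing core-site.xml>");
            return;
        }
        Path folder = Paths.get(args[0]);
        ClassLoader current = Thread.currentThread().getContextClassLoader();
        URLClassLoader loader = loaderWithFolder(folder, current);
        System.out.println("core-site.xml visible: " + isVisible(loader, "core-site.xml"));

        // Under the assumption stated above, a Configuration created on this
        // thread after this call would find the file in 'folder'.
        Thread.currentThread().setContextClassLoader(loader);
    }
}
```

Because the loader is installed before any job code runs, this also covers jobs submitted by library code that creates its own Configuration, as long as it does so on the same thread.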

Thanks
Praveen

Good luck,
  Henning


________________________________

From: ext Henning Blohm [mailto:henning.blohm@zfabrik.de]
Sent: Tuesday, November 23, 2010 3:24 AM
To: mapreduce-user@hadoop.apache.org
Subject: RE: Starting a Hadoop job programtically



Hi Praveen,

  in order to submit it to the cluster, you just need to have a core-site.xml on your classpath
(or load it explicitly into your configuration object) that looks (at least) like this

<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://${name:port of namenode}</value>
</property>

<property>
<name>mapred.job.tracker</name>
<value>${name:port of jobtracker}</value>
</property>
</configuration>
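
If a job still runs locally instead of on the cluster, it can help to check that the file actually found on the classpath contains both properties. As far as I know, a missing mapred.job.tracker falls back to the default value "local" in 0.20, which would explain a job that never reaches the cluster. The following is a minimal JDK-only sketch; SiteConfigCheck is a hypothetical name and the code does not use Hadoop's own Configuration class.

```java
import java.io.InputStream;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical helper: reads name/value pairs from a Hadoop-style site file. */
final class SiteConfigCheck {

    /** The two properties named in the file above. */
    static final String[] REQUIRED = { "fs.default.name", "mapred.job.tracker" };

    /** Parses <property><name>..</name><value>..</value></property> elements. */
    static Map<String, String> readProperties(InputStream xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(xml);
        Map<String, String> properties = new LinkedHashMap<>();
        NodeList nodes = doc.getElementsByTagName("property");
        for (int i = 0; i < nodes.getLength(); i++) {
            Element property = (Element) nodes.item(i);
            String name = childText(property, "name");
            if (name != null) {
                properties.put(name, childText(property, "value"));
            }
        }
        return properties;
    }

    /** Returns the required property names that are absent or empty. */
    static List<String> missing(Map<String, String> properties) {
        List<String> result = new ArrayList<>();
        for (String key : REQUIRED) {
            String value = properties.get(key);
            if (value == null || value.isEmpty()) {
                result.add(key);
            }
        }
        return result;
    }

    private static String childText(Element parent, String tag) {
        NodeList children = parent.getElementsByTagName(tag);
        return children.getLength() == 0 ? null : children.item(0).getTextContent().trim();
    }
}
```

A typical use would be to open the stream with getResourceAsStream("core-site.xml") on the same class loader the job uses and log the result of missing(...) before submitting.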

If you want to wait for each job's completion, you can use job.waitForCompletion(true) rather
than job.submit().

Good luck,
  henning


On Mon, 2010-11-22 at 23:40 +0100, praveen.peddi@nokia.com wrote:
Hi, thanks for your reply. In my case I have a Driver that calls multiple jobs one after the
other. I am using the following code to submit each job, but it uses the local Hadoop jar files
that are on the classpath. It is not submitting the job to the Hadoop cluster. I thought I would
need to specify where the Hadoop master is located on the remote machine. An example command I use
from the command line is as follows, but I need to do the same from my Java program.
$ hadoop-0.20.2/bin/hadoop jar /home/ppeddi/dev/Merchandising/RelevancyEngine/relevancy-core/dist/Relevancy4.jar
-i raw-downloads-input-10K -o reco-patterns-output-10K-1S -k 100 -method mapreduce -g 500
-regex '[\ ]' -s 5


I hope I made the question clear now.
Praveen

________________________________


From: ext Henning Blohm [mailto:henning.blohm@zfabrik.de]
Sent: Monday, November 22, 2010 5:07 PM
To: mapreduce-user@hadoop.apache.org
Subject: Re: Starting a Hadoop job programtically



Hi Praveen,

  we do. We are using the "new" org.apache.hadoop.mapreduce.* API in Hadoop 0.20.2.

  Essentially the flow is:

  //----
  // assuming all config is on the class path
  Configuration config = new Configuration();
  Job job = new Job(config, "some job name");

  // set in/out types
  job.setInputFormatClass(...);
  job.setOutputFormatClass(...);
  job.setMapOutputKeyClass(...);
  job.setMapOutputValueClass(...);
  job.setOutputKeyClass(...);
  job.setOutputValueClass(...);

  // set implementations as required
  job.setMapperClass(<your mapper implementation class object>);
  job.setCombinerClass(<your combiner implementation class object>);
  job.setReducerClass(<your reducer implementation class object>);

  // set the jar... this is often the tricky part!
  job.setJarByClass(<some class that is in the job jar and not elsewhere higher up on the class path>);

  job.submit();
  //----

Hope I didn't forget anything.

Note: You need to give Hadoop something it can launch in a JVM that has nothing more than the Hadoop
jars and whatever else you configured statically in your hadoop-env.sh script.
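
To see why setJarByClass can be tricky, it helps to check where a class is actually loaded from. As far as I understand the 0.20 code, setJarByClass looks up the class file through the class loader and uses the first location that is a jar; if the class comes from a plain classes directory (common in an IDE or an exploded app-server deployment), no jar is found and nothing is shipped to the cluster. The sketch below imitates that lookup with the JDK only. JarLocator is a hypothetical name and this is not Hadoop's own implementation.

```java
import java.io.File;
import java.io.IOException;
import java.net.JarURLConnection;
import java.net.URISyntaxException;
import java.net.URL;
import java.util.Enumeration;

/** Hypothetical helper: finds the local jar file that provides a class, if any. */
final class JarLocator {

    /** Returns the first jar that contains the class file, or null if none does. */
    static File findContainingJar(ClassLoader loader, String className)
            throws IOException, URISyntaxException {
        String classFile = className.replace('.', '/') + ".class";
        Enumeration<URL> urls = loader.getResources(classFile);
        while (urls.hasMoreElements()) {
            URL url = urls.nextElement();
            // Classes in a directory have a "file" URL; only "jar" URLs qualify.
            if ("jar".equals(url.getProtocol())) {
                JarURLConnection connection = (JarURLConnection) url.openConnection();
                // Works for jars on the local file system; other locations
                // (for example http) make new File(URI) throw.
                return new File(connection.getJarFileURL().toURI());
            }
        }
        return null;
    }

    /** Convenience overload; classes from the bootstrap loader yield null. */
    static File findContainingJar(Class<?> type) throws IOException, URISyntaxException {
        ClassLoader loader = type.getClassLoader();
        if (loader == null) {
            return null;
        }
        return findContainingJar(loader, type.getName());
    }
}
```

Calling findContainingJar with the class you intend to pass to setJarByClass, and logging the result, shows quickly whether a jar would be found and whether it is the one you expect.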

Can you describe your scenario in more detail?

Henning


On Mon, 2010-11-22 at 22:39 +0100, praveen.peddi@nokia.com wrote:
Hi all,
I am trying to figure out how I can start a Hadoop job programmatically from my Java application
running in an app server. I was able to run my MapReduce job using the hadoop command from the Hadoop
master machine, but my goal is to run the same job from my Java program (running on a different
machine than the master). I googled and could not find a solution for this. All the examples I have
seen so far use hadoop from the command line to start a job.
1. Has anyone invoked a Hadoop job from a Java application?
2. If so, could someone provide some sample code?
Thanks
Praveen

Henning Blohm

ZFabrik Software KG

henning.blohm@zfabrik.de
http://www.z2-environment.eu








