hadoop-common-user mailing list archives

From Oleg Ruchovets <oruchov...@gmail.com>
Subject Re: execute hadoop job from remote web application
Date Tue, 18 Oct 2011 16:50:38 GMT
So you mean that if I submit the job remotely and my_hadoop_job.jar
is on the classpath of my web application, it will submit the job, together
with my_hadoop_job.jar, to the remote hadoop machine (cluster)?
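
A minimal sketch of what that remote submission could look like, using the new (org.apache.hadoop.mapreduce) API. The cluster host names are assumptions, not taken from the thread; the mapper/reducer class names are the ones Oleg posted later. The key point is that the job jar only needs to be on the client's classpath: setJarByClass() tells Hadoop which local jar to upload to the JobTracker, business-logic classes included.

```java
// Sketch only: host names and paths here are assumptions.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.MapWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class RemoteSubmitSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Point the client at the remote cluster (Hadoop 0.20-era keys).
        conf.set("fs.default.name", "hdfs://namenode.example.com:9000");
        conf.set("mapred.job.tracker", "jobtracker.example.com:9001");

        Job job = new Job(conf, "my job");
        // HadoopJobExecutor lives in my_hadoop_job.jar on the *client*
        // classpath; Hadoop finds that jar and ships it, with all the
        // other classes it contains, to the cluster.
        job.setJarByClass(HadoopJobExecutor.class);
        job.setMapperClass(MultipleOutputMap.class);
        job.setCombinerClass(BaseCombine.class);
        job.setReducerClass(HBaseReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(MapWritable.class);
        FileOutputFormat.setOutputPath(job, new Path("/data/output_jobs/output"));

        // Blocks until completion; counters are then readable from the job.
        boolean ok = job.waitForCompletion(true);
        System.exit(ok ? 0 : 1);
    }
}
```

This requires the Hadoop jars and the cluster's config files (core-site.xml, mapred-site.xml) on the web application's classpath, as mentioned further down in the thread.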

On Tue, Oct 18, 2011 at 6:13 PM, Harsh J <harsh@cloudera.com> wrote:

> Oleg,
>
> Steve already covered this.
>
> The "hadoop jar" subcommand merely runs the jar program for you, as a
> utility - it has nothing to do with submissions really.
>
> Have you tried submitting your program by running your jar as a
> regular java program (java -jar <jar>), with the proper classpath?
> (You may use "hadoop classpath" to get that string.)
>
> It would go through fine, and submit the job jar with classes
> included, over to the JobTracker.
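
Concretely, Harsh's suggestion might look something like this on the client machine (the paths and the driver class name are assumptions, not from the thread):

```shell
# Run the job driver as a plain Java program instead of via "hadoop jar".
# "hadoop classpath" prints the client-side Hadoop classpath string.
# my_hadoop_job.jar and com.example.HadoopJobExecutor are placeholders.
java -cp my_hadoop_job.jar:$(hadoop classpath) com.example.HadoopJobExecutor \
    -inputPath /opt/inputs/ -outputPath /data/output_jobs/output
```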
>
> On Tue, Oct 18, 2011 at 9:13 PM, Oleg Ruchovets <oruchovets@gmail.com>
> wrote:
> > Let me be more specific. It is not a dependent jar; it is the jar which
> > contains the map/reduce/combine classes and some business logic.
> > When executing our job from the command line, the class which parses the
> > parameters and submits the job has this line of code:
> >    job.setJarByClass(HadoopJobExecutor.class);
> >
> > We execute it locally on the hadoop master machine using a command such as:
> > /opt/hadoop/bin/hadoop jar /opt/hadoop/hadoop-jobs/my_hadoop_job.jar
> > -inputPath /opt/inputs/  -outputPath /data/output_jobs/output
> >
> > and of course my_hadoop_job.jar is found, because it is located on the
> > same machine.
> >
> > Now, suppose I am going to submit the job remotely (from a web application),
> > and I have the same line of code:
> > job.setJarByClass(HadoopJobExecutor.class);
> >
> > If my_hadoop_job.jar is located only on the remote hadoop machine (on its
> > classpath), my JobClient will fail, because there is no job jar on the
> > local classpath (it is located on the remote hadoop machine). Am I right?
> > I simply don't know how to submit a job remotely (in my case the job is
> > not just the map/combine/reduce classes; it is a jar which contains other
> > classes too).
> >
> > Regarding remotely invoking the shell script that contains the hadoop jar
> > command with any required input arguments:
> > it is possible to do it with
> >    Runtime.getRuntime().exec( submitCommand.toString().split( " " ) );
> > but I prefer to use JobClient, because then I can monitor my job (get
> > counters and other useful information).
> >
> > Thanks in advance
> > Oleg.
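
A sketch of the kind of monitoring Oleg has in mind, using the old-API JobClient. The JobTracker address is an assumption, and the JobConf would additionally need the mapper/reducer/jar settings shown elsewhere in the thread:

```java
// Sketch: submit a job asynchronously and poll its progress and counters
// (old org.apache.hadoop.mapred API). Host name is a placeholder.
import org.apache.hadoop.mapred.Counters;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.RunningJob;

public class MonitorSketch {
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf();
        conf.set("mapred.job.tracker", "jobtracker.example.com:9001");
        // ... mapper/reducer/jar/input/output settings go here ...

        JobClient client = new JobClient(conf);
        RunningJob running = client.submitJob(conf); // returns immediately

        while (!running.isComplete()) {
            System.out.printf("map %.0f%% reduce %.0f%%%n",
                    running.mapProgress() * 100, running.reduceProgress() * 100);
            Thread.sleep(5000);
        }
        // Counters and success status are available from the handle.
        Counters counters = running.getCounters();
        System.out.println("succeeded: " + running.isSuccessful());
        System.out.println(counters);
    }
}
```

This is what a Runtime.exec() of the shell script cannot give you: the RunningJob handle stays live for progress, counters, and kill/diagnostics calls.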
> >
> > On Tue, Oct 18, 2011 at 4:34 PM, Bejoy KS <bejoy.hadoop@gmail.com>
> wrote:
> >
> >> Hi Oleg
> >>          I haven't tried out a scenario like you mentioned. But I think
> >> there shouldn't be any issue in submitting a job that has some dependent
> >> classes which holds the business logic referred from mapper,reducer or
> >> combiner. You should be able to do the job submission remotely the same
> we
> >> were discussing in this thread. If you need to distribute any dependent
> >> jars/files along with the application jar, you can use the -libjars
> option
> >> in CLI or use the DistributedCache methods like
> >> addArchiveToClassPath()/addFileToClassPath() in your java code. If it is
> a
> >> dependent jar It is better to deploy the same in the cluster environment
> >> itself so that every time when you submit your job you don't have to
> >> transfer the jar over the network again and again.
> >>         Just a suggestion, if you can execute the job from within your
> >> hadoop cluster you don't have to do a remote job submission. You just
> need
> >> to remotely invoke the shellscript that contains the hadoop jar command
> >> with
> >> any required input arguments. Sorry if I'm not getting your requirement
> >> exactly.
> >>
> >> Regards
> >> Bejoy.K.S
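
The -libjars route Bejoy mentions might look like this on the CLI (the dependency jar names are placeholders; the driver must parse its arguments via GenericOptionsParser or ToolRunner for -libjars to take effect):

```shell
# Ship extra dependency jars alongside the application jar.
# business-deps.jar and another-dep.jar are placeholder names.
hadoop jar /opt/hadoop/hadoop-jobs/my_hadoop_job.jar \
    -libjars business-deps.jar,another-dep.jar \
    -inputPath /opt/inputs/ -outputPath /data/output_jobs/output
```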
> >>
> >> On Tue, Oct 18, 2011 at 6:29 PM, Oleg Ruchovets <oruchovets@gmail.com>
> >> wrote:
> >>
> >> > Thank you all for your answers, but I still have some questions.
> >> > Currently we run our jobs using shell scripts which are located on the
> >> > hadoop master machine.
> >> >
> >> > Here is an example of command line:
> >> > /opt/hadoop/bin/hadoop jar /opt/hadoop/hadoop-jobs/my_hadoop_job.jar
> >> > -inputPath /opt/inputs/  -outputPath /data/output_jobs/output
> >> >
> >> > my_hadoop_job.jar has a class which parses the input parameters and
> >> > submits the job.
> >> > Our code is very similar to what you wrote:
> >> >   ......
> >> >
> >> >        job.setJarByClass(HadoopJobExecutor.class);
> >> >        job.setMapperClass(MultipleOutputMap.class);
> >> >        job.setCombinerClass(BaseCombine.class);
> >> >        job.setReducerClass(HBaseReducer.class);
> >> >        job.setOutputKeyClass(Text.class);
> >> >        job.setOutputValueClass(MapWritable.class);
> >> >
> >> >        FileOutputFormat.setOutputPath(job, new Path(finalOutPutPath));
> >> >
> >> >        jobCompleteStatus = job.waitForCompletion(true);
> >> > ...............
> >> >
> >> > My questions are:
> >> >
> >> > 1) my_hadoop_job.jar contains other classes (business logic), not only
> >> > the Map, Combine and Reduce classes, and I still don't understand how I
> >> > can submit a job which needs all the classes from my_hadoop_job.jar.
> >> > 2) Do I need to submit my_hadoop_job.jar too? If yes, what is the way
> >> > to do it?
> >> >
> >> > Thanks In Advance
> >> > Oleg.
> >> >
> >> > On Tue, Oct 18, 2011 at 2:11 PM, Uma Maheswara Rao G 72686 <
> >> > maheswara@huawei.com> wrote:
> >> >
> >> > > ----- Original Message -----
> >> > > From: Bejoy KS <bejoy.hadoop@gmail.com>
> >> > > Date: Tuesday, October 18, 2011 5:25 pm
> >> > > Subject: Re: execute hadoop job from remote web application
> >> > > To: common-user@hadoop.apache.org
> >> > >
> >> > > > Oleg
> >> > > >      If you are looking at how to submit your jobs using
> >> > > > JobClient then the
> >> > > > below sample can give you a start.
> >> > > >
> >> > > > //get the configuration parameters and assigns a job name
> >> > > >        JobConf conf = new JobConf(getConf(), MyClass.class);
> >> > > >        conf.setJobName("SMS Reports");
> >> > > >
> >> > > >        //setting key value types for mapper and reducer outputs
> >> > > >        conf.setOutputKeyClass(Text.class);
> >> > > >        conf.setOutputValueClass(Text.class);
> >> > > >
> >> > > >        //specifying the custom reducer class
> >> > > >        conf.setReducerClass(SmsReducer.class);
> >> > > >
> >> > > >        //Specifying the input directories(@ runtime) and Mappers
> >> > > > independently for inputs from multiple sources
> >> > > >        FileInputFormat.addInputPath(conf, new Path(args[0]));
> >> > > >
> >> > > >        //Specifying the output directory @ runtime
> >> > > >        FileOutputFormat.setOutputPath(conf, new Path(args[1]));
> >> > > >
> >> > > >        JobClient.runJob(conf);
> >> > > >
> >> > > > Along with the hadoop jars you may need to have the config files
> >> > > > as well on
> >> > > > your client.
> >> > > >
> >> > > > The sample uses the old map reduce API. You can use the new one
> >> > > > as well; with the new API you use Job instead of JobClient.
> >> > > >
> >> > > > Hope it helps!..
> >> > > >
> >> > > > Regards
> >> > > > Bejoy.K.S
> >> > > >
> >> > > >
> >> > > > On Tue, Oct 18, 2011 at 5:00 PM, Oleg Ruchovets
> >> > > > <oruchovets@gmail.com>wrote:
> >> > > > > Excellent. Can you give a small example of code.
> >> > > > >
> >> > > Good sample by Bejoy.
> >> > > I hope you have access to this site. Also please go through this doc:
> >> > >
> >> > > http://hadoop.apache.org/common/docs/r0.20.2/mapred_tutorial.html#Example%3A+WordCount+v2.0
> >> > > Here is the wordcount example.
> >> > >
> >> > > > >
> >> > > > > On Tue, Oct 18, 2011 at 1:13 PM, Uma Maheswara Rao G 72686 <
> >> > > > > maheswara@huawei.com> wrote:
> >> > > > >
> >> > > > > >
> >> > > > > > ----- Original Message -----
> >> > > > > > From: Oleg Ruchovets <oruchovets@gmail.com>
> >> > > > > > Date: Tuesday, October 18, 2011 4:11 pm
> >> > > > > > Subject: execute hadoop job from remote web application
> >> > > > > > To: common-user@hadoop.apache.org
> >> > > > > >
> >> > > > > > > Hi, what is the way to execute a hadoop job on a remote
> >> > > > > > > cluster? I want to execute my hadoop job from a remote web
> >> > > > > > > application, but I didn't find any hadoop client (remote
> >> > > > > > > API) to do it.
> >> > > > > > >
> >> > > > > > > Please advise.
> >> > > > > > > Oleg
> >> > > > > > >
> >> > > > > > You can put the Hadoop jars on your web application's
> >> > > > > > classpath, find the JobClient class, and submit the jobs
> >> > > > > > using it.
> >> > > > > >
> >> > > > > > Regards,
> >> > > > > > Uma
> >> > > > > >
> >> > > > > >
> >> > > > >
> >> > > >
> >> > > Regards
> >> > > Uma
> >> > >
> >> >
> >>
> >
>
>
>
> --
> Harsh J
>
