hadoop-mapreduce-user mailing list archives

From Geoffry Roberts <geoffry.robe...@gmail.com>
Subject Re: Running M/R jobs from java code
Date Wed, 18 May 2011 17:23:55 GMT
Aaron,

I didn't know one could do this, thanks. I'll give it a try.

On 18 May 2011 10:18, Aaron Baff <Aaron.Baff@telescope.tv> wrote:

> It's not terribly hard to submit MR Jobs. Create a Hadoop Configuration
> object, and set its fs.default.name and fs.defaultFS to the Namenode URI,
> and mapreduce.jobtracker.address and mapred.job.tracker to the JobTracker
> URI. You can then easily set up and use a Job object (new API), or JobConf
> and JobClient (old API, I think) to create and submit a MR Job, and then
> also monitor its state and progress from within Java. You'll just need to
> make sure any third-party libraries that you require are within the job
> Jar, or on HDFS and added as part of the distributed caching mechanism to
> the MR Job.
>
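The approach Aaron describes might look like the sketch below. This is a minimal, hypothetical example, not code from the thread: the hostnames, ports, and the MyMapper/MyReducer classes are placeholders, and the property names are the ones mentioned above (Hadoop of this era accepted both the old and new names).

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class RemoteJobSubmitter {
    public static void main(String[] args) throws Exception {
        // Point the client at the remote cluster; set both the old- and
        // new-style property names, as suggested in the thread.
        Configuration conf = new Configuration();
        conf.set("fs.default.name", "hdfs://namenode-host:8020");
        conf.set("fs.defaultFS", "hdfs://namenode-host:8020");
        conf.set("mapred.job.tracker", "jobtracker-host:8021");
        conf.set("mapreduce.jobtracker.address", "jobtracker-host:8021");

        Job job = new Job(conf, "example-job");
        job.setJarByClass(RemoteJobSubmitter.class);
        job.setMapperClass(MyMapper.class);      // hypothetical mapper
        job.setReducerClass(MyReducer.class);    // hypothetical reducer
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path("/input"));
        FileOutputFormat.setOutputPath(job, new Path("/output"));

        // Submit without blocking, then poll state and progress from Java,
        // as described above.
        job.submit();
        while (!job.isComplete()) {
            System.out.printf("map %.0f%% reduce %.0f%%%n",
                job.mapProgress() * 100, job.reduceProgress() * 100);
            Thread.sleep(5000);
        }
        System.exit(job.isSuccessful() ? 0 : 1);
    }
}
```

Third-party jars would additionally be bundled into the job jar or shipped via the distributed cache, as Aaron notes.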
> We're doing this extensively, with a Java daemon using Thrift so our PHP
> UI can talk to the daemon to start reports, monitor them, and then
> retrieve the results once they are done. The daemon starts up all the MR
> Jobs in the necessary order to complete a report. Works quite well,
> generally speaking, at least for us.
>
> --Aaron
>
> -----Original Message-----
> From: Joey Echeverria [mailto:joey@cloudera.com]
> Sent: Wednesday, May 18, 2011 9:19 AM
> To: mapreduce-user@hadoop.apache.org
> Subject: Re: Running M/R jobs from java code
>
> Just last week I worked on a REST interface hosted in Tomcat that
> launched an MR job. In my case, I included the jar with the job in the
> WAR and called the run() method (the job implemented Tool). The only
> tricky part was that a copy of the Hadoop configuration files needed to
> be on the classpath, but I just added those to the main Tomcat classpath.
>
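Joey's Tool-based setup could be sketched roughly as follows. The class and job names are hypothetical; the point is that getConf() picks up the Hadoop configuration files if they are on the classpath (e.g. added to Tomcat's classpath, as he describes), and the job is then driven through the standard Tool/ToolRunner contract.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class ReportJob extends Configured implements Tool {
    @Override
    public int run(String[] args) throws Exception {
        // getConf() reflects core-site.xml / mapred-site.xml found on the
        // classpath, so no cluster addresses need to be hard-coded here.
        Job job = new Job(getConf(), "report-job");
        job.setJarByClass(ReportJob.class);
        // ... mapper/reducer/input/output setup elided ...
        return job.waitForCompletion(true) ? 0 : 1;
    }
}

// Invoked from a servlet or REST handler:
// int rc = ToolRunner.run(new Configuration(), new ReportJob(), new String[0]);
```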
> The Tomcat server was not on the same node as any other cluster
> machine, but there was no firewall between it and the cluster.
>
> Oh, I also had to create a tomcat6 user on the namenode/jobtracker and
> create a home directory in HDFS. I could have probably called
> set("user.name", "existing_user") in the configuration to avoid adding
> the tomcat6 user.
>
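The user.name workaround Joey mentions would be a one-line configuration tweak, sketched below. Note this only applies to unsecured clusters of this era; "existing_user" stands in for a real HDFS user.

```java
import org.apache.hadoop.conf.Configuration;

// Run the job as an existing HDFS user instead of the servlet
// container's OS user (e.g. tomcat6), avoiding the need to create
// a new account and home directory on the cluster.
Configuration conf = new Configuration();
conf.set("user.name", "existing_user");
```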
> -Joey
>
> On Wed, May 18, 2011 at 8:37 AM, Geoffry Roberts
> <geoffry.roberts@gmail.com> wrote:
> > I am confronted with the same problem.  What I plan to do is to have a
> > servlet simply execute, on the remote machine, the same command I would
> > use to start the job from the command line.
> >
> > e.g.
> > $ ssh <remote host> '<hadoop_home>/bin/hadoop jar myjob.jar'
> >
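From a servlet, the ssh command above could be invoked with ProcessBuilder, roughly as sketched here. The host name and hadoop path are placeholders (and this assumes passwordless ssh is already set up for the servlet container's user).

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;

public class SshJobLauncher {
    public static int launch() throws Exception {
        // Mirrors: ssh <remote host> '<hadoop_home>/bin/hadoop jar myjob.jar'
        ProcessBuilder pb = new ProcessBuilder(
            "ssh", "remote-host", "/opt/hadoop/bin/hadoop jar myjob.jar");
        pb.redirectErrorStream(true); // merge stderr into stdout
        Process p = pb.start();
        BufferedReader r = new BufferedReader(
            new InputStreamReader(p.getInputStream()));
        String line;
        while ((line = r.readLine()) != null) {
            System.out.println(line); // relay hadoop output to the server log
        }
        r.close();
        return p.waitFor(); // exit code of the remote hadoop command
    }
}
```

A drawback of this approach, compared with submitting through the Hadoop API as discussed earlier in the thread, is that progress can only be inferred by parsing the command's text output.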
> > Another possibility would be to rig some kind of RMI thing.
> >
> > Now here's an idea:  Use an aglet. ;-)  If I get into a funky mood I
> might
> > just give this a try.
> >
> > On 18 May 2011 08:07, Lior Schachter <liors@infolinks.com> wrote:
> >>
> >> Another machine in the cluster.
> >>
> >> On Wed, May 18, 2011 at 6:05 PM, Geoffry Roberts
> >> <geoffry.roberts@gmail.com> wrote:
> >>>
> >>> Is Tomcat installed on your hadoop name node? or another machine?
> >>>
> >>> On 18 May 2011 07:58, Lior Schachter <liors@infolinks.com> wrote:
> >>>>
> >>>> Hi,
> >>>> I have my application installed on Tomcat and I wish to submit M/R
> jobs
> >>>> programmatically.
> >>>> Is there any standard way to do that ?
> >>>>
> >>>> Thanks,
> >>>> Lior
> >>>
> >>>
> >>>
> >>> --
> >>> Geoffry Roberts
> >>>
> >>
> >
> >
> >
> > --
> > Geoffry Roberts
> >
> >
>
>
>
> --
> Joseph Echeverria
> Cloudera, Inc.
> 443.305.9434
>



-- 
Geoffry Roberts
