hadoop-mapreduce-user mailing list archives

From Aaron Baff <Aaron.B...@telescope.tv>
Subject RE: Running M/R jobs from java code
Date Wed, 18 May 2011 17:18:20 GMT
It's not terribly hard to submit MR Jobs. Create a Hadoop Configuration object, and set its
fs.default.name and fs.defaultFS properties to the NameNode URI, and mapreduce.jobtracker.address
and mapred.job.tracker to the JobTracker URI. You can then easily set up and use a Job object (new
API), or JobConf and JobClient (old API, I think) to create and submit an MR Job, and then also
monitor its state and progress from within Java. You'll just need to make sure any 3rd-party
libraries that you require are within the job jar, or on HDFS and added to the MR Job via the
distributed cache mechanism.
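A minimal sketch of that flow with the new API (the host names, ports, and job name are placeholders, and the mapper/reducer setup is elided; this is the 2011-era Hadoop 1.x API where `new Job(conf, name)` was not yet deprecated):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class RemoteSubmit {

    /** Point a client-side Configuration at a remote cluster (old + new property names). */
    static Configuration clusterConf(String namenodeUri, String jobtrackerUri) {
        Configuration conf = new Configuration();
        conf.set("fs.default.name", namenodeUri);
        conf.set("fs.defaultFS", namenodeUri);
        conf.set("mapred.job.tracker", jobtrackerUri);
        conf.set("mapreduce.jobtracker.address", jobtrackerUri);
        return conf;
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = clusterConf("hdfs://namenode:8020", "jobtracker:8021");

        Job job = new Job(conf, "my-report-step");
        job.setJarByClass(RemoteSubmit.class);   // jar that holds your mapper/reducer
        // ... set mapper, reducer, input/output formats and paths here ...

        job.submit();                            // returns immediately
        while (!job.isComplete()) {              // poll state/progress from Java
            System.out.printf("map %.0f%% reduce %.0f%%%n",
                job.mapProgress() * 100, job.reduceProgress() * 100);
            Thread.sleep(5000);
        }
        System.out.println(job.isSuccessful() ? "done" : "failed");
    }
}
```

Calling job.submit() rather than job.waitForCompletion(true) is what lets a daemon fire off several jobs and monitor all of them from one thread.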

We're doing this extensively, with a Java daemon using Thrift so our PHP UI can talk to the
daemon to start reports, monitor them, and retrieve the results once they are done.
The daemon starts up all the MR Jobs in the order necessary to complete a report. Works quite
well, generally speaking, at least for us.


-----Original Message-----
From: Joey Echeverria [mailto:joey@cloudera.com]
Sent: Wednesday, May 18, 2011 9:19 AM
To: mapreduce-user@hadoop.apache.org
Subject: Re: Running M/R jobs from java code

Just last week I worked on a REST interface hosted in Tomcat that
launched a MR job. In my case, I included the jar with the job in the
WAR and called the run() method (the job implemented Tool). The only
tricky part is a copy of the Hadoop configuration files needed to be
in the classpath, but I just added those to the main Tomcat classpath.
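A sketch of that pattern (the class and job names are placeholders, and the mapper/reducer wiring is elided; as noted above, the Hadoop *-site.xml files must be on the classpath so getConf() picks up the cluster settings):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class ReportJob extends Configured implements Tool {
    @Override
    public int run(String[] args) throws Exception {
        // getConf() carries the cluster settings loaded from the
        // *-site.xml files found on the (Tomcat) classpath.
        Job job = new Job(getConf(), "report-job");
        job.setJarByClass(ReportJob.class);
        // ... set mapper, reducer, input/output paths here ...
        return job.waitForCompletion(true) ? 0 : 1;
    }
}

// From a servlet or REST handler:
// int rc = ToolRunner.run(new Configuration(), new ReportJob(), new String[0]);
```

ToolRunner.run() also parses generic options (-D, -files, -libjars) out of the argument array before handing it to run(), which is handy when the same job class is driven from both a web app and the command line.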

The Tomcat server was not on the same node as any other cluster
machine, but there was no firewall between it and the cluster.

Oh, I also had to create a tomcat6 user on the namenode/jobtracker and
create a home directory for it in HDFS. I probably could have avoided
adding the tomcat6 user by calling set("user.name", "existing_user") on
the configuration instead.


On Wed, May 18, 2011 at 8:37 AM, Geoffry Roberts
<geoffry.roberts@gmail.com> wrote:
> I am confronted with the same problem.  What I plan to do is to have a
> servlet simply execute a command on the machine from where I would start the
> job if I were running it from the command line.
> e.g.
> $ ssh <remote host> '<hadoop_home>/bin/hadoop jar myjob.jar'
> Another possibility would be to rig some kind of RMI thing.
> Now here's an idea:  Use an aglet. ;-)  If I get into a funky mood I might
> just give this a try.
> On 18 May 2011 08:07, Lior Schachter <liors@infolinks.com> wrote:
>> Another machine in the cluster.
>> On Wed, May 18, 2011 at 6:05 PM, Geoffry Roberts
>> <geoffry.roberts@gmail.com> wrote:
>>> Is Tomcat installed on your hadoop name node? or another machine?
>>> On 18 May 2011 07:58, Lior Schachter <liors@infolinks.com> wrote:
>>>> Hi,
>>>> I have my application installed on Tomcat and I wish to submit M/R jobs
>>>> programmatically.
>>>> Is there any standard way to do that ?
>>>> Thanks,
>>>> Lior
>>> --
>>> Geoffry Roberts
> --
> Geoffry Roberts

Joseph Echeverria
Cloudera, Inc.
