From: Aaron Baff
To: "mapreduce-user@hadoop.apache.org"
Date: Wed, 18 May 2011 10:18:20 -0700
Subject: RE: Running M/R jobs from java code

It's not terribly hard to submit MR jobs.
Create a Hadoop Configuration object, and set its fs.default.name and fs.defaultFS to the NameNode URI, and mapreduce.jobtracker.address and mapred.job.tracker to the JobTracker URI. You can then easily set up and use a Job object (new API), or JobConf and JobClient (old API, I think), to create and submit an MR job, and then also monitor its state and progress from within Java. You'll just need to make sure any third-party libraries that you require are within the job jar, or on HDFS and added to the MR job through the distributed cache mechanism.

We're doing this extensively, with a Java daemon using Thrift so our PHP UI can talk to the daemon and start reports, monitor them, and then once they are done retrieve the results. The daemon starts up all the MR jobs in the necessary order to complete a report. Works quite well, generally speaking, at least for us.

--Aaron

-----Original Message-----
From: Joey Echeverria [mailto:joey@cloudera.com]
Sent: Wednesday, May 18, 2011 9:19 AM
To: mapreduce-user@hadoop.apache.org
Subject: Re: Running M/R jobs from java code

Just last week I worked on a REST interface hosted in Tomcat that launched
a MR job. In my case, I included the jar with the job in the WAR and called
the run() method (the job implemented Tool). The only tricky part is that a
copy of the Hadoop configuration files needed to be in the classpath, but I
just added those to the main Tomcat classpath. The Tomcat server was not on
the same node as any other cluster machine, but there was no firewall
between it and the cluster.

Oh, I also had to create a tomcat6 user on the namenode/jobtracker and
create a home directory in HDFS. I could have probably called
set("user.name", "existing_user") in the configuration to avoid adding the
tomcat6 user.

-Joey

On Wed, May 18, 2011 at 8:37 AM, Geoffry Roberts wrote:
> I am confronted with the same problem.
> What I plan to do is to have a servlet simply execute a command on the
> machine from where I would start the job if I were running it from the
> command line.
>
> e.g.
> $ ssh '/bin/hadoop jar myjob.jar'
>
> Another possibility would be to rig some kind of RMI thing.
>
> Now here's an idea: Use an aglet. ;-) If I get into a funky mood I might
> just give this a try.
>
> On 18 May 2011 08:07, Lior Schachter wrote:
>>
>> Another machine in the cluster.
>>
>> On Wed, May 18, 2011 at 6:05 PM, Geoffry Roberts
>> wrote:
>>>
>>> Is Tomcat installed on your hadoop name node? or another machine?
>>>
>>> On 18 May 2011 07:58, Lior Schachter wrote:
>>>>
>>>> Hi,
>>>> I have my application installed on Tomcat and I wish to submit M/R jobs
>>>> programmatically.
>>>> Is there any standard way to do that ?
>>>>
>>>> Thanks,
>>>> Lior
>>>
>>>
>>>
>>> --
>>> Geoffry Roberts
>>>
>>
>
>
>
> --
> Geoffry Roberts
>
>

--
Joseph Echeverria
Cloudera, Inc.
443.305.9434
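
For anyone who wants to try the "have the servlet run the command over ssh" idea quoted above, here is a rough Java sketch of what that might look like. It is only an illustration under some assumptions: the class name RemoteHadoopLauncher, its two helper methods, and the host gateway.example.com are made up for this example, and the remote command simply mirrors the one quoted in the thread. It uses nothing beyond the JDK's ProcessBuilder, and it assumes key-based ssh login is already set up for the user running Tomcat.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Hadoop: launches "hadoop jar" on a remote machine via ssh.
public class RemoteHadoopLauncher {

    // Builds the argument list for: ssh <host> '<hadoopBin> jar <jobJar>'.
    // ProcessBuilder passes each list element as a single argument without a
    // local shell, so the remote command needs no extra quoting on this side.
    static List<String> buildCommand(String host, String hadoopBin, String jobJar) {
        String remoteCommand = hadoopBin + " jar " + jobJar;
        return Arrays.asList("ssh", host, remoteCommand);
    }

    // Starts the command, forwards its output to this process's stdout/stderr,
    // blocks until it finishes, and returns its exit code.
    static int launch(List<String> command) throws IOException, InterruptedException {
        ProcessBuilder builder = new ProcessBuilder(command);
        builder.inheritIO();
        Process process = builder.start();
        return process.waitFor();
    }

    public static void main(String[] args) throws Exception {
        // The host is an assumption; pass a real one as the first argument to actually run ssh.
        String host = args.length > 0 ? args[0] : "gateway.example.com";
        List<String> command = buildCommand(host, "/bin/hadoop", "myjob.jar");
        System.out.println("Command: " + command);
        if (args.length > 0) {
            System.exit(launch(command));
        }
    }
}
```

Inside a servlet you would probably call launch() from a background thread rather than the request thread, since waitFor() blocks until the job finishes. Note that this only gives you an exit code; to monitor progress the way Aaron describes, the Job or JobClient route is likely the better fit.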