From: Aaron Baff
To: "common-user@hadoop.apache.org"
Date: Thu, 29 Sep 2011 10:53:25 -0700
Subject: RE: Running multiple MR Job's in sequence

Yea, we don't want it to sit there waiting for the Job to complete, even if
it's just a few minutes.

--Aaron

-----Original Message-----
From: turbocodr@gmail.com [mailto:turbocodr@gmail.com] On Behalf Of John Conwell
Sent: Thursday, September 29, 2011 10:50 AM
To: common-user@hadoop.apache.org
Subject: Re: Running multiple MR Job's in sequence

After you kick off a job, say JobA, your client doesn't need to sit and ping
Hadoop to see if it finished before it starts JobB. You can have the client
block until the job is complete with "Job.waitForCompletion(boolean verbose)".
Using this you can create a "job driver" that chains jobs together easily.

Now, if your job takes 2 weeks to run, you can't kill your driver process.
If you do, JobA will finish running, but JobB will never start.

JohnC

On Thu, Sep 29, 2011 at 9:51 AM, Aaron Baff wrote:

> I saw this, but wasn't sure if it was something that ran on the client and
> just submitted the Jobs in sequence, or if that gave it all to the
> JobTracker, and the JobTracker took care of submitting the Jobs in sequence
> appropriately.
>
> Basically, I'm looking for a completely stateless client, that doesn't need
> to ping the JobTracker every now and then to see if a Job has completed, and
> then submit the next one. The ideal flow would be the client gets in a
> request to run the series of Jobs, it preps them all, gets them all
> configured, and then passes them off to the JobTracker which runs them all
> in order without the client application needing to do anything further.
>
> Sounds like that doesn't really exist as part of Hadoop framework, and
> needs something like Oozie (or a home-built system) to do this.
>
> --Aaron
>
> -----Original Message-----
> From: Harsh J [mailto:harsh@cloudera.com]
> Sent: Wednesday, September 28, 2011 9:37 PM
> To: common-user@hadoop.apache.org
> Subject: Re: Running multiple MR Job's in sequence
>
> Within the Hadoop core project, there is JobControl you can utilize
> for this.
> You can view its API at
> http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapred/jobcontrol/package-summary.html
> and it is fairly simple to use (Create jobs in regular java API, build
> a dependency flow using JobControl atop these jobconf objects).
>
> Apache Oozie and other such tools offer higher abstractions on
> controlling a workflow, and can be considered when your needs get
> a bit more complex than just a series (easy to handle failure scenarios
> between dependent jobs, perform minor fs operations in pre/post
> processing, etc.).
>
> On Thu, Sep 29, 2011 at 5:26 AM, Aaron Baff wrote:
> > Is it possible to submit a series of MR Jobs to the JobTracker to run in
> > sequence (one finishes, take the output of that if successful and feed it
> > into the next, etc), or does it need to run client side by using the
> > JobControl or something like Oozie, or rolling our own? What I'm looking for
> > is a fire & forget, and occasionally check back to see if it's done. So
> > client-side doesn't need to really know anything or keep track of anything.
> > Does something like that exist within the Hadoop framework?
> >
> > --Aaron
> >
> >
>
> --
> Harsh J
>

--
Thanks,
John C
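
For reference, the blocking "job driver" John describes above might be
sketched roughly as follows. This is only an illustration of the control
flow, not Hadoop code: the Step interface, the step() helper, and the class
name are all made up for the example. In a real driver, each runAndWait()
would presumably wrap a call such as job.waitForCompletion(true), which
blocks and returns true when the job succeeds.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative stand-in for a client-side "job driver"; none of these
// names come from Hadoop itself.
final class SequentialDriver {

    /** Hypothetical wrapper around one job in the chain. */
    interface Step {
        String name();

        // Blocks until the job ends; true means it succeeded.
        boolean runAndWait() throws Exception;
    }

    /**
     * Runs the steps in order and stops at the first failure.
     * Returns the names of the steps that completed successfully.
     */
    static List<String> runInSequence(List<Step> steps) {
        List<String> completed = new ArrayList<>();
        for (Step step : steps) {
            boolean ok;
            try {
                ok = step.runAndWait();
            } catch (Exception e) {
                // Treat any exception from a step as a failed job.
                ok = false;
            }
            if (!ok) {
                break; // later steps depend on this one, so do not start them
            }
            completed.add(step.name());
        }
        return completed;
    }

    /** Hypothetical helper: a fake step that simply succeeds or fails. */
    static Step step(String jobName, boolean succeeds) {
        return new Step() {
            @Override public String name() { return jobName; }
            @Override public boolean runAndWait() { return succeeds; }
        };
    }

    public static void main(String[] args) {
        List<Step> chain = List.of(
                step("JobA", true), step("JobB", false), step("JobC", true));
        System.out.println("Completed: " + runInSequence(chain));
    }
}
```

Note that this has exactly the limitation discussed in the thread: the
process running runInSequence() has to stay alive for the whole chain. If it
is killed while a job is running, nothing submits the next one, which is why
a server-side workflow tool such as Oozie comes up for the fire-and-forget
case.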