From: Alejandro Abdelnur
Date: Tue, 14 Dec 2010 17:30:47 +0800
Subject: Re: how to run jobs every 30 minutes?
To: common-user@hadoop.apache.org

Ed,

Actually Oozie is quite different from Cascading.
* Cascading allows you to write 'queries' using a Java API, and they get translated into MR jobs.
* Oozie allows you to compose sequences of MR/Pig/Hive/Java/SSH jobs in a DAG (workflow jobs) and has timer + data-dependency triggers (coordinator jobs).

Regards,

Alejandro

On Tue, Dec 14, 2010 at 1:26 PM, edward choi wrote:
> Thanks for the tip. I took a look at it.
> Looks similar to Cascading, I guess...?
> Anyway, thanks for the info!!
>
> Ed
>
> 2010/12/8 Alejandro Abdelnur
>
> > Or, if you want to do it in a reliable way, you could use an Oozie
> > coordinator job.
> >
> > On Wed, Dec 8, 2010 at 1:53 PM, edward choi wrote:
> > > My mistake. Come to think of it, you are right: I can just make an
> > > infinite loop inside the Hadoop application.
> > > Thanks for the reply.
> > >
> > > 2010/12/7 Harsh J
> > >
> > >> Hi,
> > >>
> > >> On Tue, Dec 7, 2010 at 2:25 PM, edward choi wrote:
> > >> > Hi,
> > >> >
> > >> > I'm planning to crawl a certain web site every 30 minutes.
> > >> > How would I get it done in Hadoop?
> > >> >
> > >> > In pure Java, I used the Thread.sleep() method, but I guess this
> > >> > won't work in Hadoop.
> > >>
> > >> Why wouldn't it? You need to manage your post-job logic mostly, but
> > >> sleep and resubmission should work just fine.
> > >>
> > >> > Or if it could work, could anyone show me an example?
> > >> >
> > >> > Ed.
> > >>
> > >> --
> > >> Harsh J
> > >> www.harshj.com
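For reference, the coordinator job Alejandro suggests might look roughly like the sketch below. This is a hypothetical example, not from the thread: it assumes Oozie's coordinator 0.1 schema (where `frequency` is expressed in minutes), and the app name, dates, and HDFS workflow path are all illustrative.

```xml
<!-- Hypothetical coordinator app: runs the referenced workflow every 30 minutes.
     The name, start/end dates, and app-path are placeholders. -->
<coordinator-app name="crawl-every-30min" frequency="30"
                 start="2010-12-14T00:00Z" end="2011-12-14T00:00Z"
                 timezone="UTC"
                 xmlns="uri:oozie:coordinator:0.1">
  <action>
    <workflow>
      <app-path>hdfs://namenode/apps/crawl-workflow</app-path>
    </workflow>
  </action>
</coordinator-app>
```

The workflow at `app-path` would then define the actual MR (or Pig/Hive/etc.) actions to run on each trigger.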
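Harsh's simpler suggestion in the quoted thread, sleep in a driver and resubmit, can be sketched as below. This is a minimal illustrative driver, not code from the thread: `PeriodicJobRunner` and the placeholder `Runnable` are hypothetical, and in a real setup the job body would configure and submit a Hadoop `Job` (plus any post-job logic).

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical driver that resubmits a job at a fixed interval,
// standing in for a Thread.sleep() loop in a plain Java client.
public class PeriodicJobRunner {
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    public void start(Runnable job, long period, TimeUnit unit) {
        // scheduleAtFixedRate fires every `period` regardless of how long
        // the job runs; use scheduleWithFixedDelay instead if you want to
        // wait 30 minutes *after* each run completes.
        scheduler.scheduleAtFixedRate(job, 0, period, unit);
    }

    public void stop() {
        scheduler.shutdown();
    }

    public static void main(String[] args) {
        PeriodicJobRunner runner = new PeriodicJobRunner();
        // Placeholder body: a real driver would build and submit the crawl
        // MR job here and check its completion status.
        runner.start(() -> System.out.println("submitting crawl MR job..."),
                     30, TimeUnit.MINUTES);
    }
}
```

The client process stays alive between runs, which is exactly why an Oozie coordinator is the more reliable option: it survives client crashes and restarts.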