From: Alejandro Abdelnur
Date: Tue, 14 Dec 2010 17:30:47 +0800
Subject: Re: how to run jobs every 30 minutes?
To: common-user@hadoop.apache.org

Ed,

Actually Oozie is quite different from Cascading.
* Cascading allows you to write 'queries' using a Java API, and they get translated into MR jobs.
* Oozie allows you to compose sequences of MR/Pig/Hive/Java/SSH jobs in a DAG (workflow jobs) and has timer + data-dependency triggers (coordinator jobs).

Regards,

Alejandro

On Tue, Dec 14, 2010 at 1:26 PM, edward choi wrote:
> Thanks for the tip. I took a look at it.
> Looks similar to Cascading, I guess...?
> Anyway, thanks for the info!!
>
> Ed
>
> 2010/12/8 Alejandro Abdelnur
>
> > Or, if you want to do it in a reliable way, you could use an Oozie
> > coordinator job.
> >
> > On Wed, Dec 8, 2010 at 1:53 PM, edward choi wrote:
> > > My mistake. Come to think of it, you are right: I can just make an
> > > infinite loop inside the Hadoop application.
> > > Thanks for the reply.
> > >
> > > 2010/12/7 Harsh J
> > >
> > >> Hi,
> > >>
> > >> On Tue, Dec 7, 2010 at 2:25 PM, edward choi wrote:
> > >> > Hi,
> > >> >
> > >> > I'm planning to crawl a certain web site every 30 minutes.
> > >> > How would I get it done in Hadoop?
> > >> >
> > >> > In pure Java, I used the Thread.sleep() method, but I guess this
> > >> > won't work in Hadoop.
> > >>
> > >> Why wouldn't it? You need to manage your post-job logic mostly, but
> > >> sleep and resubmission should work just fine.
> > >>
> > >> > Or if it could work, could anyone show me an example?
> > >> >
> > >> > Ed.
> > >>
> > >> --
> > >> Harsh J
> > >> www.harshj.com
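For reference, the coordinator job Alejandro suggests might look roughly like the sketch below. This is a hypothetical example, not from the thread: it assumes Oozie's coordinator 0.1 schema (where `frequency` is expressed in minutes), and the app name, dates, and HDFS workflow path are all illustrative.

```xml
<!-- Hypothetical coordinator app: runs the referenced workflow every 30 minutes.
     The name, start/end dates, and app-path are placeholders. -->
<coordinator-app name="crawl-every-30min" frequency="30"
                 start="2010-12-14T00:00Z" end="2011-12-14T00:00Z"
                 timezone="UTC"
                 xmlns="uri:oozie:coordinator:0.1">
  <action>
    <workflow>
      <app-path>hdfs://namenode/apps/crawl-workflow</app-path>
    </workflow>
  </action>
</coordinator-app>
```

The workflow at `app-path` would then define the actual MR (or Pig/Hive/etc.) actions to run on each trigger.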
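Harsh's simpler suggestion in the quoted thread, sleep in a driver and resubmit, can be sketched as below. This is a minimal illustrative driver, not code from the thread: `PeriodicJobRunner` and the placeholder `Runnable` are hypothetical, and in a real setup the job body would configure and submit a Hadoop `Job` (plus any post-job logic).

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical driver that resubmits a job at a fixed interval,
// standing in for a Thread.sleep() loop in a plain Java client.
public class PeriodicJobRunner {
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    public void start(Runnable job, long period, TimeUnit unit) {
        // scheduleAtFixedRate fires every `period` regardless of how long
        // the job runs; use scheduleWithFixedDelay instead if you want to
        // wait 30 minutes *after* each run completes.
        scheduler.scheduleAtFixedRate(job, 0, period, unit);
    }

    public void stop() {
        scheduler.shutdown();
    }

    public static void main(String[] args) {
        PeriodicJobRunner runner = new PeriodicJobRunner();
        // Placeholder body: a real driver would build and submit the crawl
        // MR job here and check its completion status.
        runner.start(() -> System.out.println("submitting crawl MR job..."),
                     30, TimeUnit.MINUTES);
    }
}
```

The client process stays alive between runs, which is exactly why an Oozie coordinator is the more reliable option: it survives client crashes and restarts.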