hadoop-common-user mailing list archives

From Alejandro Abdelnur <t...@cloudera.com>
Subject Re: how to run jobs every 30 minutes?
Date Tue, 14 Dec 2010 09:30:47 GMT
Ed,

Actually Oozie is quite different from Cascading.

* Cascading allows you to write 'queries' using a Java API and they get
translated into MR jobs.
* Oozie allows you to compose sequences of MR/Pig/Hive/Java/SSH jobs in a DAG
(workflow jobs) and has timer+data dependency triggers (coordinator jobs).

Regards.

Alejandro

On Tue, Dec 14, 2010 at 1:26 PM, edward choi <mp2893@gmail.com> wrote:

> Thanks for the tip. I took a look at it.
> Looks similar to Cascading I guess...?
> Anyway thanks for the info!!
>
> Ed
>
> 2010/12/8 Alejandro Abdelnur <tucu@cloudera.com>
>
> > Or, if you want to do it in a reliable way you could use an Oozie
> > coordinator job.
> >
> > On Wed, Dec 8, 2010 at 1:53 PM, edward choi <mp2893@gmail.com> wrote:
> > > My mistake. Come to think of it, you are right, I can just make an
> > > infinite loop inside the Hadoop application.
> > > Thanks for the reply.
> > >
> > > 2010/12/7 Harsh J <qwertymaniac@gmail.com>
> > >
> > >> Hi,
> > >>
> > >> On Tue, Dec 7, 2010 at 2:25 PM, edward choi <mp2893@gmail.com> wrote:
> > >> > Hi,
> > >> >
> > >> > I'm planning to crawl a certain web site every 30 minutes.
> > >> > How would I get it done in Hadoop?
> > >> >
> > >> > In pure Java, I used Thread.sleep() method, but I guess this won't
> > >> > work in Hadoop.
> > >>
> > >> Why wouldn't it? You need to manage your post-job logic mostly, but
> > >> sleep and resubmission should work just fine.
> > >>
> > >> > Or if it could work, could anyone show me an example?
> > >> >
> > >> > Ed.
> > >> >
> > >>
> > >>
> > >>
> > >> --
> > >> Harsh J
> > >> www.harshj.com
> > >>
> > >
> >
>
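
The "sleep and resubmission" approach mentioned in the quoted thread could look roughly like the sketch below. It is only an illustration, not code from any poster: the class name PeriodicJobDriver, the JobSubmitter interface, and the runEvery method are all hypothetical. In a real driver, submitAndWait() would presumably build a fresh Hadoop Job and call job.waitForCompletion(true); here it is left as a plain callback so the example runs without a cluster.

```java
public class PeriodicJobDriver {

    /** Hypothetical stand-in for "submit one job and block until it finishes". */
    @FunctionalInterface
    public interface JobSubmitter {
        boolean submitAndWait() throws Exception;
    }

    /**
     * Runs the submitter up to maxRuns times, starting each run roughly
     * intervalMillis after the previous one started. Returns the number
     * of runs that reported success.
     */
    public static int runEvery(JobSubmitter submitter, long intervalMillis, int maxRuns) {
        int succeeded = 0;
        for (int run = 0; run < maxRuns; run++) {
            long started = System.currentTimeMillis();
            try {
                if (submitter.submitAndWait()) {
                    succeeded++;
                }
            } catch (Exception e) {
                // Post-job logic: log the failure and keep the loop alive.
                System.err.println("Run " + run + " failed: " + e);
            }
            if (run + 1 == maxRuns) {
                break; // no need to sleep after the last run
            }
            // Sleep only for what is left of the interval after the job's own runtime.
            long remaining = intervalMillis - (System.currentTimeMillis() - started);
            if (remaining > 0) {
                try {
                    Thread.sleep(remaining);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // preserve the interrupt and stop
                    break;
                }
            }
        }
        return succeeded;
    }

    public static void main(String[] args) {
        // Demo: a no-op "job" and a short interval. For the crawl discussed in
        // the thread, the interval would be 30 * 60 * 1000L milliseconds.
        int ok = runEvery(() -> true, 100L, 3);
        System.out.println("Successful runs: " + ok);
    }
}
```

A loop like this stops when the driver process dies, which is the reliability gap the Oozie coordinator suggestion in the thread is meant to close.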
