hadoop-common-user mailing list archives

From "Paco NATHAN" <cet...@gmail.com>
Subject Re: Can jobs be configured to be sequential
Date Sat, 18 Oct 2008 01:46:39 GMT
Hi Ravion,

The problem you are describing sounds like a workflow in which you
must verify certain conditions before proceeding to the next step.

We have similar kinds of use cases for Hadoop apps at work, which are
essentially ETL.  I recommend that you look at http://cascading.org as
an abstraction layer for managing these kinds of workflows. We've
found it quite useful.
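
To make the gating idea concrete, here is a rough sketch in plain Python.
It is not Cascading's API or anything Hadoop provides. The names
run_groups and Job are hypothetical, and each job is assumed to be a
callable that returns True on success. Groups run in order, and a
failure anywhere in one group stops the run before the next group starts.

```python
from typing import Callable, Dict, List

# Hypothetical job type: a callable that returns True on success.
Job = Callable[[], bool]


def run_groups(groups: List[Dict[str, Job]]) -> List[str]:
    """Run job groups in order, stopping before the next group on failure.

    Every job in the current group is attempted. If any of them fails
    (returns False or raises), later groups are not started.
    Returns the names of the jobs that succeeded.
    """
    completed: List[str] = []
    for group in groups:
        failed: List[str] = []
        for name, job in group.items():
            try:
                ok = job()
            except Exception:
                # Treat an exception as a failed job.
                ok = False
            (completed if ok else failed).append(name)
        if failed:
            # Gate: do not proceed to dependent groups.
            break
    return completed
```

Jobs inside a group run one after another here only to keep the sketch
short. On a real cluster they would be submitted together, with the
same check applied before the next group is submitted.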

Best,
Paco


On Fri, Oct 17, 2008 at 8:29 PM, Ravion <ravishankar.nair@gmail.com> wrote:
> Dear all,
>
> Our data warehouse system has about 600 ETL (Extract, Transform, Load) jobs that
> create an interim data model. Some jobs depend on the completion of others.
>
> Assume that I create groups of independent jobs. Say group G1 contains 100 jobs and
> G2 contains another 200 jobs, which depend on the completion of group G1, and so on.
>
> Can we leverage Hadoop so that it executes G1 first and, on failure, does not execute
> G2, but otherwise continues with G2, and so on?
>
> Or do I need to configure "N" (where N = total number of groups) Hadoop jobs independently
> and handle the sequencing ourselves?
>
> Please share your thoughts, thanks
>
> Warmest regards,
> Ravion
