hive-user mailing list archives

From John Zimmerman
Subject Re: Hive Orchestration
Date Mon, 20 Apr 2009 01:08:40 GMT

On Apr 19, 2009, at 1:47 AM, Jonathan Warden wrote:

> I'm looking for a framework that manages automatic initiation of our  
> daily data loading and processing, with knowledge of dependencies  
> between tables and "data ready" status flags.
> I think some people call this "Orchestration" (though there's no  
> settled definition of this word).
> I get the impression there are a lot of home grown solutions for  
> this.  But I'd like a generalized solution that would allow me to  
> just create a config file containing:
>  - A list of my tables
>  - What tables each table depends on
>  - Queries for loading one day of data into each table (for external  
> "raw data" tables, say a program to fetch this from wherever we  
> fetch it from)
> Then there'd be a driver process that would automatically run  
> everything every day based on my config, and would maintain a status  
> (that I could query in report generation and monitoring processes)  
> on what data was loaded successfully into a given table for a given  
> day.
> There's Apache HOD (Hadoop on Demand), but it's just integration  
> with batch schedulers.  Then there's Apache ODE (Orchestration  
> Director Engine), but this seems to be Web Services Orchestration  
> and I don't see it as solving my problem (though I'm not sure).
> Any ideas?
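
For illustration only, here is a minimal Python sketch of the kind of config-driven driver described above. It is not an existing framework. The table names, the `CONFIG` layout, the `run_day` function, and the in-memory `status` dict are all hypothetical, and each `load` callable stands in for whatever Hive query or fetch program would really load one day of data. It assumes Python 3.9+ for the standard-library `graphlib` module.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical config: each table lists the tables it depends on and a
# loader callable. A real loader would run a Hive query or a fetch program
# for one day of data and return True on success.
CONFIG = {
    "raw_events":   {"depends_on": [], "load": lambda day: True},
    "raw_users":    {"depends_on": [], "load": lambda day: True},
    "sessions":     {"depends_on": ["raw_events", "raw_users"],
                     "load": lambda day: True},
    "daily_report": {"depends_on": ["sessions"], "load": lambda day: True},
}


def run_day(config, day, status):
    """Load every table for `day` in dependency order.

    `status` maps (table, day) to "loaded", "failed", or "blocked" and is
    what reporting or monitoring code would query afterwards.
    """
    # TopologicalSorter expects a mapping of node -> its predecessors.
    graph = {name: set(spec["depends_on"]) for name, spec in config.items()}
    for table in TopologicalSorter(graph).static_order():
        if status.get((table, day)) == "loaded":
            continue  # already loaded by an earlier run for this day
        deps = config[table]["depends_on"]
        if any(status.get((dep, day)) != "loaded" for dep in deps):
            status[(table, day)] = "blocked"  # an upstream table is not ready
            continue
        try:
            ok = config[table]["load"](day)
        except Exception:
            ok = False  # treat any loader error as a failed load
        status[(table, day)] = "loaded" if ok else "failed"
    return status
```

Calling `run_day(CONFIG, "2009-04-19", {})` would return a dict with one entry per table for that day. A real driver would persist that status somewhere queryable (a small table, for example) rather than keeping it in memory, and would be triggered by cron or a similar scheduler.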
