hive-user mailing list archives

From Edward Capriolo <>
Subject Re: Hive Orchestration
Date Sun, 19 Apr 2009 14:32:21 GMT
On Sun, Apr 19, 2009 at 4:47 AM, Jonathan Warden <> wrote:
> I'm looking for a framework that manages automatic initiation of our daily
> data loading and processing, with knowledge of dependencies between tables
> and "data ready" status flags.
> I think some people call this "Orchestration" (though there's no settled
> definition of this word).
> I get the impression there are a lot of homegrown solutions for this. But
> I'd like a generalized solution that would allow me to just create a config
> file containing:
>  - A list of my tables
>  - What tables each table depends on
>  - Queries for loading one day of data into each table (for external "raw
> data" tables, say a program to fetch this from wherever we fetch it from)
> Then there'd be a driver process that would automatically run everything
> every day based on my config, and would maintain a status (that I could
> query in report generation and monitoring processes) on what data was loaded
> successfully into a given table for a given day.
> There's Apache HOD (Hadoop on Demand), but it's just integration with batch
> schedulers. Then there's Apache ODE (Orchestration Director Engine), but
> this seems to be Web Services Orchestration and I don't see it as solving my
> problem (though I'm not sure).
> Any ideas?
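A minimal sketch of the driver described above, assuming Python 3.9+ (for `graphlib`) and a hypothetical config format in which each table lists its dependencies and a HiveQL load statement templated on the day. The table names, the input path, and the `runner` callback are illustrative assumptions, not part of Hive or any existing tool. Tables are processed in dependency order, and each (table, day) pair gets a status that report or monitoring jobs could query.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, Tuple

# Hypothetical config: table -> dependencies and a HiveQL load statement
# templated on the day. Table names and the input path are illustrative only.
CONFIG = {
    "raw_logs": {
        "deps": [],
        "load": "LOAD DATA INPATH '/incoming/logs/{day}' "
                "INTO TABLE raw_logs PARTITION (ds='{day}')",
    },
    "sessions": {
        "deps": ["raw_logs"],
        "load": "INSERT OVERWRITE TABLE sessions PARTITION (ds='{day}') "
                "SELECT user_id, min(ts) FROM raw_logs "
                "WHERE ds='{day}' GROUP BY user_id",
    },
    "daily_report": {
        "deps": ["sessions", "raw_logs"],
        "load": "INSERT OVERWRITE TABLE daily_report PARTITION (ds='{day}') "
                "SELECT count(*) FROM sessions WHERE ds='{day}'",
    },
}

Status = Dict[Tuple[str, str], str]


def run_day(config: dict, day: str, runner: Callable[[str], None],
            status: Status) -> Status:
    """Load one day into every table in dependency order.

    Records "done", "failed", or "blocked" for each (table, day) pair.
    The status dict is in memory here; a real driver would persist it.
    """
    # graphlib expects a mapping of node -> set of predecessors (dependencies).
    graph = {table: set(spec["deps"]) for table, spec in config.items()}
    for table in TopologicalSorter(graph).static_order():
        deps = config[table]["deps"]
        # Skip a table unless every dependency loaded successfully that day.
        if not all(status.get((d, day)) == "done" for d in deps):
            status[(table, day)] = "blocked"
            continue
        try:
            runner(config[table]["load"].format(day=day))
            status[(table, day)] = "done"
        except Exception:
            # Any error raised by the runner marks this table/day as failed.
            status[(table, day)] = "failed"
    return status
```

For a dry run, passing `print` as the runner prints each statement in dependency order; a real runner would submit the statement to Hive. Downstream tables of a failed load are marked "blocked" rather than run against incomplete data.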

It sounds good. What do you think the overlap with ZooKeeper would be?

The entry points for Hive seem to be HiveServer and 'hive -e'. I have also
used the Hive API directly. Do you have plans to support all three?
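For the 'hive -e' entry point, a driver process could shell out to the Hive CLI, which executes the query string passed after -e. A minimal sketch, assuming the `hive` binary is on the PATH; the function names are hypothetical.

```python
import subprocess
from typing import List


def hive_command(query: str) -> List[str]:
    """Build the argument list for running one query via `hive -e`."""
    # Passing a list (no shell) avoids quoting problems inside the query.
    return ["hive", "-e", query]


def run_hive(query: str) -> None:
    """Run one query with the Hive CLI (assumes `hive` is on the PATH).

    check=True raises subprocess.CalledProcessError on a non-zero exit,
    so a caller can record that as a failed load.
    """
    subprocess.run(hive_command(query), check=True)
```

Shelling out keeps the driver independent of Hive's internal API, at the cost of starting a new CLI process per query; HiveServer or the Java API would avoid that overhead.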
