hive-user mailing list archives

From Tom Brown
Subject Best practice for automating jobs
Date Thu, 10 Jan 2013 22:03:56 GMT

I want to automate jobs against Hive (using an external table with
ever growing partitions), and I'm running into a few challenges:

Concurrency - If I run Hive as a Thrift server, I can only safely run
one job at a time. As such, it seems like my best bet is to run it
from the command line and set up a brand new instance for each job.
That's quite a bit of hassle to solve a seemingly common problem, so
are there any accepted patterns or best practices for this?

Partition management - New partitions will be added regularly. If I
have to set up a separate instance of Hive for each (potentially)
overlapping job, it will be difficult to keep track of which
partitions have already been added. In the context of the preceding
question, what is the best way to add metadata about new partitions?
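
To make the question concrete, here is a minimal sketch of what I
imagine each job doing before it runs its query. It is only an
illustration, not something I know to be the accepted approach. The
table name, the `dt` partition column, and the location path are
hypothetical, and the helper names are my own. It assumes the `hive`
CLI is on the PATH and relies on `hive -e` plus
`ALTER TABLE ... ADD IF NOT EXISTS PARTITION`, so overlapping jobs
could register the same partition without failing.

```python
import subprocess


def add_partition_sql(table, dt, location):
    """Build an idempotent statement that registers one partition.

    `table`, `dt`, and `location` are illustrative; the partition
    column is assumed to be named `dt`.
    """
    return (
        "ALTER TABLE {t} ADD IF NOT EXISTS "
        "PARTITION (dt='{d}') LOCATION '{l}'".format(t=table, d=dt, l=location)
    )


def run_in_fresh_cli(sql, hive_bin="hive"):
    """Run one statement in a brand new Hive CLI process.

    Returns the process exit code (0 on success).
    """
    return subprocess.call([hive_bin, "-e", sql])


# Hypothetical usage from a scheduled job:
#   stmt = add_partition_sql("events", "2013-01-10",
#                            "/data/events/dt=2013-01-10")
#   exit_code = run_in_fresh_cli(stmt)
```

Each job would pay the CLI start-up cost, which is the hassle I
mentioned above, but no job would need to know what the others have
already added.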

Thanks in advance!

