hive-user mailing list archives

From Sean McNamara <Sean.McNam...@Webtrends.com>
Subject Re: Best practice for automating jobs
Date Thu, 10 Jan 2013 22:11:58 GMT
> I want to know if there are any accepted patterns or best practices for
>this?

http://oozie.apache.org/



> New partitions will be added regularly

What type of partitions are you adding? Why frequently?




Sean


On 1/10/13 3:03 PM, "Tom Brown" <tombrown52@gmail.com> wrote:

>All,
>
>I want to automate jobs against Hive (using an external table with
>ever growing partitions), and I'm running into a few challenges:
>
>Concurrency - If I run Hive as a Thrift server, I can only safely run
>one job at a time. As such, it seems like my best bet will be to run
>it from the command line and set up a brand new instance for each job.
>That's quite a bit of a hassle to solve a seemingly common problem, so
>I want to know if there are any accepted patterns or best practices
>for this?
>
>Partition management - New partitions will be added regularly. If I
>have to set up multiple instances of Hive for each (potentially)
>overlapping job, it will be difficult to keep track of the partitions
>that have been added. In the context of the preceding question, what
>is the best way to add metadata about new partitions?
>
>Thanks in advance!
>
>--Tom
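
For the partition question above, one possible approach is to have each job register its own partition with an idempotent statement, run through a fresh Hive CLI process. The following is only a minimal sketch under stated assumptions: the table name `events`, the partition column `dt`, and the location path are hypothetical and do not come from the thread. It relies on `hive -e` and on `ALTER TABLE ... ADD IF NOT EXISTS PARTITION`, so overlapping jobs that try to add the same partition should not fail on the duplicate.

```python
import subprocess
from typing import List


def add_partition_sql(table: str, dt: str, location: str) -> str:
    """Build an idempotent ADD PARTITION statement for one date partition.

    The partition column name `dt` is an assumption for illustration.
    """
    return (
        f"ALTER TABLE {table} ADD IF NOT EXISTS "
        f"PARTITION (dt='{dt}') LOCATION '{location}';"
    )


def hive_command(sql: str) -> List[str]:
    """Command line for a single-use Hive CLI process running one statement."""
    return ["hive", "-e", sql]


def register_partition(table: str, dt: str, location: str) -> None:
    # Each call launches its own Hive CLI process, as described in the question.
    # check=True raises CalledProcessError if the hive process exits non-zero.
    subprocess.run(hive_command(add_partition_sql(table, dt, location)), check=True)


if __name__ == "__main__":
    # Hypothetical table name and path; prints the statement without running Hive.
    print(add_partition_sql("events", "2013-01-10", "/data/events/dt=2013-01-10"))
```

Because the statement is safe to repeat, a job does not need to track which partitions other jobs have already added; it can simply register the partition it is about to read or write.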

