hive-user mailing list archives

From Sadananda Hegde <>
Subject Re: Automating the partition creation process
Date Wed, 30 Jan 2013 01:09:08 GMT
Thanks Mark,

The recover partitions feature will satisfy my needs, but the MSCK Repair
Partition <tablename> option is not working for me. It does not give any
error, but it does not add any partitions either. It looks like it adds
partitions only when the sub-folder is empty, not when the sub-folder
contains data files. I see a fix to this issue here.

But it has probably not been committed yet, since the final result says 'ABORTED'.


On Mon, Jan 28, 2013 at 10:47 PM, Mark Grover wrote:

> Sadananda,
> See if this helps:
> On Mon, Jan 28, 2013 at 8:05 PM, Sadananda Hegde <> wrote:
>> Hello,
>> My Hive table is partitioned by year, month and day, and I have defined it
>> as an external table. The scheduled M/R jobs correctly load the HDFS files
>> into the daily sub-folders, <hivetable>/year=yyyy/month=mm/day=dd/.
>> The M/R job has some business logic for determining the values of year,
>> month and day, so one run might create and load files into multiple
>> sub-folders (multiple days). I am able to query the table after adding
>> partitions using the ALTER TABLE ADD PARTITION statement. But how do I
>> automate the partition creation step? Basically, the script needs to
>> identify the sub-folders created by the M/R job and create the
>> corresponding ALTER TABLE ADD PARTITION statements.
>> For example, say the M/R job loads files into the following 3 sub-folders:
>> /user/hive/warehouse/sales/year=2013/month=1/day=21
>> /user/hive/warehouse/sales/year=2013/month=1/day=22
>> /user/hive/warehouse/sales/year=2013/month=1/day=23
>> Then it should create 3 ALTER TABLE statements:
>> ALTER TABLE sales ADD PARTITION (year=2013, month=1, day=21);
>> ALTER TABLE sales ADD PARTITION (year=2013, month=1, day=22);
>> ALTER TABLE sales ADD PARTITION (year=2013, month=1, day=23);
>> I thought of changing the M/R jobs to load all files into the same folder,
>> then loading the files into a non-partitioned table first and loading the
>> partitioned table from the non-partitioned table (using dynamic partitions),
>> but I would prefer to avoid that extra step if possible (especially since
>> the data is already in the correct sub-folders).
>> Any help would be greatly appreciated.
>> Regards,
>> Sadu
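
The script described in the original question (turn partition sub-folders
into ALTER TABLE ADD PARTITION statements) could be sketched roughly as
below. This is only an illustration, not something posted in the thread. The
function names are hypothetical, the list of paths is assumed to come from a
recursive HDFS listing of the table directory, and the partition values are
assumed to be numeric as in the example above (string-typed partition columns
would need quoted values). IF NOT EXISTS is added so that re-running the
script does not fail on partitions that are already registered.

```python
import re

# Matches one key=value path component such as "year=2013".
_PARTITION_COMPONENT = re.compile(r"^([A-Za-z_][A-Za-z0-9_]*)=(.+)$")


def partition_spec(path):
    """Return the (key, value) pairs encoded in a partition path."""
    pairs = []
    for component in path.strip("/").split("/"):
        match = _PARTITION_COMPONENT.match(component)
        if match:
            pairs.append((match.group(1), match.group(2)))
    return pairs


def add_partition_statements(table, paths):
    """Build one ALTER TABLE ... ADD PARTITION statement per distinct spec."""
    statements = []
    seen = set()
    for path in paths:
        pairs = tuple(partition_spec(path))
        if not pairs or pairs in seen:
            continue  # skip paths without partition components and duplicates
        seen.add(pairs)
        # Assumption: values are numeric, so they are emitted unquoted.
        spec = ", ".join(f"{key}={value}" for key, value in pairs)
        statements.append(
            f"ALTER TABLE {table} ADD IF NOT EXISTS PARTITION ({spec});"
        )
    return statements


if __name__ == "__main__":
    example_paths = [
        "/user/hive/warehouse/sales/year=2013/month=1/day=21",
        "/user/hive/warehouse/sales/year=2013/month=1/day=22",
        "/user/hive/warehouse/sales/year=2013/month=1/day=23",
    ]
    for statement in add_partition_statements("sales", example_paths):
        print(statement)
```

Because only the key=value components are used, paths of data files inside
a partition folder collapse to the same statement as the folder itself. The
generated statements can be written to a file and run with hive -f.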
