Return-Path: X-Original-To: apmail-hive-user-archive@www.apache.org Delivered-To: apmail-hive-user-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id DEFADE3CC for ; Wed, 30 Jan 2013 01:50:08 +0000 (UTC) Received: (qmail 91494 invoked by uid 500); 30 Jan 2013 01:50:07 -0000 Delivered-To: apmail-hive-user-archive@hive.apache.org Received: (qmail 91446 invoked by uid 500); 30 Jan 2013 01:50:07 -0000 Mailing-List: contact user-help@hive.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: user@hive.apache.org Delivered-To: mailing list user@hive.apache.org Received: (qmail 91431 invoked by uid 99); 30 Jan 2013 01:50:07 -0000 Received: from nike.apache.org (HELO nike.apache.org) (192.87.106.230) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 30 Jan 2013 01:50:07 +0000 X-ASF-Spam-Status: No, hits=1.5 required=5.0 tests=HTML_MESSAGE,RCVD_IN_DNSWL_LOW,SPF_PASS X-Spam-Check-By: apache.org Received-SPF: pass (nike.apache.org: domain of saduhegde@gmail.com designates 209.85.217.169 as permitted sender) Received: from [209.85.217.169] (HELO mail-lb0-f169.google.com) (209.85.217.169) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 30 Jan 2013 01:50:00 +0000 Received: by mail-lb0-f169.google.com with SMTP id m4so1543604lbo.28 for ; Tue, 29 Jan 2013 17:49:40 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:x-received:in-reply-to:references:date:message-id :subject:from:to:content-type; bh=azlfuJ92xzhCd2imLzPXfy/EiqpxrrdvZBfWR+jyYeA=; b=VRc2VqD5GxzeBvztBL3JAeY3LrpI+/W3/HS6g5HsivhUjZ2bqduZDi3LYJmWRnZSsf fgWAYptoEHcwsg3tkg6MqCPCp77vj1CBc7KVAhKxWe4G3zS4RoDZtfXqirE6Jd9cGSWi agap+THgnkKkS0JGYsmpYNPmiFbBa1uQFzSxNNIgrsGh+7Q9pPHjlB0XgdfUSgB2/EYX +OwtkemZVLJ/x4mBFyV9+DdKAhJh9PHGVnMxJ1Syv3ffYjmJRTk0E/aNc3hqNvVRUlkH Pl7B/L3/cHtR670+RBWT6zok9+ZJTTFT1+zEqmHy2RGnz9uLHCPz5p6VY0D2FjvgtdF4 mAVA== MIME-Version: 1.0 X-Received: by 10.152.144.130 with SMTP id sm2mr2922488lab.49.1359510580036; Tue, 29 Jan 2013 17:49:40 -0800 (PST) Received: by 10.112.117.138 with HTTP; Tue, 29 Jan 2013 17:49:39 -0800 (PST) In-Reply-To: References: Date: Tue, 29 Jan 2013 19:49:39 -0600 Message-ID: Subject: Re: Automating the partition creation process From: Sadananda Hegde To: user@hive.apache.org Content-Type: multipart/alternative; boundary=e89a8f2348a931033804d477b91f X-Virus-Checked: Checked by ClamAV on apache.org --e89a8f2348a931033804d477b91f Content-Type: text/plain; charset=ISO-8859-1 Thanks, Edward. I can probably create all previous days partitions ahead of time and then use Dean's logic to create new partitions on a daily basis. I will probably end up having few empty partitions; need to make sure it does not cause any confusions. Thanks, Sadu On Tue, Jan 29, 2013 at 7:21 PM, Edward Capriolo wrote: > You can also just create all your partitions ahead of time. They will not > do any harm if empty. (unless you have an older version and hit this... > http://issues.apache.org/jira/browse/HIVE-1007 ) > > > On Tue, Jan 29, 2013 at 8:17 PM, Mark Grover wrote: > >> Hi Sadananda, >> Sorry to hear that. >> >> It got committed, don't worry about the "ABORTED". Here is the commit on >> the trunk: >> >> https://github.com/apache/hive/commit/523f47c3b6e7cb7b6b7b7801c66406e116af6dbc >> >> However, there is no Apache Hive release with that patch in it. >> >> You have two options: >> 1. Download the patch, rebuild hive and use it >> 2. Find a hacky way to recover your partitions when they are empty and >> populate them later. >> >> Sorry for the inconvenience. >> >> Mark >> >> On Tue, Jan 29, 2013 at 5:09 PM, Sadananda Hegde wrote: >> >>> Thanks Mark, >>> >>> Recover partition feature will satisfy my needs; but MSCK Repair >>> Partition < tablename> option is not working for me. It does not give any >>> error; but does not add any partitions either. It looks like it adds >>> partitions only when the sub-folder is empty; but not >>> when the sub-folder has the data files. I see a fix to this issue here. >>> >>> https://issues.apache.org/jira/browse/HIVE-3231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel >>> >>> But probably it's not commited yet, since the final result says >>> 'ABORTED". >>> >>> Thanks, >>> Sadu >>> >>> On Mon, Jan 28, 2013 at 10:47 PM, Mark Grover < >>> grover.markgrover@gmail.com> wrote: >>> >>>> Sadananda, >>>> See if this helps: >>>> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-Recoverpartitions >>>> >>>> >>>> On Mon, Jan 28, 2013 at 8:05 PM, Sadananda Hegde wrote: >>>> >>>>> Hello, >>>>> >>>>> My hive table is partitioned by year, month and day. I have defined it >>>>> as external table. The M/R job correctly loads the files into the daily >>>>> subfolders. The hdfs files will be loaded to >>>>> /year=yyyy/month=mm/day=dd/ folders by the scheduled M/R jobs. >>>>> The M/R job has some business logic in determining the values for year, >>>>> month and day; so one run might create / load files into multiple sub >>>>> -folders (multiple days). I am able to query the tables after adding >>>>> partitions using ALTER TABLE ADD PARTITION statement. But how do I automate >>>>> the partition creation step? Basically this script needs to identify the >>>>> subfolders created by the M/R job and create corresponding ALTER TABLE ADD >>>>> PARTITION statements. >>>>> >>>>> For example, say the M/R job loads files into the following 3 >>>>> sub-folders >>>>> >>>>> /user/hive/warehouse/sales/year=2013/month=1/day=21 >>>>> /user/hive/warehouse/sales/year=2013/month=1/day=22 >>>>> /user/hive/warehouse/sales/year=2013/month=1/day=23 >>>>> >>>>> Then it should create 3 alter table statements >>>>> >>>>> ALTER TABLE sales ADD PARTITION (year=2013, month=1, day=21); >>>>> ALTER TABLE sales ADD PARTITION (year=2013, month=1, day=22); >>>>> ALTER TABLE sales ADD PARTITION (year=2013, month=1, day=23); >>>>> >>>>> I thought of changing M/R jobs to load all files into same folder, >>>>> then first load the files into non-partitioned table and then to load the >>>>> partitioned table from non-partitioned table (using dynamic partition); but >>>>> would prefer to avoid that extra step if possible (esp. since data is >>>>> already in the correct sub-folders). >>>>> >>>>> Any help would greately be appreciated. >>>>> >>>>> Regards, >>>>> Sadu >>>>> >>>>> >>>>> >>>> >>>> >>> >> > --e89a8f2348a931033804d477b91f Content-Type: text/html; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable
Thanks, Edward.=A0 I can probably create all previous days partitions = ahead of time and then use Dean's logic to create new partitions on a d= aily basis. I will probably end=A0up having few empty partitions; need to m= ake sure it does not cause any confusions.
=A0
Thanks,
Sadu

On Tue, Jan 29, 2013 at 7:21 PM, Edward Capriolo &= lt;edlinuxguru@g= mail.com> wrote:
You can also just create all your partitions ahead of time= . They will not do any harm if empty. (unless you have an older version and= hit this... http://issues.apache.org/jira/browse/HIVE-1007 )


On Tue, Jan 29, 2013 at 8:17 PM, Mark Grover= <grover.markgrover@gmail.com> wrote:
Hi Sadananda,
Sorry to hear that.

It got commi= tted, don't worry about the "ABORTED". Here is the commit on = the trunk:

However, there is no Apache Hive release with that patc= h in it.

You have two options:
1. Downlo= ad the patch, rebuild hive and use it
2. Find a hacky way to reco= ver your partitions when they are empty and populate them later.

Sorry for the inconvenience.

M= ark

On Tue, Jan 29, 2013 at 5:09 PM, Sada= nanda Hegde <saduhegde@gmail.com> wrote:
Thanks=A0Mark,
=A0
Recover partit= ion feature will satisfy my needs; but MSCK Repair Partition < tablename= > option is not working for me. It does not give any error; but does not= add any partitions either.=A0 It looks like it adds partitions only when t= he sub-folder is empty; but not when=A0the=A0sub-folder=A0has the data file= s. I see a fix to this issue here.
=A0
But probably it's not commited yet, since the final = result says 'ABORTED".
=A0
Thanks,
= Sadu
=A0
On Mon, Jan 28, 201= 3 at 10:47 PM, Mark Grover <grover.markgrover@gmail.com><= /span> wrote:
Sadananda,
See if this helps:=A0https://cwiki.apache.o= rg/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-Recoverpart= itions


On Mon, Jan 28, 2013 at 8:05 PM, Sadananda H= egde <saduhegde@gmail.com> wrote:
Hello,
=A0
My hive table=A0is partitioned by year,= month and day. I have defined=A0it as external table. The M/R job correctl= y loads the files into the daily subfolders.=A0The hdfs files will be loade= d to <hivetable>/year=3Dyyyy/month=3Dmm/day=3Ddd/ folders by the sche= duled M/R jobs. The M/R job has some business logic in determining the valu= es for year, month and day; so one run might=A0create / load files into=A0m= ultiple sub -folders (multiple days). I am able to query the tables after a= dding partitions using ALTER TABLE ADD PARTITION statement. But how do I au= tomate the partition creation step? Basically this script needs to identify= the subfolders created by the M/R job and create corresponding ALTER TABLE= ADD PARTITION statements.
=A0
For example, say the M/R job loads files into the follow= ing 3 sub-folders
=A0
/user/hive/warehouse/sales/year= =3D2013/month=3D1/day=3D21
/user/hive/warehouse/sales/year=3D2013= /month=3D1/day=3D22
/user/hive/warehouse/sales/year=3D2013/month=3D1/day=3D23
= =A0
Then it should create 3 alter table statements
=A0<= /div>
ALTER TABLE=A0sales ADD PARTITION (year=3D2013, month=3D1, day=3D= 21);
ALTER TABLE sales ADD PARTITION (year=3D2013, month=3D1, day=3D= 22);
ALTER TABLE sales ADD PARTITION (year=3D201= 3, month=3D1, day=3D23);
=A0
I thought of changing M/R = jobs to load=A0all files into same folder, then=A0first load the files into= non-partitioned table and then=A0to load the partitioned table from non-pa= rtitioned table (using dynamic partition); but would prefer to avoid that e= xtra step if possible (esp. since data is already in the correct sub-folder= s).=A0
=A0
Any help would greately be appreciated.
=A0
Regards,
Sadu
=A0
=A0





--e89a8f2348a931033804d477b91f--