Date: Thu, 14 Jan 2010 19:15:14 -0500
From: Mayuran Yogarajah
To: "hive-user@hadoop.apache.org"
Subject: Re: Loading data directories into Hive DB

Hi,

> I'm clearly missing something. How can I create a script that allows me to
> import partitioned datasets like the above into a Hive table?

So you want to load part-00000 and part-00001 as two separate partitions?
If so, you'll need to put those files in separate subdirectories. I placed
them in subdirectories like this, so the table can be partitioned by mapper:

    /user/hadoop/InputData/JoinApp/mapper1/part-00000
    /user/hadoop/InputData/JoinApp/mapper2/part-00001

Then in Hive:

    create external table test (
      words string
    )
    partitioned by (mapper string)
    stored as textfile
    location '/user/hadoop';

Then add the partitions:

    alter table test add partition (mapper='mapper1')
      location '/user/hadoop/InputData/JoinApp/mapper1';
    alter table test add partition (mapper='mapper2')
      location '/user/hadoop/InputData/JoinApp/mapper2';

Sanity check:

    hive> show partitions test;
    OK
    mapper=mapper1
    mapper=mapper2

    hive> select * from test;
    OK
    bar    mapper1
    foo    mapper2

Hope this helps.

M
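
P.S. If you end up with many mapper directories, you could generate the same
ALTER TABLE statements from a shell loop instead of typing them by hand. A
rough sketch, assuming the directory layout above and the standard "hive -e"
command-line option (adjust the base path and table name to your setup):

    #!/bin/bash
    # Sketch: add one Hive partition per mapper subdirectory.
    # Assumes a layout like /user/hadoop/InputData/JoinApp/<mapperN>/part-*.
    BASE=/user/hadoop/InputData/JoinApp

    # List the subdirectories under BASE (last field of each "hadoop fs -ls" line
    # is the path; the "Found N items" header is filtered out by the grep).
    for dir in $(hadoop fs -ls "$BASE" | awk '{print $NF}' | grep "$BASE/"); do
      mapper=$(basename "$dir")
      hive -e "alter table test add partition (mapper='$mapper') location '$dir';"
    done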