Date: Thu, 14 Jan 2010 19:15:14 -0500
From: Mayuran Yogarajah
To: "hive-user@hadoop.apache.org"
Subject: Re: Loading data directories into Hive DB

Hi,

> I'm clearly missing something. How can I create a script that allows me to
> import partitioned datasets like the above into a Hive table?

So you want to load part-00000 and part-00001 as two separate partitions?
If so, you'll need to put those files in separate subdirectories. I placed
them in subdirectories like this, so the table can be partitioned by mapper:

    /user/hadoop/InputData/JoinApp/mapper1/part-00000
    /user/hadoop/InputData/JoinApp/mapper2/part-00001

Then in Hive:

    create external table test (
      words string
    )
    partitioned by (mapper string)
    stored as textfile
    location '/user/hadoop';

Then add the partitions:

    alter table test add partition (mapper='mapper1')
      location '/user/hadoop/InputData/JoinApp/mapper1';
    alter table test add partition (mapper='mapper2')
      location '/user/hadoop/InputData/JoinApp/mapper2';

Sanity check:

    hive> show partitions test;
    OK
    mapper=mapper1
    mapper=mapper2

    hive> select * from test;
    OK
    bar    mapper1
    foo    mapper2

Hope this helps.

M
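
P.S. If you end up with many mapper directories, you could generate the same
ALTER TABLE statements from a shell loop instead of typing them by hand. A
rough sketch, assuming the directory layout above and the standard "hive -e"
command-line option (adjust the base path and table name to your setup):

    #!/bin/bash
    # Sketch: add one Hive partition per mapper subdirectory.
    # Assumes a layout like /user/hadoop/InputData/JoinApp/<mapperN>/part-*.
    BASE=/user/hadoop/InputData/JoinApp

    # List the subdirectories under BASE (last field of each "hadoop fs -ls" line
    # is the path; the "Found N items" header is filtered out by the grep).
    for dir in $(hadoop fs -ls "$BASE" | awk '{print $NF}' | grep "$BASE/"); do
      mapper=$(basename "$dir")
      hive -e "alter table test add partition (mapper='$mapper') location '$dir';"
    done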