hive-user mailing list archives

From Rob Stewart <robstewar...@googlemail.com>
Subject Loading data directories into Hive DB
Date Thu, 14 Jan 2010 23:54:14 GMT
Hi there,

Briefly, I'm doing a comparative study on Pig, Hive, JavaMR and JAQL.

I have used a handy tool developed by the folks on the Pig dev team to
generate my test data.

It lets me specify the number of mappers to run (the generator is itself a
MapReduce job), and it leaves me with a directory of part files, e.g. for 2
mappers:

InputData/JoinApp/part-00000
InputData/JoinApp/part-00001


Now, for Pig, JAQL and Java MR, all I need to do is specify InputData/JoinApp
as the input to the script, and each understands that it is a partitioned
dataset, e.g. for Pig:
myinput = LOAD 'InputData/JoinApp' USING TextLoader();

and for JAQL:
$input = read(lines("InputData/JoinApp"));

However, with Hive, if I try:
CREATE EXTERNAL TABLE Text(words STRING)
LOCATION 'InputData/JoinApp/HiveTable';
hadoop dfs -cp InputData/JoinApp InputData/JoinApp/HiveTable

I get the following error:
cp: Target InputData/JoinApp is a directory

I'm clearly missing something. How can I write a script that imports a
partitioned dataset like the one above into a Hive table?
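For what it's worth, what I was hoping for is something like the following sketch, assuming an external table's LOCATION can simply point at the existing directory of part files so that no copy is needed at all (I'm not certain the LOCATION semantics work this way, hence the question):

-- Hypothetical: point the external table straight at the directory
-- that already holds part-00000, part-00001, ... (no dfs -cp step).
CREATE EXTERNAL TABLE Text (words STRING)
LOCATION 'InputData/JoinApp';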

Thanks,

Rob Stewart
