hive-user mailing list archives

From Ravi Shetye <ravi.she...@vizury.com>
Subject Hive on EMR on S3 : Beginner
Date Fri, 24 Aug 2012 12:26:20 GMT
I have the data in an S3 bucket, laid out as follows:

s3://logs/ad1date1.log.gz
s3://logs/ad1date2.log.gz
s3://logs/ad1date3.log.gz
s3://logs/ad1date4.log.gz
s3://logs/ad2date1.log.gz
s3://logs/ad2date2.log.gz
s3://logs/ad2date3.log.gz
s3://logs/ad2date4.log.gz

I have to load some of them into a single Hive table, which I am
creating with the following statement:

CREATE EXTERNAL TABLE analyze_files_tab (cookie STRING,
d2 STRING,
url STRING,
d4 STRING,
d5 STRING,
d6 STRING,
adv_id_dummy STRING,
timestp STRING,
ip STRING,
userAgent STRING,
stage STRING,
d12 STRING,
d13 STRING)
-- The keyword is PARTITIONED BY, and every partition column needs a type.
-- `date` is backquoted because newer Hive versions reserve the word.
PARTITIONED BY (adv_id STRING, `date` STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION 's3://logs/joined_analyze_files_hive/';

How should I add the data from these files to the table?

Will "ALTER TABLE analyze_files_tab RECOVER PARTITIONS;" do the trick?

Don't I need to specify which file belongs to which (adv_id, date) combination?
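A minimal sketch of the two usual routes, assuming the table is declared with PARTITIONED BY (adv_id STRING, `date` STRING). A partition's LOCATION is a directory (an S3 prefix), not an individual file, so each .log.gz file would first have to be copied into its own prefix outside Hive, for example with S3DistCp or the AWS CLI. The adv_id=.../date=... prefix names below are an assumption that follows Hive's key=value directory convention, which is what RECOVER PARTITIONS (an Amazon EMR extension) and Apache Hive's MSCK REPAIR TABLE look for. With ADD PARTITION you spell out the file-to-partition mapping yourself, and the prefixes can then be named freely.

```sql
-- Assumed layout after copying (one prefix per (adv_id, date) pair):
--   s3://logs/joined_analyze_files_hive/adv_id=ad1/date=date1/ad1date1.log.gz
--   s3://logs/joined_analyze_files_hive/adv_id=ad1/date=date2/ad1date2.log.gz
--   ...
--   s3://logs/joined_analyze_files_hive/adv_id=ad2/date=date4/ad2date4.log.gz

-- Route 1: let Hive discover the key=value prefixes under the table LOCATION.
-- Run ONE of these, depending on the Hive distribution:
ALTER TABLE analyze_files_tab RECOVER PARTITIONS;  -- Amazon EMR Hive only
-- MSCK REPAIR TABLE analyze_files_tab;            -- Apache Hive

-- Route 2: register each partition by hand (works on any Hive).
-- One statement per (adv_id, date) pair; only two are shown here.
ALTER TABLE analyze_files_tab ADD IF NOT EXISTS
  PARTITION (adv_id = 'ad1', `date` = 'date1')
  LOCATION 's3://logs/joined_analyze_files_hive/adv_id=ad1/date=date1/';
ALTER TABLE analyze_files_tab ADD IF NOT EXISTS
  PARTITION (adv_id = 'ad2', `date` = 'date4')
  LOCATION 's3://logs/joined_analyze_files_hive/adv_id=ad2/date=date4/';

-- List the partitions the metastore now knows about.
SHOW PARTITIONS analyze_files_tab;
```

Once the partitions are registered, a query filtering on adv_id = 'ad1' AND `date` = 'date1' should read only that one prefix, because Hive prunes partitions on the partition columns.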

Also, a pointer to a good beginner's tutorial would be helpful.


