hive-user mailing list archives

From Erik Thorson <ethor...@varickmm.com>
Subject Partition by directory
Date Mon, 10 Dec 2012 21:46:18 GMT
Hello All,

I have been using the AWS setup for EMR for some time, and I am now in the process
of implementing Spark/Shark on my own cluster. I am installing from https://github.com/downloads/mesos/spark/spark-0.6.0-sources.tar.gz,
which includes Hive 0.9.0. I am using this with S3 and am unable to recover partitions from
a directory that contains a series of subdirectories (one per partition). I want to have
two partitions, 2012-10-25 and 2012-10-26, each containing its respective files. For example, I
have the following files located at s3://varickTest3/nn/:


drwxrwxrwx   -          0 1970-01-01 00:00 /nn/ds=2012-10-25
-rwxrwxrwx   1   49696432 2012-12-10 20:55 /nn/ds=2012-10-25/part-00000
-rwxrwxrwx   1   49696432 2012-12-10 20:55 /nn/ds=2012-10-25/part-00001
drwxrwxrwx   -          0 1970-01-01 00:00 /nn/ds=2012-10-26
-rwxrwxrwx   1   49696432 2012-12-10 20:55 /nn/ds=2012-10-26/part-00000
-rwxrwxrwx   1   49696432 2012-12-10 20:55 /nn/ds=2012-10-26/part-00001


When I run the following statements in Hive (not Shark):


CREATE EXTERNAL TABLE wiki (id BIGINT, title STRING, last_modified STRING, xml STRING, text STRING)
PARTITIONED BY (ds STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION 's3n://varickTest3/nn';

ALTER TABLE wiki RECOVER PARTITIONS;


The result is an empty table.


I have tried many variations of this, and nothing has worked so far. That includes adding:

MSCK REPAIR TABLE wiki;

I also tried using s3 rather than s3n (credentials for both schemes are set in core-site.xml).


I also tried setting these options:

SET hive.exec.dynamic.partition=true;

SET hive.exec.dynamic.partition.mode=nonstrict;


However, if I use:

LOCATION 's3n://varickTest3/nn/*'


the table has content, but I am still unable to recover partitions.


Is there any way, through settings or the data layout (rather than writing a script),
to partition the table from the directories as I can on AWS?
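
For comparison, here is a minimal sketch of registering the two partitions by hand, which is the per-directory step that partition recovery would otherwise automate. It assumes the standard Hive ALTER TABLE ... ADD PARTITION statement and reuses the table name and bucket paths from this message. It has not been verified against this cluster.

```sql
-- Sketch only: assumes stock Hive ADD PARTITION syntax and the
-- wiki table and s3n://varickTest3/nn layout described above.
ALTER TABLE wiki ADD PARTITION (ds='2012-10-25')
  LOCATION 's3n://varickTest3/nn/ds=2012-10-25';

ALTER TABLE wiki ADD PARTITION (ds='2012-10-26')
  LOCATION 's3n://varickTest3/nn/ds=2012-10-26';

-- List the partitions the metastore now knows about.
SHOW PARTITIONS wiki;
```

If these statements succeed and SHOW PARTITIONS lists both ds values, the data layout itself is probably fine, and the problem is narrowed to partition discovery.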


Thank you for any help anyone can give me.
