hive-user mailing list archives

From Josh Ferguson
Subject Re: External tables and existing directory structure
Date Fri, 28 Nov 2008 23:31:55 GMT
I think this is a pretty common scenario as this is how I was storing  
my stuff as well. Would this affect the HiveQL create table statement  
at all or just implicitly require that it be ordered?


On Nov 28, 2008, at 3:00 PM, Joydeep Sen Sarma wrote:

> Hi Johan,
> Creating an external table with the 'location' clause set to your data
> would be the way to go. However, Hive has its own directory naming
> scheme for partitions ('<partition_key>=<partition_val>'), so just
> pointing to a directory with subdirectories would not work.
> So right now, in this case, one would have to move or copy the data
> using the load command.
> Going forward, one thing we can do for external tables is drop the
> 'key=val' directory naming for partitioned data and just assume that
> the directory hierarchy follows the partition list and the directory
> names are partition values. Is that what's required in this case?
> Joydeep
> -----Original Message-----
> From: Johan Oskarsson []
> Sent: Friday, November 28, 2008 3:49 AM
> To:
> Subject: External tables and existing directory structure
> Hi, just had some fun with hive. Exciting stuff.
> I have one question about mapping tables to our existing directory
> structure. I assume "CREATE EXTERNAL TABLE" would be the way to go,
> but I haven't been able to find much information about how it works.
> We have datasets in the following format in hdfs:
> /dataset/yyyy/MM/dd/<one or more files>
> I'd love to be able to bind these with the date as the partition to
> Hive tables without copying or moving the data. Is it currently possible?
> /Johan
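
To make the two directory conventions discussed in this thread concrete, here is a minimal Python sketch. It is not Hive code and not part of any Hive API. It only illustrates the proposed positional mapping, where directory names are taken as partition values in order, next to the 'key=val' naming Hive uses for partitions. The partition key names (year, month, day), the "/dataset" root default, and both helper functions are assumptions made for illustration; the thread only says the date is the partition.

```python
from pathlib import PurePosixPath

# Hypothetical partition key names; the thread only says the partition is "the date".
PARTITION_KEYS = ("year", "month", "day")


def partition_values(path, root="/dataset", keys=PARTITION_KEYS):
    """Read partition values positionally from a path like /dataset/yyyy/MM/dd."""
    parts = PurePosixPath(path).relative_to(root).parts
    if len(parts) < len(keys):
        raise ValueError(f"{path!r} has fewer than {len(keys)} levels below {root!r}")
    # Directory names are the values; their order follows the partition key list.
    return dict(zip(keys, parts[:len(keys)]))


def hive_style_dir(spec):
    """Render the same partition using key=val directory names."""
    return "/".join(f"{key}={value}" for key, value in spec.items())


spec = partition_values("/dataset/2008/11/28")
print(spec)
print(hive_style_dir(spec))
```

Under these assumptions, the existing directory /dataset/2008/11/28 maps to the partition values 2008, 11 and 28, and the equivalent 'key=val' layout would be year=2008/month=11/day=28.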
