hive-user mailing list archives

From Sergey Shelukhin
Subject Re: Wildcard and Automatic partitioning
Date Thu, 03 Aug 2017 19:38:04 GMT
How would Hive determine the partition keys from an arbitrary directory structure?
There has to be some format, and there already is one. The key columns also need to be
declared on the table, with types.
For reading the data without partitions, Hive already supports mapred.input.dir.recursive,
which reads all the nested directories. In fact, iirc it’s on by default if Tez is used.
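
A minimal sketch of that non-partitioned route, assuming the data sits under the root of the path pattern from the original question. The table name testTableFlat is hypothetical, and hive.mapred.supports.subdirectories is an additional setting that may or may not be needed depending on the Hive version and execution engine:

```sql
-- Ask the input format to descend into nested directories.
SET mapred.input.dir.recursive=true;
-- Assumption: some setups also need this before Hive accepts subdirectories.
SET hive.mapred.supports.subdirectories=true;

-- Hypothetical unpartitioned table rooted above the customer directories.
CREATE EXTERNAL TABLE testTableFlat (val map<string, string>)
LOCATION '/user/mycomp/customers';

-- Every file under every nested directory is read as part of one table.
SELECT COUNT(*) FROM testTableFlat;
```

With this layout there is no period column to filter on, so every query scans the whole tree.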

From: Nirav Patel
Date: Thursday, August 3, 2017 at 11:41
Subject: Re: Wildcard and Automatic partitioning

What is the point if I have to rename HDFS directories to the Hive format (key=value/key=value
etc.)? Why can't it just be a normal directory like everyone has? That entire directory could
be treated as a key ("period" in my example), and Hive would add all its values as partitions.

On Thu, Aug 3, 2017 at 11:25 AM, Sergey Shelukhin wrote:
The typical approach, although not technically intended for this purpose, is to use MSCK to
“repair” the table and create the partitions. The partitions have to be in the standard
Hive format (key=value/key=value etc.) and the table must be created with the corresponding
partition keys.
It may actually be good to have a feature that does this in a standard manner for external
tables only; however, it would probably be restricted to the same format. So the example below
probably won’t work because of the star in the middle.
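
A rough sketch of the MSCK approach, assuming the directories have already been laid out in the key=value form (for example .../period=p-12345/). The base path and the customer name custA are hypothetical:

```sql
-- The table must declare the partition key that appears in the directory names.
CREATE EXTERNAL TABLE testTable (val map<string, string>)
PARTITIONED BY (period string)
LOCATION '/user/mycomp/customers/custA/departments/partition';

-- Scan the table location and register any period=<value> directories
-- that the metastore does not know about yet.
MSCK REPAIR TABLE testTable;

-- Check which partitions were picked up.
SHOW PARTITIONS testTable;
```

Directories that do not follow the key=value naming are not registered this way.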

From: Nirav Patel
Date: Thursday, August 3, 2017 at 11:00
Subject: Wildcard and Automatic partitioning

Hi, is there a way in Hive, when I create an external table, to specify a wildcard in LOCATION
and have Hive automatically identify the partitions?

I have opened HIVE-17236 for wildcard support. At the same time, I also have the issue of
specifying partitions. I can use the tedious ALTER TABLE to add each partition directory. But
since the data already exists in separate partition directories, why can't Hive identify them?
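
For reference, the manual route looks roughly like the following. This is a sketch only: the customer name custA and the second partition value p-12346 are hypothetical, and it assumes testTable was created with PARTITIONED BY (period string):

```sql
-- Each partition is mapped explicitly to an existing directory,
-- so the directory does not need the key=value naming.
ALTER TABLE testTable ADD IF NOT EXISTS
  PARTITION (period='p-12345')
    LOCATION '/user/mycomp/customers/custA/departments/partition/p-12345'
  PARTITION (period='p-12346')
    LOCATION '/user/mycomp/customers/custA/departments/partition/p-12346';
```

One statement per batch of directories is needed, which is what makes this tedious when there are n customers with m partitions each.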

Here's an example of such a directory:

I can have n customers and, for each one, m partitions of the departments object.

I think that if I use the following SQL to create the external table, Hive should be able to
identify all the partitions.

CREATE EXTERNAL TABLE testTable (val map<string, string>)
PARTITIONED BY (period string)
LOCATION '/user/mycomp/customers/*/departments/partition/*';

If my partition directory (let's say p-12345) contains multiple files that don't start
with the "part-" prefix, then I should be able to specify that prefix so Hive can find the right

