hive-dev mailing list archives

From "Steve Hoffman (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-6589) Automatically add partitions for external tables
Date Sat, 08 Mar 2014 04:07:42 GMT

    [ https://issues.apache.org/jira/browse/HIVE-6589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13924704#comment-13924704 ]

Steve Hoffman commented on HIVE-6589:
-------------------------------------

This is a great idea to have an alternate way to specify partitioning.  Having a cron job
create partitions over and over into the database when there is such a clear programmatic
pattern is just silly (which is how I deal with this today -- and it stinks).

I should only need to specify the root directory and some directory pattern mapping (as in
Ken's example above).

This is very important for external tables, where Hive isn't managing the data directories,
and for streaming data, which is ALWAYS creating new partitions.
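
A minimal sketch of the cron-style workaround described above, assuming the table name, root directory, and directory layout from the example in the quoted issue. The `$Y/$M/$D/$H` placeholders are the syntax proposed in the issue, not an existing Hive feature, and the helper names (`pattern_to_regex`, `add_partition_statement`) are hypothetical. The script only builds `ALTER TABLE ... ADD IF NOT EXISTS PARTITION` statements; listing HDFS and submitting the statements to Hive is left out.

```python
import re

# Assumed values, copied from the example in the issue description.
TABLE = "my_data"
ROOT = "/flume/my_data"
LOCATION_PATTERN = "$Y/$M/$D/$H"         # proposed 'hive.partition.spec.location'
PARTITION_SPEC = "dt=$Y-$M-$D, hour=$H"  # proposed 'hive.partition.spec'

# Assumed digit widths for each placeholder (year, month, day, hour).
WIDTHS = {"Y": 4, "M": 2, "D": 2, "H": 2}


def pattern_to_regex(pattern):
    """Turn '$Y/$M/$D/$H' into a regex with one named group per placeholder."""
    # re.split with a capture group alternates literal text and placeholder names.
    parts = re.split(r"\$([YMDH])", pattern)
    out = []
    for i, part in enumerate(parts):
        if i % 2 == 0:
            out.append(re.escape(part))
        else:
            out.append(r"(?P<%s>\d{%d})" % (part, WIDTHS[part]))
    return re.compile("^" + "".join(out) + "$")


def add_partition_statement(path):
    """Build the ALTER TABLE statement for one directory, or None if it does not match."""
    prefix = ROOT + "/"
    if not path.startswith(prefix):
        return None
    match = pattern_to_regex(LOCATION_PATTERN).match(path[len(prefix):])
    if match is None:
        return None
    values = match.groupdict()
    columns = []
    for item in PARTITION_SPEC.split(","):
        name, template = item.strip().split("=", 1)
        # Substitute each placeholder with the value captured from the path.
        value = re.sub(r"\$([YMDH])", lambda m: values[m.group(1)], template)
        columns.append("%s='%s'" % (name, value))
    return "ALTER TABLE %s ADD IF NOT EXISTS PARTITION (%s) LOCATION '%s';" % (
        TABLE, ", ".join(columns), path)


if __name__ == "__main__":
    for directory in ["/flume/my_data/2014/03/02/01", "/flume/my_data/2014/03/02/02"]:
        print(add_partition_statement(directory))
```

Because the statements use `IF NOT EXISTS`, rerunning the job over directories that are already registered should be harmless, which is what makes the repeated cron approach workable, if tedious.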

> Automatically add partitions for external tables
> ------------------------------------------------
>
>                 Key: HIVE-6589
>                 URL: https://issues.apache.org/jira/browse/HIVE-6589
>             Project: Hive
>          Issue Type: New Feature
>    Affects Versions: 0.10.0
>            Reporter: Ken Dallmeyer
>
> I have a data stream being loaded into Hadoop via Flume. It loads into a date partition
> folder in HDFS.  The path looks like this:
> {code}/flume/my_data/YYYY/MM/DD/HH
> /flume/my_data/2014/03/02/01
> /flume/my_data/2014/03/02/02
> /flume/my_data/2014/03/02/03{code}
> On top of it I create an EXTERNAL Hive table for querying.  As of now, I have to manually
> add partitions.  What I want is for Hive to "discover" those partitions for EXTERNAL tables.
> Additionally, I would like to specify a partition pattern so that when I query, Hive will know
> to use the partition pattern to find the HDFS folder.
> So something like this:
> {code}CREATE EXTERNAL TABLE my_data (
>   col1 STRING,
>   col2 INT
> )
> PARTITIONED BY (
>   dt STRING,
>   hour STRING
> )
> LOCATION 
>   '/flume/my_data'
> TBLPROPERTIES (
>   'hive.partition.spec' = 'dt=$Y-$M-$D, hour=$H',
>   'hive.partition.spec.location' = '$Y/$M/$D/$H'
> );
> {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)
