hive-dev mailing list archives

From "Mandus Momberg (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-951) Selectively include EXTERNAL TABLE source files via REGEX
Date Thu, 15 Jan 2015 05:23:37 GMT

    [ https://issues.apache.org/jira/browse/HIVE-951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14278274#comment-14278274 ]

Mandus Momberg commented on HIVE-951:
-------------------------------------

This patch no longer works with the latest version of Hive.
Has Hive since changed in some way that would allow us to do this?

> Selectively include EXTERNAL TABLE source files via REGEX
> ---------------------------------------------------------
>
>                 Key: HIVE-951
>                 URL: https://issues.apache.org/jira/browse/HIVE-951
>             Project: Hive
>          Issue Type: Improvement
>          Components: Query Processor
>            Reporter: Carl Steinbach
>         Attachments: HIVE-951.patch
>
>
> CREATE EXTERNAL TABLE should allow users to cherry-pick files via regular expression.
> CREATE EXTERNAL TABLE was designed to allow users to access data that exists outside of Hive, and
> currently makes the assumption that all of the files located under the supplied path should be included
> in the new table. Users frequently encounter directories containing multiple
> datasets, or directories that contain data in heterogeneous schemas, and it's often
> impractical or impossible to adjust the layout of the directory to meet the requirements of
> CREATE EXTERNAL TABLE. A good example of this problem is creating an external table based
> on the contents of an S3 bucket.
> One way to solve this problem is to extend the syntax of CREATE EXTERNAL TABLE
> as follows:
> CREATE EXTERNAL TABLE
> ...
> LOCATION path [file_regex]
> ...
> For example:
> {code:sql}
> CREATE EXTERNAL TABLE mytable1 ( a string, b string, c string )
> STORED AS TEXTFILE
> LOCATION 's3://my.bucket/' 'folder/2009.*\.bz2$';
> {code}
> Creates mytable1, which includes all files in s3://my.bucket/ with a filename matching 'folder/2009*.bz2'.
> {code:sql}
> CREATE EXTERNAL TABLE mytable2 ( d string, e int, f int, g int )
> STORED AS TEXTFILE 
> LOCATION 'hdfs://data/' 'xyz.*2009.{4}\.bz2$';
> {code}
> Creates mytable2 including all files matching 'xyz*2009????.bz2' located under hdfs://data/
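
To make the proposed matching concrete, here is a minimal Python sketch of how a file_regex could be applied to a directory listing. It is not taken from HIVE-951.patch: the helper name select_files, the sample listing, and the choice to search the regex against each path relative to LOCATION (rather than requiring a full match) are all assumptions made for illustration.

```python
import re


def select_files(location, paths, file_regex):
    """Return the paths under `location` whose relative path matches `file_regex`.

    Hypothetical helper: the semantics (unanchored search against the path
    relative to the table location) are an assumption, not Hive's behavior.
    """
    pattern = re.compile(file_regex)
    prefix = location if location.endswith("/") else location + "/"
    selected = []
    for path in paths:
        if not path.startswith(prefix):
            continue  # skip anything that is not under the table location
        relative = path[len(prefix):]
        if pattern.search(relative):
            selected.append(path)
    return selected


# Made-up listing standing in for the contents of the bucket.
listing = [
    "s3://my.bucket/folder/2009-01.bz2",
    "s3://my.bucket/folder/2010-01.bz2",
    "s3://my.bucket/folder/2009-notes.txt",
    "s3://my.bucket/other/2009-01.bz2",
]

print(select_files("s3://my.bucket/", listing, r"folder/2009.*\.bz2$"))
```

With this listing, only s3://my.bucket/folder/2009-01.bz2 is selected; the other three entries fail on the year, the extension, or the folder name respectively.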



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
