hadoop-hive-dev mailing list archives

From "Carl Steinbach (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HIVE-951) Selectively include EXTERNAL TABLE source files via REGEX
Date Tue, 24 Nov 2009 07:14:39 GMT

    [ https://issues.apache.org/jira/browse/HIVE-951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12781803#action_12781803 ]

Carl Steinbach commented on HIVE-951:
-------------------------------------

If the files you are trying to access are in an S3 bucket, copying them to a new location
can be extremely inconvenient. I think most people in this position would gladly accept
a little extra complexity if it allowed them to access their data without first spending an
hour or two staging it. I'm also not sure why you think this is complex. Are you concerned
about details of the implementation or additional demands that this places on the user?




> Selectively include EXTERNAL TABLE source files via REGEX
> ---------------------------------------------------------
>
>                 Key: HIVE-951
>                 URL: https://issues.apache.org/jira/browse/HIVE-951
>             Project: Hadoop Hive
>          Issue Type: Improvement
>          Components: Query Processor
>            Reporter: Carl Steinbach
>
> CREATE EXTERNAL TABLE should allow users to cherry-pick files via regular expression.
> CREATE EXTERNAL TABLE was designed to allow users to access data that exists outside of Hive, and
> currently makes the assumption that all of the files located under the supplied path should be included
> in the new table. Users frequently encounter directories containing multiple
> datasets, or directories that contain data in heterogeneous schemas, and it's often
> impractical or impossible to adjust the layout of the directory to meet the requirements of
> CREATE EXTERNAL TABLE. A good example of this problem is creating an external table based
> on the contents of an S3 bucket.
> One way to solve this problem is to extend the syntax of CREATE EXTERNAL TABLE
> as follows:
> CREATE EXTERNAL TABLE
> ...
> LOCATION path [file_regex]
> ...
> For example:
> {code:sql}
> CREATE EXTERNAL TABLE mytable1 ( a string, b string, c string )
> STORED AS TEXTFILE
> LOCATION 's3://my.bucket/' 'folder/2009.*\.bz2$';
> {code}
> Creates mytable1, which includes all files in s3://my.bucket/ with a filename matching 'folder/2009*.bz2'
> {code:sql}
> CREATE EXTERNAL TABLE mytable2 ( d string, e int, f int, g int )
> STORED AS TEXTFILE 
> LOCATION 'hdfs://data/' 'xyz.*2009.{4}\.bz2$';
> {code}
> Creates mytable2 including all files matching 'xyz*2009????.bz2' located under hdfs://data/
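
The proposal does not say how file_regex would be matched against file names. As a
minimal sketch, assuming the pattern is applied with unanchored search semantics to each
file's path relative to LOCATION, the selection in the first example could be modeled in
Python as follows. The select_files helper and the file listing are hypothetical and are
not part of Hive.

```python
import re


def select_files(paths, location, file_regex):
    """Return the paths under `location` whose relative path matches `file_regex`."""
    pattern = re.compile(file_regex)
    selected = []
    for path in paths:
        if not path.startswith(location):
            continue  # not under the table's LOCATION at all
        relative = path[len(location):]
        # Assumption: unanchored search, so a pattern must use ^ or $ to anchor itself.
        if pattern.search(relative):
            selected.append(path)
    return selected


# Hypothetical bucket listing, used only for illustration.
listing = [
    "s3://my.bucket/folder/2009-01.bz2",
    "s3://my.bucket/folder/2009-02.bz2",
    "s3://my.bucket/folder/2008-12.bz2",
    "s3://my.bucket/other/2009-01.txt",
]

mytable1_files = select_files(listing, "s3://my.bucket/", r"folder/2009.*\.bz2$")
```

Under these assumptions only the two folder/2009 files are selected. The 2008 file and the
file under other/ stay out of the table without being moved or copied.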

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

