Date: Thu, 29 Aug 2013 08:36:52 +0000 (UTC)
From: "Carl Steinbach (JIRA)"
To: hive-dev@hadoop.apache.org
Reply-To: dev@hive.apache.org
Subject: [jira] [Updated] (HIVE-951) Selectively include EXTERNAL TABLE source files via REGEX

     [ https://issues.apache.org/jira/browse/HIVE-951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Carl Steinbach updated HIVE-951:
--------------------------------

    Assignee:     (was: Carl Steinbach)

> Selectively include EXTERNAL TABLE source files via REGEX
> ---------------------------------------------------------
>
>                 Key: HIVE-951
>                 URL: https://issues.apache.org/jira/browse/HIVE-951
>             Project: Hive
>          Issue Type: Improvement
>          Components: Query Processor
>            Reporter: Carl Steinbach
>         Attachments: HIVE-951.patch
>
>
> CREATE EXTERNAL TABLE should allow users to
cherry-pick files via regular expression.
> CREATE EXTERNAL TABLE was designed to allow users to access data that exists outside of Hive, and
> currently assumes that all of the files located under the supplied path should be included
> in the new table. Users frequently encounter directories containing multiple
> datasets, or directories that contain data in heterogeneous schemas, and it's often
> impractical or impossible to adjust the layout of the directory to meet the requirements of
> CREATE EXTERNAL TABLE. A good example of this problem is creating an external table based
> on the contents of an S3 bucket.
> One way to solve this problem is to extend the syntax of CREATE EXTERNAL TABLE
> as follows:
> CREATE EXTERNAL TABLE
> ...
> LOCATION path [file_regex]
> ...
> For example:
> {code:sql}
> CREATE EXTERNAL TABLE mytable1 ( a string, b string, c string )
> STORED AS TEXTFILE
> LOCATION 's3://my.bucket/' 'folder/2009.*\.bz2$';
> {code}
> Creates mytable1, which includes all files in s3://my.bucket/ with a filename matching 'folder/2009*.bz2'.
> {code:sql}
> CREATE EXTERNAL TABLE mytable2 ( d string, e int, f int, g int )
> STORED AS TEXTFILE
> LOCATION 'hdfs://data/' 'xyz.*2009.{4}\.bz2$';
> {code}
> Creates mytable2, which includes all files matching 'xyz*2009????.bz2' located under hdfs://data/.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
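
The file selection proposed in the issue can be sketched outside of Hive as follows. This is only an illustration of the intended semantics, not the code in HIVE-951.patch: the function name select_files and the sample listing are hypothetical, and the sketch assumes the regex is applied to each file's path relative to LOCATION as an unanchored search, which the issue does not spell out.

```python
import re


def select_files(paths, location, file_regex):
    """Return the paths under `location` whose location-relative name matches `file_regex`.

    Assumption: the regex is searched (not fully matched) against the part
    of the path that follows `location`.
    """
    pattern = re.compile(file_regex)
    selected = []
    for path in paths:
        if not path.startswith(location):
            continue  # file is not under the table location at all
        relative = path[len(location):]
        if pattern.search(relative):
            selected.append(path)
    return selected


# Hypothetical bucket listing, used only to exercise the first example.
listing = [
    "s3://my.bucket/folder/2009-01.bz2",
    "s3://my.bucket/folder/2009-02.bz2",
    "s3://my.bucket/folder/2010-01.bz2",
    "s3://my.bucket/other/2009-01.bz2",
    "s3://my.bucket/folder/2009-01.txt",
]

mytable1_files = select_files(listing, "s3://my.bucket/", r"folder/2009.*\.bz2$")
print(mytable1_files)
```

Under these assumptions only the two folder/2009 .bz2 files are selected; the 2010 file, the file under other/, and the .txt file are left out of the table.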