drill-issues mailing list archives

From "Jacques Nadeau (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (DRILL-2260) Add support for partitioning files by certain criteria when doing a CTAS
Date Mon, 23 Feb 2015 01:13:17 GMT

     [ https://issues.apache.org/jira/browse/DRILL-2260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jacques Nadeau updated DRILL-2260:
----------------------------------
      Component/s: Execution - Flow
    Fix Version/s: 0.9.0

> Add support for partitioning files by certain criteria when doing a CTAS
> ------------------------------------------------------------------------
>
>                 Key: DRILL-2260
>                 URL: https://issues.apache.org/jira/browse/DRILL-2260
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Execution - Flow
>            Reporter: Aman Sinha
>            Assignee: Jacques Nadeau
>             Fix For: 0.9.0
>
>
> CTAS statements that create a large number of files (thousands) are becoming increasingly common. To do partition pruning, we need to organize the files into subdirectories so that Drill can expose the directory names as 'dir0', 'dir1', etc. and prune on them. Currently, organizing these files into subdirectories is a manual process and can be tedious.
> We need to provide a mechanism to organize these output files into subdirectories without manual intervention. We could add a PARTITIONED BY <column> extension to the CTAS statement, similar to what Hive does.
> One question: if we partition by the Month column, do we remove that column from the output files (since the column is already represented by the subdirectories)?
> Since this is a 'feature' that would span multiple components, I haven't categorized it.
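
The directory layout the issue asks for can be illustrated with a small sketch. This is not Drill code and does not reflect how Drill implements CTAS. It is a minimal Python illustration, assuming rows are plain dictionaries and the output is CSV. The names write_partitioned, prune, keep_partition_col, and part-0.csv are hypothetical. The keep_partition_col flag mirrors the open question of whether the partition column should remain in the output files.

```python
import csv
import os
import tempfile
from collections import defaultdict


def write_partitioned(rows, partition_col, out_dir, keep_partition_col=False):
    """Group rows by partition_col and write one CSV file per subdirectory.

    Hypothetical helper: returns a dict mapping each partition value
    to the path of the file written for it.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[str(row[partition_col])].append(row)

    written = {}
    for value, group in groups.items():
        # One subdirectory per distinct value of the partition column.
        part_dir = os.path.join(out_dir, value)
        os.makedirs(part_dir, exist_ok=True)
        # Optionally drop the partition column, since the directory name
        # already carries its value.
        fields = [c for c in group[0] if keep_partition_col or c != partition_col]
        path = os.path.join(part_dir, "part-0.csv")
        with open(path, "w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=fields, extrasaction="ignore")
            writer.writeheader()
            writer.writerows(group)
        written[value] = path
    return written


def prune(out_dir, wanted_value):
    """Return only the files under the subdirectory named wanted_value."""
    part_dir = os.path.join(out_dir, wanted_value)
    if not os.path.isdir(part_dir):
        return []
    return sorted(os.path.join(part_dir, name) for name in os.listdir(part_dir))


if __name__ == "__main__":
    rows = [
        {"month": "Jan", "amount": 10},
        {"month": "Feb", "amount": 20},
        {"month": "Jan", "amount": 30},
    ]
    out = tempfile.mkdtemp()
    write_partitioned(rows, "month", out)
    # A filter on the partition value only needs to read one subdirectory.
    print(prune(out, "Jan"))
```

A filter on the partition value then touches a single subdirectory instead of every file, which is the effect pruning on 'dir0' is meant to achieve.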



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
