drill-dev mailing list archives

From "Steven Phillips (JIRA)" <j...@apache.org>
Subject [jira] [Created] (DRILL-3381) Add option to distribute partition keys in CTAS
Date Fri, 26 Jun 2015 00:37:04 GMT
Steven Phillips created DRILL-3381:

             Summary: Add option to distribute partition keys in CTAS
                 Key: DRILL-3381
                 URL: https://issues.apache.org/jira/browse/DRILL-3381
             Project: Apache Drill
          Issue Type: Bug
            Reporter: Steven Phillips

The current CTAS implementation does not redistribute rows by partition key before writing,
which tends to produce a lot of extra files. Specifically, the number of files can be larger
by a factor equal to the number of fragments in the final stage of the query. On even a
moderately sized cluster, this number could easily be in the thousands, so a table with 100
different partitions would end up with hundreds of thousands of files.
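
As a rough illustration of the arithmetic above, here is a minimal sketch. The function name and the figure of 2,000 final-stage fragments are assumptions chosen for the example, not values from the report, and the result is an upper bound that assumes every fragment sees rows for every partition.

```python
def worst_case_file_count(num_partitions: int, num_writer_fragments: int) -> int:
    """Upper bound on output files when each writer fragment receives rows
    for every partition and writes one file per partition it sees."""
    return num_partitions * num_writer_fragments


# Assumed figures: 100 partitions, 2,000 fragments in the final stage.
print(worst_case_file_count(100, 2000))  # 200000
```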

To provide a workaround for this situation, we should add an option to insert an extra distribution
on the partition keys, so that all the rows for any given partition are written by the same writer.
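
The effect of the proposed extra distribution can be sketched with a small simulation. This is not Drill code: `count_output_files`, the `(source_fragment, partition_key)` row layout, and the use of a simple hash modulo the writer count are all hypothetical choices made for illustration. Each distinct (writer, partition) pair is counted as one output file.

```python
def count_output_files(rows, num_writers, redistribute):
    """Count distinct (writer, partition_key) pairs, one output file each.

    rows: iterable of (source_fragment, partition_key) tuples (hypothetical layout).
    """
    files = set()
    for source_fragment, partition_key in rows:
        if redistribute:
            # Extra distribution: route every row with the same key to one writer.
            writer = hash(partition_key) % num_writers
        else:
            # No redistribution: the row is written by whichever fragment holds it.
            writer = source_fragment
        files.add((writer, partition_key))
    return len(files)


num_writers, num_partitions = 8, 100
# Worst case: every fragment holds at least one row for every partition.
rows = [(f, k) for f in range(num_writers) for k in range(num_partitions)]

print(count_output_files(rows, num_writers, redistribute=False))  # 800
print(count_output_files(rows, num_writers, redistribute=True))   # 100
```

With redistribution the file count drops to one per partition, at the cost of shuffling rows between fragments before the write.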

This message was sent by Atlassian JIRA
