airflow-commits mailing list archives

From "ASF subversion and git services (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (AIRFLOW-2254) Fix header output on RedshiftToS3Transfer
Date Mon, 16 Apr 2018 08:22:00 GMT

    [ https://issues.apache.org/jira/browse/AIRFLOW-2254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16439111#comment-16439111 ]

ASF subversion and git services commented on AIRFLOW-2254:
----------------------------------------------------------

Commit a148043107f147ce7d3617308f119be27810ec5a in incubator-airflow's branch refs/heads/master
from [~sathyaprakashg]
[ https://git-wip-us.apache.org/repos/asf?p=incubator-airflow.git;h=a148043 ]

[AIRFLOW-2254] Put header as first row in unload

Currently, data is ordered by the first column in
descending order, so the header row comes first
only if the first column is an integer.
This fix puts the header as the first row
regardless of the first column's data type.

Closes #3180 from sathyaprakashg/AIRFLOW-2254
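The commit message does not show the new query. One way to put the header first regardless of column type is to add an explicit sort key (1 for the header row, 2 for data rows) and order by that key instead of by the first data column. The sketch below assumes this approach; it is not necessarily the committed code. The function build_unload_select and its parameters are hypothetical, and casting every column to text follows the column_castings idea in the operator snippet quoted in the issue. Single quotes are doubled because the result is meant to be embedded in UNLOAD ('...').

```python
from typing import List


def build_unload_select(schema: str, table: str, columns: List[str]) -> str:
    """Hypothetical helper: build a SELECT whose first output row is the header.

    An explicit sort_order column (1 = header, 2 = data) decides the row order,
    so the result no longer depends on the data type of the first column.
    """
    # Header row: each column name as a string literal aliased to that name,
    # so the UNION ALL result (named after its first branch) keeps the names.
    # Quotes are doubled for embedding inside UNLOAD ('...').
    header = ", ".join("''{0}'' AS {0}".format(c) for c in columns)
    # Data rows: cast every column to text so both UNION ALL branches match.
    castings = ", ".join("CAST({0} AS text) AS {0}".format(c) for c in columns)
    names = ", ".join(columns)
    # The outer SELECT drops sort_order from the output but still orders by it.
    return (
        "SELECT {names} FROM ("
        "SELECT 1 AS sort_order, {header} "
        "UNION ALL "
        "SELECT 2 AS sort_order, {castings} FROM {schema}.{table}"
        ") AS t ORDER BY sort_order"
    ).format(names=names, header=header, castings=castings,
             schema=schema, table=table)


if __name__ == "__main__":
    print(build_unload_select("public", "users", ["id", "name"]))
```

Ordering by a dedicated key avoids relying on how header names compare with data values, which is exactly the dependency the original ORDER BY 1 DESC had.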


> Fix header output on RedshiftToS3Transfer
> -----------------------------------------
>
>                 Key: AIRFLOW-2254
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-2254
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: aws, redshift
>            Reporter: Kengo Seki
>            Assignee: Sathyaprakash Govindasamy
>            Priority: Major
>             Fix For: 2.0.0
>
>
> The current implementation of RedshiftToS3Transfer is as follows and appears to be based on [this post|https://medium.com/carwow-product-engineering/unloading-a-file-from-redshift-to-s3-with-headers-fb707f5480f7].
> {code}
>         unload_query = """
>                         UNLOAD ('SELECT {0}
>                         UNION ALL
>                         SELECT {1} FROM {2}.{3}
>                         ORDER BY 1 DESC')
>                         TO 's3://{4}/{5}/{3}_'
>                         with
>                         credentials 'aws_access_key_id={6};aws_secret_access_key={7}'
>                         {8};
>                         """.format(column_names, column_castings, self.schema, self.table,
>                                    self.s3_bucket, self.s3_key, credentials.access_key,
>                                    credentials.secret_key, unload_options)
> {code}
> {{ORDER BY 1 DESC}} is intended to output the header first, but as [this post|https://stackoverflow.com/questions/24681214/unloading-from-redshift-to-s3-with-headers#answer-26443374] says, it works only if the first column's type is not a character type (e.g. numeric).
> In addition, this query should be used with the PARALLEL OFF option. Without it, UNLOAD writes many files, but only the first one has the header line.
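The ORDER BY 1 DESC problem described in the issue can be reproduced in plain Python by sorting the header and data rows together by their first value, compared as strings, in descending order. This assumes byte-wise string comparison of the text-cast values; Redshift's actual collation may differ, but for these ASCII examples the effect is the same. The helper names are hypothetical.

```python
from typing import List, Tuple

Row = Tuple[str, ...]


def order_by_first_desc(header: Row, rows: List[Row]) -> List[Row]:
    """Mimic UNION ALL + ORDER BY 1 DESC: sort the header and data rows
    together by their first value, compared as strings, descending."""
    return sorted([header] + rows, key=lambda r: r[0], reverse=True)


def header_is_first(header: Row, rows: List[Row]) -> bool:
    """True if the header row ends up as the first output row."""
    return order_by_first_desc(header, rows)[0] == header


if __name__ == "__main__":
    # Numeric first column: digits sort below letters, so "id" comes first.
    print(header_is_first(("id", "name"), [("1", "alice"), ("42", "bob")]))
    # Character first column: "zoe" sorts above "name", so the header moves down.
    print(header_is_first(("name", "id"), [("alice", "1"), ("zoe", "2")]))
```

The first call finds the header on top; the second does not, because a data value ("zoe") compares greater than the column name ("name").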
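With the default parallel mode, each slice writes its own output file, so a single header row can only land in one of them; PARALLEL OFF makes UNLOAD write serially, respecting the ORDER BY. A minimal sketch, assuming the unload options are passed as a list of strings that is later rendered into the {8} placeholder of the quoted query: the hypothetical helper with_parallel_off drops any existing PARALLEL setting and appends PARALLEL OFF.

```python
from typing import Iterable, List


def with_parallel_off(unload_options: Iterable[str]) -> List[str]:
    """Hypothetical helper: return the options with PARALLEL OFF guaranteed.

    Any existing PARALLEL setting (ON/OFF/TRUE/FALSE) is removed first so the
    rendered UNLOAD statement never has two conflicting PARALLEL clauses.
    """
    kept = [opt for opt in unload_options
            if not opt.strip().upper().startswith("PARALLEL")]
    return kept + ["PARALLEL OFF"]


def render_options(unload_options: Iterable[str]) -> str:
    """Join the options, one per line, for the end of the UNLOAD statement."""
    return "\n".join(with_parallel_off(unload_options))


if __name__ == "__main__":
    print(render_options(["DELIMITER ','", "parallel on", "GZIP"]))
```

Forcing the option in code, rather than relying on callers to pass it, keeps the header guarantee from silently breaking when someone omits PARALLEL OFF.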



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
