airflow-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Mark S Weiss (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (AIRFLOW-1862) redshift_to_s3_operator fails on BOOLEAN column in source table
Date Tue, 28 Nov 2017 23:37:00 GMT

     [ https://issues.apache.org/jira/browse/AIRFLOW-1862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Mark S Weiss updated AIRFLOW-1862:
----------------------------------
    Description: 
The {{airflow/operators/redshift_to_s3_operator}} module generates an `UNLOAD` query using
an SQL fragment for the {{column_castings}} that is generated by this line: https://github.com/apache/incubator-airflow/blob/master/airflow/operators/redshift_to_s3_operator.py#L87

This is a bug, because a {{CAST()}} in Redshift on a column of type {{BOOLEAN}} will raise
an error and abort execution of the SQL statement. The error raised is {{ERROR:  cannot cast
type boolean to character varying}}.

This can be trivially verified in Redshift using the following code:

<noformat>
my_db=# CREATE TABLE temp (BOOLEAN flag);
my_db=# INSERT INTO temp (flag) VALUES(false);
my_db=#SELECT CAST (flag AS text) FROM temp;

ERROR:  cannot cast type boolean to character varying
</noformat>

The solution is to handle the case of {{BOOLEAN}} columns, by:
Modifying the {{columns_query}} here https://github.com/apache/incubator-airflow/blob/master/airflow/operators/redshift_to_s3_operator.py#L75
to also select column {{data_type}}
Modify the expression here to use the {{data_type}} to generate a valid {{CAST}} here: https://github.com/apache/incubator-airflow/blob/master/airflow/operators/redshift_to_s3_operator.py#L87

I have implemented and locally tested this alternative, which I believe is both minimal in
scope and robust in all cases, and will submit a pull request.

<noformat>
        column_castings = (', ').join(["CAST({0} AS text) AS {1}".format(columns[i], columns[i])
                                       if types[i] != 'boolean' else
                                       "CAST(CAST({0} AS SMALLINT)AS text) AS {1}".format(columns[i],
columns[i])
                                       for i in xrange(len(columns))])
</noformat>



  was:
The {{airflow/operators/redshift_to_s3_operator}} module generates an `UNLOAD` query using
an SQL fragment for the `column_castings` that is generated by this line: https://github.com/apache/incubator-airflow/blob/master/airflow/operators/redshift_to_s3_operator.py#L87

This is a bug, because a {{CAST()}} in Redshift on a column of type {{BOOLEAN}} will raise
an error and abort execution of the SQL statement. The error raised is {{ERROR:  cannot cast
type boolean to character varying}}.

This can be trivially verified in Redshift using the following code:

<noformat>
my_db=# CREATE TABLE temp (BOOLEAN flag);
my_db=# INSERT INTO temp (flag) VALUES(false);
my_db=#SELECT CAST (flag AS text) FROM temp;

ERROR:  cannot cast type boolean to character varying
</noformat>

The solution is to handle the case of {{BOOLEAN}} columns, by:
Modifying the {{columns_query}} here https://github.com/apache/incubator-airflow/blob/master/airflow/operators/redshift_to_s3_operator.py#L75
to also select column {{data_type}}
Modify the expression here to use the {{data_type}} to generate a valid {{CAST}} here: https://github.com/apache/incubator-airflow/blob/master/airflow/operators/redshift_to_s3_operator.py#L87

I have implemented and locally tested this alternative, which I believe is both minimal in
scope and robust in all cases, and will submit a pull request.

<noformat>
        column_castings = (', ').join(["CAST({0} AS text) AS {1}".format(columns[i], columns[i])
                                       if types[i] != 'boolean' else
                                       "CAST(CAST({0} AS SMALLINT)AS text) AS {1}".format(columns[i],
columns[i])
                                       for i in xrange(len(columns))])
</noformat>




> redshift_to_s3_operator fails on BOOLEAN column in source table
> ---------------------------------------------------------------
>
>                 Key: AIRFLOW-1862
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-1862
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: redshift
>    Affects Versions: Airflow 1.8
>            Reporter: Mark S Weiss
>            Assignee: Mark S Weiss
>              Labels: easyfix
>             Fix For: Airflow 1.8
>
>
> The {{airflow/operators/redshift_to_s3_operator}} module generates an `UNLOAD` query
using an SQL fragment for the {{column_castings}} that is generated by this line: https://github.com/apache/incubator-airflow/blob/master/airflow/operators/redshift_to_s3_operator.py#L87
> This is a bug, because a {{CAST()}} in Redshift on a column of type {{BOOLEAN}} will
raise an error and abort execution of the SQL statement. The error raised is {{ERROR:  cannot
cast type boolean to character varying}}.
> This can be trivially verified in Redshift using the following code:
> <noformat>
> my_db=# CREATE TABLE temp (BOOLEAN flag);
> my_db=# INSERT INTO temp (flag) VALUES(false);
> my_db=#SELECT CAST (flag AS text) FROM temp;
> ERROR:  cannot cast type boolean to character varying
> </noformat>
> The solution is to handle the case of {{BOOLEAN}} columns, by:
> Modifying the {{columns_query}} here https://github.com/apache/incubator-airflow/blob/master/airflow/operators/redshift_to_s3_operator.py#L75
to also select column {{data_type}}
> Modify the expression here to use the {{data_type}} to generate a valid {{CAST}} here:
https://github.com/apache/incubator-airflow/blob/master/airflow/operators/redshift_to_s3_operator.py#L87
> I have implemented and locally tested this alternative, which I believe is both minimal
in scope and robust in all cases, and will submit a pull request.
> <noformat>
>         column_castings = (', ').join(["CAST({0} AS text) AS {1}".format(columns[i],
columns[i])
>                                        if types[i] != 'boolean' else
>                                        "CAST(CAST({0} AS SMALLINT)AS text) AS {1}".format(columns[i],
columns[i])
>                                        for i in xrange(len(columns))])
> </noformat>



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

Mime
View raw message