airflow-commits mailing list archives

From "John Bodley (JIRA)" <j...@apache.org>
Subject [jira] [Work started] (AIRFLOW-179) DbApiHook string serialization fails when string contains non-ASCII characters
Date Thu, 26 May 2016 05:31:12 GMT

     [ https://issues.apache.org/jira/browse/AIRFLOW-179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Work on AIRFLOW-179 started by John Bodley.
-------------------------------------------
> DbApiHook string serialization fails when string contains non-ASCII characters
> ------------------------------------------------------------------------------
>
>                 Key: AIRFLOW-179
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-179
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: hooks
>            Reporter: John Bodley
>            Assignee: John Bodley
>
> The DbApiHook.insert_rows(...) method tries to serialize all values to strings using the ASCII codec. This is problematic if a cell contains non-ASCII characters, e.g.:
> >>> from airflow.hooks import DbApiHook
> >>> DbApiHook._serialize_cell('Nguyễn Tấn Dũng')
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/usr/local/lib/python2.7/dist-packages/airflow/hooks/dbapi_hook.py", line 196, in _serialize_cell
>     return "'" + str(cell).replace("'", "''") + "'"
>   File "/usr/local/lib/python2.7/dist-packages/future/types/newstr.py", line 102, in __new__
>     return super(newstr, cls).__new__(cls, value)
> UnicodeDecodeError: 'ascii' codec can't decode byte 0xe1 in position 4: ordinal not in range(128)
> Rather than manually serializing values to ASCII strings, the hook should serialize each value using the character set of the target database, relying on the connection to convert a Python object into a SQL string literal.
> Note that an exception should still be raised if the target encoding is incompatible with the source encoding.
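
One way to read the proposal above is to hand the values to the database driver through DB-API parameter binding, so that the driver and connection handle quoting and encoding instead of a hand-built string literal. The following is a minimal sketch under stated assumptions, not the fix adopted for AIRFLOW-179: the helper name insert_rows, the people table, and the use of the standard-library sqlite3 driver (with its "?" placeholder style) are all illustrative choices that do not come from the issue.

```python
import sqlite3


def insert_rows(conn, table, rows):
    """Hypothetical helper: insert rows via DB-API parameter binding.

    Only the values are bound by the driver. The table name is
    interpolated directly, so it must come from trusted code.
    """
    rows = list(rows)
    if not rows:
        return 0
    # sqlite3 uses the "qmark" paramstyle; other drivers may expect "%s".
    placeholders = ", ".join("?" for _ in rows[0])
    sql = "INSERT INTO {0} VALUES ({1})".format(table, placeholders)
    cur = conn.cursor()
    # The driver quotes and encodes each value; no manual str() call.
    cur.executemany(sql, rows)
    conn.commit()
    return len(rows)


# Assumed demo schema, purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT)")
inserted = insert_rows(conn, "people", [("Nguyễn Tấn Dũng",), ("O'Brien",)])
names = [r[0] for r in conn.execute("SELECT name FROM people ORDER BY rowid")]
```

Because the values never pass through str() or manual quote doubling, both the non-ASCII name and the embedded apostrophe are stored unchanged. Whether a driver raises on a value it cannot encode in the target character set depends on the driver and the connection settings, so that behaviour would need to be checked per database.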



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)