airflow-commits mailing list archives

From "ASF subversion and git services (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (AIRFLOW-179) DbApiHook string serialization fails when string contains non-ASCII characters
Date Fri, 27 May 2016 19:10:12 GMT

    [ https://issues.apache.org/jira/browse/AIRFLOW-179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15304585#comment-15304585 ]

ASF subversion and git services commented on AIRFLOW-179:
---------------------------------------------------------

Commit 8f63640584ca2dcd15bcd361d1f9a0d995bad315 in incubator-airflow's branch refs/heads/master
from [~artwr]
[ https://git-wip-us.apache.org/repos/asf?p=incubator-airflow.git;h=8f63640 ]

Revert "[AIRFLOW-179] DbApiHook string serialization fails when string contains non-ASCII characters"

This reverts commit 87b4b8fa19cb660317198d74f6d51fdde0a7e067.

Reverting as the method used in the dbapi hook is actually package
specific to MySQLdb and would break the sqlite and mssql hooks.


> DbApiHook string serialization fails when string contains non-ASCII characters
> ------------------------------------------------------------------------------
>
>                 Key: AIRFLOW-179
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-179
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: hooks
>            Reporter: John Bodley
>            Assignee: John Bodley
>             Fix For: Airflow 1.8
>
>
> The DbApiHook.insert_rows(...) method tries to serialize all values to strings using
> the ASCII codec. This is problematic if the cell contains non-ASCII characters, e.g.
>     >>> from airflow.hooks import DbApiHook
>     >>> DbApiHook._serialize_cell('Nguyễn Tấn Dũng')
>     Traceback (most recent call last):
>       File "<stdin>", line 1, in <module>
>       File "/usr/local/lib/python2.7/dist-packages/airflow/hooks/dbapi_hook.py", line 196, in _serialize_cell
>         return "'" + str(cell).replace("'", "''") + "'"
>       File "/usr/local/lib/python2.7/dist-packages/future/types/newstr.py", line 102, in __new__
>         return super(newstr, cls).__new__(cls, value)
>     UnicodeDecodeError: 'ascii' codec can't decode byte 0xe1 in position 4: ordinal not in range(128)
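
The failure can be reproduced outside Airflow. The sketch below is Python 3 and only imitates what the implicit string conversion in the traceback does, namely decoding UTF-8 bytes with the ASCII codec; it does not call DbApiHook, and the helper name `decode_as_ascii` is hypothetical. It assumes the reporter's terminal supplied the name as UTF-8 bytes.

```python
# The name from the report, as UTF-8 bytes (assumed to be what the
# Python 2 str literal held in the reporter's session).
raw = 'Nguyễn Tấn Dũng'.encode('utf-8')


def decode_as_ascii(data):
    """Return the decoded text, or the UnicodeDecodeError the ASCII codec raises."""
    try:
        return data.decode('ascii')
    except UnicodeDecodeError as exc:
        return exc


result = decode_as_ascii(raw)
print(result)  # the error names the offending byte and its position
```

The first four characters ("Nguy") are plain ASCII; the fifth, "ễ", starts with byte 0xe1 in UTF-8, which is why the error points at position 4.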
> Rather than manually serializing and escaping values to an ASCII string, one should
> serialize the value using the character set of the corresponding target database,
> leveraging the connection to convert the object to a SQL string literal.
> Additionally, the escaping logic for single quotes (') within the _serialize_cell method
> seems wrong, i.e.
>     str(cell).replace("'", "''")
> would escape the string "you're" to be "'you''re'" as opposed to "'you\'re'".
> Note that an exception should still be thrown if the target encoding is not compatible
> with the source encoding.
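
One driver-neutral way to avoid manual quoting altogether is DB-API parameter binding, where the driver handles both encoding and escaping. The following is a minimal sketch using the standard-library sqlite3 module; it is not the approach taken in the reverted commit, and the function name `insert_rows_bound` and the table `people` are hypothetical. The placeholder syntax is driver-specific (sqlite3 uses `?`, MySQLdb uses `%s`), so a real hook would need to account for that.

```python
import sqlite3


def insert_rows_bound(conn, table, rows):
    """Insert rows via parameter binding instead of building string literals.

    `table` is interpolated directly into the SQL, so it must come from
    trusted code, never from user input.
    """
    if not rows:
        return
    # sqlite3 uses the "qmark" paramstyle; other drivers may differ.
    placeholders = ", ".join("?" for _ in rows[0])
    sql = "INSERT INTO {} VALUES ({})".format(table, placeholders)
    conn.executemany(sql, rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, note TEXT)")
insert_rows_bound(conn, "people", [("Nguyễn Tấn Dũng", "you're")])
stored = conn.execute("SELECT name, note FROM people").fetchall()
print(stored)
```

Both the non-ASCII name and the embedded single quote round-trip unchanged, without any call to `str()` or `replace()` on the values.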



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
