airflow-commits mailing list archives

From "ASF subversion and git services (JIRA)" <>
Subject [jira] [Commented] (AIRFLOW-2053) BigQuery Hook bug when data doesn't contain quoted values
Date Fri, 02 Feb 2018 08:32:00 GMT


ASF subversion and git services commented on AIRFLOW-2053:

Commit fd4360b9f0954b3dd4a960153178a06112f05a33 in incubator-airflow's branch refs/heads/master
from [~kaxilnaik]
[;h=fd4360b ]

[AIRFLOW-2053] Fix quote character bug in BQ hook

Modified the condition to check whether
quote_character is set. This allows
`quote_character` to be set to an empty string when the data
doesn't contain quoted sections.

Closes #2996 from kaxil/bq_hook_quote_fix

> BigQuery Hook bug when data doesn't contain quoted values
> ---------------------------------------------------------
>                 Key: AIRFLOW-2053
>                 URL:
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: gcp
>    Affects Versions: 1.9.0, 1.8.2
>            Reporter: Kaxil Naik
>            Assignee: Kaxil Naik
>            Priority: Minor
>             Fix For: 2.0.0
> The BigQuery API states [here|] that:
> {quote}The value that is used to quote data sections in a CSV file. BigQuery converts
> the string to ISO-8859-1 encoding, and then uses the first byte of the encoded string to split
> the data in its raw, binary state. The default value is a double-quote ('"'). If your data
> does not contain quoted sections, set the property value to an empty string.{quote}
> But the [current implementation|] of `run_load` in the BigQuery hook has an incorrect
> check for whether to include `quote_character`.
> The code currently is:
> {code:python}
>         if 'fieldDelimiter' not in src_fmt_configs:
>             src_fmt_configs['fieldDelimiter'] = field_delimiter
>         if quote_character:
>             src_fmt_configs['quote'] = quote_character
>         if allow_quoted_newlines:
>             src_fmt_configs['allowQuotedNewlines'] = allow_quoted_newlines
> {code}
> If my data doesn't contain quote characters, then per the BQ API docs I need to set `quote=''`,
> i.e. an empty string. But the condition `if quote_character:` evaluates to false for an empty
> string, so the `quote` option is never set and BigQuery falls back to its default double-quote.
> Hence, I get the following error:
> {code:json}
> {'message': 'Error detected while parsing row starting at position: 0. Error: Data between close double quote (") and field separator.', 'reason': 'invalid'}
> {code}
> So the condition should be:
> {code:python}
>         if quote_character is not None:
>             src_fmt_configs['quote'] = quote_character
> {code}

This message was sent by Atlassian JIRA