airflow-commits mailing list archives

From "Chris Riccomini (JIRA)" <>
Subject [jira] [Updated] (AIRFLOW-611) BigQuery Hooks and Operators "source_format" error
Date Fri, 04 Nov 2016 18:22:00 GMT


Chris Riccomini updated AIRFLOW-611:
    Component/s: gcp

> BigQuery Hooks and Operators "source_format" error
> --------------------------------------------------
>                 Key: AIRFLOW-611
>                 URL:
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: gcp
>            Reporter: Giovanni Briggs
>            Priority: Minor
> Found an issue with the *source_format* parameter for the GoogleCloudStorageToBigQueryOperator.
> I was trying to upload a JSON file from GCS to BQ and was using the value *"JSON"* for *source_format*, assuming that this would work.  The upload process started, but then came back with an error saying:
> {code:javascript}
> {'message': 'Error detected while parsing row starting at position: 0. Error: Data between close double quote (") and field separator.', 'reason': 'invalid'}
> {code}
> There is nothing wrong with the JSON format of the doc, so I looked at the job description on BigQuery and saw that there was no "Source Format" entry.  When I've successfully uploaded CSV files, the "Source Format" entry is present and says "CSV".
> According to Google's docs for [source format |], acceptable values are: "CSV", "NEWLINE_DELIMITED_JSON", "AVRO" and "GOOGLE_SHEETS".  However, BigQuery doesn't raise an error if you pass a format that is not in that list (such as "JSON").  Instead, it looks like BigQuery assumes you mean CSV and tries to parse the file as a CSV file, which results in a completely different error.
> Not sure what the appropriate fix is (or if there even is one).  At the very least, additional documentation for the BigQuery hook and operators that points to the list of available values would be helpful.  Otherwise, BigQuery's error leads you to believe that something is wrong with the format of your data, when the actual problem is in the setup of the API call.
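
One possible client-side guard, sketched under assumptions: check *source_format* against the values listed in the report before any job is submitted, so that a value such as "JSON" fails immediately with a clear message instead of being parsed as CSV. The names `ALLOWED_SOURCE_FORMATS` and `validate_source_format` are hypothetical and are not part of Airflow's BigQuery hook or operators. The allowed values are the ones quoted in the report, using the `NEWLINE_DELIMITED_JSON` spelling that BigQuery expects.

```python
# Hypothetical helper: illustrates the idea only, not Airflow's actual code.
# The allowed values are taken from the list quoted in the issue report.
ALLOWED_SOURCE_FORMATS = (
    "CSV",
    "NEWLINE_DELIMITED_JSON",
    "AVRO",
    "GOOGLE_SHEETS",
)


def validate_source_format(source_format):
    """Return the normalized format name, or raise ValueError if unsupported."""
    # Normalize case and surrounding whitespace so "csv" is accepted as "CSV".
    normalized = source_format.strip().upper()
    if normalized not in ALLOWED_SOURCE_FORMATS:
        raise ValueError(
            "Unsupported source_format {!r}; expected one of: {}".format(
                source_format, ", ".join(ALLOWED_SOURCE_FORMATS)
            )
        )
    return normalized
```

With a check like this, `validate_source_format("JSON")` would raise a `ValueError` that names the accepted values, while `validate_source_format("NEWLINE_DELIMITED_JSON")` would pass through unchanged.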

This message was sent by Atlassian JIRA
