airflow-commits mailing list archives

From "Anand Kumar (Jira)" <>
Subject [jira] [Commented] (AIRFLOW-5224) gcs_to_bq.GoogleCloudStorageToBigQueryOperator - Specify Encoding for BQ ingestion
Date Thu, 19 Sep 2019 17:29:00 GMT


Anand Kumar commented on AIRFLOW-5224:

[~kaxilnaik] That's right: it doesn't take care of this automatically, so an encoding specification
in this operator would help.

Which version will this fix be included in?

> gcs_to_bq.GoogleCloudStorageToBigQueryOperator - Specify Encoding for BQ ingestion
> ----------------------------------------------------------------------------------
>                 Key: AIRFLOW-5224
>                 URL:
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: DAG, gcp
>    Affects Versions: 1.10.0
>         Environment: airflow software platform
>            Reporter: Anand Kumar
>            Priority: Blocker
> Hi,
> The current business project we are enabling has been built completely on GCP components
with Composer, with Airflow being one of the key processes. We have built various data pipelines
using Airflow for multiple work-streams where data is being ingested from a GCS bucket to Big
> Based on recent updates on the Google BQ infrastructure side, there seem to be some tightened
validations on UTF-8 characters, which have resulted in multiple failures of our existing business
> On further analysis, we found that when ingesting data into BQ from a Google bucket,
the encoding needs to be explicitly specified going forward, but the operator below currently
doesn't accept any parameter for specifying an explicit encoding:
> _*gcs_to_bq.GoogleCloudStorageToBigQueryOperator*_
>  Could someone please treat this as a priority and help us with a fix to bring us back
to BAU mode?
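
For illustration only, the request above (an explicit encoding on a GCS-to-BigQuery load) can be sketched at the level of the BigQuery load-job configuration. The field names `sourceUris`, `destinationTable`, `sourceFormat` and `encoding` are those of the BigQuery REST API's load configuration; the helper `build_load_config`, its arguments, and the example project, dataset, table and bucket names are hypothetical and are not part of Airflow or of any fix for this issue. The allowed-values list is an assumption limited to the two encodings known to be documented for CSV loads.

```python
# Hypothetical helper, not an Airflow or BigQuery client API.
# It only builds the "load" section of a BigQuery job configuration as a dict.

# Assumption: restrict to two encodings BigQuery documents for CSV loads.
ALLOWED_ENCODINGS = ("UTF-8", "ISO-8859-1")


def build_load_config(project, dataset, table, bucket, objects, encoding="UTF-8"):
    """Return a load-job configuration dict with an explicit encoding."""
    if encoding not in ALLOWED_ENCODINGS:
        raise ValueError(
            "Unsupported encoding {!r}; expected one of {}".format(
                encoding, ALLOWED_ENCODINGS
            )
        )
    return {
        "load": {
            # One gs:// URI per source object in the bucket.
            "sourceUris": ["gs://{}/{}".format(bucket, obj) for obj in objects],
            "destinationTable": {
                "projectId": project,
                "datasetId": dataset,
                "tableId": table,
            },
            "sourceFormat": "CSV",
            # The setting this issue asks the operator to expose.
            "encoding": encoding,
        }
    }


# Example with placeholder names.
config = build_load_config(
    "my-project", "my_dataset", "my_table",
    "my-bucket", ["data/file.csv"],
    encoding="ISO-8859-1",
)
```

An operator-level parameter would presumably pass its value through to this `encoding` field; how the operator actually does so is not stated in this message.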

This message was sent by Atlassian Jira
