airflow-commits mailing list archives

From "Chris Riccomini (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (AIRFLOW-1750) GoogleCloudStorageToBigQueryOperator 404 HttpError
Date Mon, 23 Oct 2017 21:12:00 GMT

    [ https://issues.apache.org/jira/browse/AIRFLOW-1750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16215848#comment-16215848 ]

Chris Riccomini commented on AIRFLOW-1750:
------------------------------------------

It looks to me like the project ID is not being set properly. Have you checked your hook definition,
service account, and so on? The URL in the stack trace has two consecutive slashes after `projects`
(`.../projects//jobs`), which indicates that no project_id was set.
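
To make the symptom concrete, here is a minimal sketch of how an empty project ID would produce the doubled slash seen in the error. It only mimics the URL shape from the stack trace and does not call Airflow or the Google API client. The names `JOBS_URL_TEMPLATE`, `build_jobs_url` and `require_project_id` are hypothetical, invented for illustration, and treating `None` like an empty string is an assumption of the sketch.

```python
# Illustrative only: these names are hypothetical and are not part of
# Airflow or the Google API client. The template copies the URL shape
# that appears in the reported error message.
JOBS_URL_TEMPLATE = (
    "https://www.googleapis.com/bigquery/v2/projects/{project_id}/jobs?alt=json"
)


def build_jobs_url(project_id):
    """Fill the project id into the jobs URL.

    Assumption of this sketch: None is treated like an empty string.
    """
    return JOBS_URL_TEMPLATE.format(project_id=project_id or "")


def require_project_id(project_id):
    """Fail early with a clear message when no project id is configured."""
    if not project_id or not str(project_id).strip():
        raise ValueError(
            "BigQuery project_id is empty; check the connection settings"
        )
    return project_id


if __name__ == "__main__":
    # An empty project id leaves nothing between the two slashes.
    print(build_jobs_url(""))
    # A non-empty project id fills the path segment as expected.
    print(build_jobs_url("my-project"))
```

A guard such as `require_project_id` could be run against whatever project ID the connection supplies before the load task executes, so that a missing value surfaces as a readable error instead of a 404 from the API.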

> GoogleCloudStorageToBigQueryOperator 404 HttpError
> --------------------------------------------------
>
>                 Key: AIRFLOW-1750
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-1750
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: gcp
>    Affects Versions: Airflow 1.8
>         Environment: Python 2.7.13
>            Reporter: Mark Secada
>             Fix For: Airflow 1.8
>
>
> I'm trying to write a DAG which uploads JSON files to GoogleCloudStorage and then moves
> them to BigQuery. I was able to upload these files to GoogleCloudStorage, but when I run this
> second task, I get a 404 HttpError. The error looks like this:
> {code:bash}
> ERROR - <HttpError 404 when requesting https://www.googleapis.com/bigquery/v2/projects//jobs?alt=json returned "Not Found">
> Traceback (most recent call last):
>   File "/Users/myname/anaconda/lib/python2.7/site-packages/airflow/models.py", line 1374, in run
>     result = task_copy.execute(context=context)
>   File "/Users/myname/anaconda/lib/python2.7/site-packages/airflow/contrib/operators/gcs_to_bq.py", line 153, in execute
>     schema_update_options=self.schema_update_options)
>   File "/Users/myname/anaconda/lib/python2.7/site-packages/airflow/contrib/hooks/bigquery_hook.py", line 476, in run_load
>     return self.run_with_configuration(configuration)
>   File "/Users/myname/anaconda/lib/python2.7/site-packages/airflow/contrib/hooks/bigquery_hook.py", line 498, in run_with_configuration
>     .insert(projectId=self.project_id, body=job_data) \
>   File "/Users/myname/anaconda/lib/python2.7/site-packages/oauth2client/util.py", line 135, in positional_wrapper
>     return wrapped(*args, **kwargs)
>   File "/Users/myname/anaconda/lib/python2.7/site-packages/googleapiclient/http.py", line 838, in execute
>     raise HttpError(resp, content, uri=self.uri)
> {code}
> My code for the task is here:
> {code:python}
> # Some comments here
> t3 = GoogleCloudStorageToBigQueryOperator(
>         task_id='move_'+source+'_from_gcs_to_bq',
>         bucket='mybucket',
>         source_objects=['news/latest_headline_'+source+'.json'],
>         destination_project_dataset_table='mydataset.latest_news_headlines',
>         schema_object='news/latest_headline_'+source+'.json',
>         source_format='NEWLINE_DELIMITED_JSON',
>         write_disposition='WRITE_APPEND',
>         dag=dag)
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
