airflow-commits mailing list archives

From "Mark Secada (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (AIRFLOW-1750) GoogleCloudStorageToBigQueryOperator 404 HttpError
Date Mon, 23 Oct 2017 21:01:06 GMT

     [ https://issues.apache.org/jira/browse/AIRFLOW-1750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mark Secada updated AIRFLOW-1750:
---------------------------------
    Description: 
I'm trying to write a DAG that uploads JSON files to Google Cloud Storage and then loads them into BigQuery. I was able to upload the files to Google Cloud Storage, but when I run the second task I get a 404 HttpError. The error looks like this:

{code:bash}
ERROR - <HttpError 404 when requesting https://www.googleapis.com/bigquery/v2/projects//jobs?alt=json returned "Not Found">
Traceback (most recent call last):
  File "/Users/marksecada/anaconda/lib/python2.7/site-packages/airflow/models.py", line 1374, in run
    result = task_copy.execute(context=context)
  File "/Users/marksecada/anaconda/lib/python2.7/site-packages/airflow/contrib/operators/gcs_to_bq.py", line 153, in execute
    schema_update_options=self.schema_update_options)
  File "/Users/marksecada/anaconda/lib/python2.7/site-packages/airflow/contrib/hooks/bigquery_hook.py", line 476, in run_load
    return self.run_with_configuration(configuration)
  File "/Users/marksecada/anaconda/lib/python2.7/site-packages/airflow/contrib/hooks/bigquery_hook.py", line 498, in run_with_configuration
    .insert(projectId=self.project_id, body=job_data) \
  File "/Users/marksecada/anaconda/lib/python2.7/site-packages/oauth2client/util.py", line 135, in positional_wrapper
    return wrapped(*args, **kwargs)
  File "/Users/marksecada/anaconda/lib/python2.7/site-packages/googleapiclient/http.py", line 838, in execute
    raise HttpError(resp, content, uri=self.uri)
{code}
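
Note that the request URL in the error contains "projects//jobs" with nothing between the slashes, which looks like an empty project id. Below is a minimal sketch (not the actual hook code) of how that URL would be produced if the project id the hook resolves were empty; the value assigned to project_id here is a hypothetical stand-in:

{code:python}
# Minimal sketch, not the actual hook: the traceback shows run_with_configuration
# calling .insert(projectId=self.project_id, body=job_data), so an empty or None
# project_id would yield the "projects//jobs" path seen in the 404 above.
project_id = ""  # hypothetical value, standing in for whatever the hook resolved

url = "https://www.googleapis.com/bigquery/v2/projects/{}/jobs".format(project_id)
print(url)  # -> https://www.googleapis.com/bigquery/v2/projects//jobs
{code}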

My code's here:

{code:python}
# Some comments here
t3 = GoogleCloudStorageToBigQueryOperator(
        task_id='move_'+source+'_from_gcs_to_bq',
        bucket='mybucket',
        source_objects=['news/latest_headline_'+source+'.json'],
        destination_project_dataset_table='mydataset.latest_news_headlines',
        schema_object='news/latest_headline_'+source+'.json',
        source_format='NEWLINE_DELIMITED_JSON',
        write_disposition='WRITE_APPEND',
        dag=dag)
{code}
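
For context, here is a self-contained sketch of how this task is wired up. The DAG arguments, the value of {code}source{code}, and the explicit connection ids are placeholders for illustration; if I'm reading the contrib operator right, the project id for the load job comes from the BigQuery connection, so a blank project field on that connection would explain the "projects//jobs" URL above.

{code:python}
# Sketch only -- the DAG args, the `source` value, and the connection ids below
# are placeholders, not the real configuration.
from datetime import datetime

from airflow import DAG
from airflow.contrib.operators.gcs_to_bq import GoogleCloudStorageToBigQueryOperator

source = 'nyt'  # placeholder for the real source name

dag = DAG(
    dag_id='news_headlines',            # placeholder DAG id
    start_date=datetime(2017, 10, 1),   # placeholder start date
    schedule_interval='@daily')

t3 = GoogleCloudStorageToBigQueryOperator(
    task_id='move_' + source + '_from_gcs_to_bq',
    bucket='mybucket',
    source_objects=['news/latest_headline_' + source + '.json'],
    destination_project_dataset_table='mydataset.latest_news_headlines',
    schema_object='news/latest_headline_' + source + '.json',
    source_format='NEWLINE_DELIMITED_JSON',
    write_disposition='WRITE_APPEND',
    # Connection ids spelled out at what I believe are the contrib defaults;
    # the GCP project id is resolved from the BigQuery connection, so if that
    # field is blank the jobs URL ends up as projects//jobs.
    bigquery_conn_id='bigquery_default',
    google_cloud_storage_conn_id='google_cloud_storage_default',
    dag=dag)
{code}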



> GoogleCloudStorageToBigQueryOperator 404 HttpError
> --------------------------------------------------
>
>                 Key: AIRFLOW-1750
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-1750
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: gcp
>    Affects Versions: Airflow 1.8
>         Environment: Python 2.7.13
>            Reporter: Mark Secada
>             Fix For: Airflow 1.8
>
>



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
