airflow-commits mailing list archives

From "Kamil Bregula (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (AIRFLOW-3503) GoogleCloudStorageHook delete return success when nothing was done
Date Mon, 05 Aug 2019 12:53:00 GMT

    [ https://issues.apache.org/jira/browse/AIRFLOW-3503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16900060#comment-16900060 ]

Kamil Bregula commented on AIRFLOW-3503:
----------------------------------------

> I expect the function to fail and return something like "file was not found" if there
> is nothing to delete, or to let the user decide with a specific flag whether the function
> should fail or succeed when no files are found.

Operators should be idempotent. If the file does not exist because it was previously deleted,
then restarting the task should not raise an error.
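
A minimal sketch of how both positions could be reconciled: delete whatever matches a prefix, treat "nothing to delete" as success by default so a retried task stays idempotent, and let the caller opt in to a failure. Everything here is hypothetical and not part of the Airflow API: `delete_by_prefix`, `fail_if_missing`, `NothingToDeleteError`, and the in-memory `FakeBucket` that stands in for a real bucket so the example runs without credentials.

```python
from typing import Callable, Iterable, List


class NothingToDeleteError(Exception):
    """Raised only when the caller opts in to strict behaviour (hypothetical)."""


def delete_by_prefix(
    list_objects: Callable[[str], Iterable[str]],
    delete_object: Callable[[str], bool],
    prefix: str,
    fail_if_missing: bool = False,
) -> List[str]:
    """Delete every object under `prefix` and return the names actually deleted.

    By default an empty match is a successful no-op, so re-running the task
    after a previous successful delete does not raise.
    """
    names = list(list_objects(prefix))
    if not names and fail_if_missing:
        raise NothingToDeleteError("no objects found under prefix %r" % prefix)
    # Keep only the names whose delete call reported success.
    return [name for name in names if delete_object(name)]


class FakeBucket:
    """In-memory stand-in for a bucket, used only to make the sketch runnable."""

    def __init__(self, names: Iterable[str]) -> None:
        self._names = set(names)

    def list(self, prefix: str) -> List[str]:
        return sorted(n for n in self._names if n.startswith(prefix))

    def delete(self, name: str) -> bool:
        if name in self._names:
            self._names.remove(name)
            return True
        return False
```

With a real hook, the two callables would be whatever list and delete calls the hook provides. Returning the list of deleted names also gives the task log something more informative than "Returned value was: None".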

> GoogleCloudStorageHook  delete return success when nothing was done
> -------------------------------------------------------------------
>
>                 Key: AIRFLOW-3503
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-3503
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: gcp
>    Affects Versions: 1.10.1
>            Reporter: lot
>            Assignee: Yohei Onishi
>            Priority: Major
>              Labels: gcp, gcs, hooks
>
> I'm loading files to BigQuery from Storage using:
>  
> {{gcs_export_uri = BQ_TABLE_NAME + '/' + EXEC_TIMESTAMP_PATH + '/*'}}
> {{gcs_to_bigquery_op = GoogleCloudStorageToBigQueryOperator(}}
> {{    dag=dag,}}
> {{    task_id='load_products_to_BigQuery',}}
> {{    bucket=GCS_BUCKET_ID,}}
> {{    destination_project_dataset_table=table_name_template,}}
> {{    source_format='NEWLINE_DELIMITED_JSON',}}
> {{    source_objects=[gcs_export_uri],}}
> {{    src_fmt_configs={'ignoreUnknownValues': True},}}
> {{    create_disposition='CREATE_IF_NEEDED',}}
> {{    write_disposition='WRITE_TRUNCATE',}}
> {{    skip_leading_rows=1,}}
> {{    google_cloud_storage_conn_id=CONNECTION_ID,}}
> {{    bigquery_conn_id=CONNECTION_ID)}}
>  
> After that I want to delete the files so I do:
> {{from airflow.contrib.hooks.gcs_hook import GoogleCloudStorageHook}}
> {{}}
> {{def delete_folder():}}
> {{    """}}
> {{    Delete files from Google Cloud Storage}}
> {{    """}}
> {{    hook = GoogleCloudStorageHook(}}
> {{            google_cloud_storage_conn_id=CONNECTION_ID)}}
> {{    hook.delete(}}
> {{        bucket=GCS_BUCKET_ID,}}
> {{        object=gcs_export_uri)}}
>  
>  
> This runs with PythonOperator.
> The task is marked as Success even though nothing was deleted.
> Log:
> [2018-12-12 11:31:29,247] {base_task_runner.py:98} INFO - Subtask: [2018-12-12 11:31:29,247] {transport.py:151} INFO - Attempting refresh to obtain initial access_token
> [2018-12-12 11:31:29,249] {base_task_runner.py:98} INFO - Subtask: [2018-12-12 11:31:29,249] {client.py:795} INFO - Refreshing access_token
> [2018-12-12 11:31:29,584] {base_task_runner.py:98} INFO - Subtask: [2018-12-12 11:31:29,583] {python_operator.py:90} INFO - Done. Returned value was: None
>  
>  
> I expect the function to fail and return something like "file was not found" if there
> is nothing to delete, or to let the user decide with a specific flag whether the function
> should fail or succeed when no files are found.
>  



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
