airflow-commits mailing list archives

From "ASF GitHub Bot (Jira)" <j...@apache.org>
Subject [jira] [Commented] (AIRFLOW-2842) GCS rsync operator
Date Sat, 07 Sep 2019 15:21:00 GMT

    [ https://issues.apache.org/jira/browse/AIRFLOW-2842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16924907#comment-16924907 ]

ASF GitHub Bot commented on AIRFLOW-2842:
-----------------------------------------

potiuk commented on pull request #6011: [AIRFLOW-2842] Add GoogleCloudStorageSynchronizeBuckets operator
URL: https://github.com/apache/airflow/pull/6011
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


> GCS rsync operator
> ------------------
>
>                 Key: AIRFLOW-2842
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-2842
>             Project: Apache Airflow
>          Issue Type: Improvement
>          Components: gcp
>            Reporter: Vikram Oberoi
>            Priority: Major
>              Labels: gcs
>
> The GoogleCloudStorageToGoogleCloudStorageOperator supports copying objects from one bucket to another using a wildcard.
> As long as you don't delete anything in the source bucket, the destination bucket will end up synchronized on every run.
> However, each object gets copied over even if it already exists at the destination, which makes this operation inefficient, time-consuming, and potentially costly.
> I'd love an operator that behaves like `gsutil rsync` for when I need to synchronize two buckets, supporting `gsutil rsync -d` behavior as well.
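
For illustration only, the rsync-style behavior requested above could be sketched as follows. This is not the code from pull request #6011 and it calls no GCS API. It assumes each bucket listing is already available as a plain mapping of object name to checksum, and the names `plan_sync`, `SyncPlan`, and `delete_extra` are hypothetical. An object is copied only when it is missing or differs at the destination, and `delete_extra=True` mimics `gsutil rsync -d` by also removing destination objects that are absent from the source.

```python
from typing import Dict, List, NamedTuple


class SyncPlan(NamedTuple):
    """Hypothetical container for the work a sync run would perform."""
    to_copy: List[str]
    to_delete: List[str]


def plan_sync(
    source: Dict[str, str],
    destination: Dict[str, str],
    delete_extra: bool = False,
) -> SyncPlan:
    """Compare two listings (object name -> checksum) and plan a sync.

    Assumption: a checksum string is enough to decide whether two
    objects are identical.
    """
    # Copy objects that are missing at the destination or whose checksum differs.
    to_copy = sorted(
        name
        for name, checksum in source.items()
        if destination.get(name) != checksum
    )
    # Only with delete_extra (analogous to `gsutil rsync -d`) remove
    # destination objects that no longer exist in the source.
    to_delete = (
        sorted(name for name in destination if name not in source)
        if delete_extra
        else []
    )
    return SyncPlan(to_copy=to_copy, to_delete=to_delete)


if __name__ == "__main__":
    src = {"a.csv": "111", "b.csv": "222", "c.csv": "333"}
    dst = {"a.csv": "111", "b.csv": "old", "stale.csv": "999"}
    print(plan_sync(src, dst))
    print(plan_sync(src, dst, delete_extra=True))
```

In this sketch an unchanged object such as `a.csv` is never copied again, which is the inefficiency the issue describes for the wildcard copy.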



--
This message was sent by Atlassian Jira
(v8.3.2#803003)
