beam-commits mailing list archives

From "Luke Cwik (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (BEAM-2500) Add support for S3 as a Apache Beam FileSystem
Date Thu, 14 Sep 2017 18:08:00 GMT

    [ https://issues.apache.org/jira/browse/BEAM-2500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16166735#comment-16166735 ]

Luke Cwik commented on BEAM-2500:
---------------------------------

Multipart download/upload will become important, since the 5 GiB single-request limit is of
limited use, but start off by implementing the simpler approach; multipart upload/download
can come later.
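To make the "later" step concrete, here is a minimal sketch, using only the JDK, of the
planning half of a multipart upload: deciding when a single PUT is no longer enough and
splitting an object into byte ranges. The limits are assumptions based on the S3
documentation (single PUT up to 5 GiB; parts of 5 MiB to 5 GiB except the last; at most
10,000 parts, numbered from 1). The class `MultipartPlan` and its methods are hypothetical
names for illustration, not part of Beam or any AWS SDK.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: plans the byte ranges of an S3 multipart upload. */
public class MultipartPlan {
    static final long MiB = 1024L * 1024L;
    static final long GiB = 1024L * MiB;
    // Assumed limits (check current S3 docs before relying on them).
    static final long MAX_SINGLE_PUT = 5 * GiB; // largest single-request upload
    static final long MIN_PART_SIZE = 5 * MiB;  // every part but the last
    static final long MAX_PART_SIZE = 5 * GiB;
    static final int MAX_PARTS = 10_000;

    /** A contiguous byte range [offset, offset + length) of the source object. */
    static final class Part {
        final int number;  // 1-based part number, as S3 expects
        final long offset; // first byte of this part within the object
        final long length; // number of bytes in this part

        Part(int number, long offset, long length) {
            this.number = number;
            this.offset = offset;
            this.length = length;
        }

        @Override
        public String toString() {
            return "Part(" + number + ", " + offset + ", " + length + ")";
        }
    }

    /** True when the object is too large for one PUT request. */
    static boolean needsMultipart(long objectSize) {
        return objectSize > MAX_SINGLE_PUT;
    }

    /** Splits objectSize bytes into parts of partSize; the last part may be smaller. */
    static List<Part> planParts(long objectSize, long partSize) {
        if (partSize < MIN_PART_SIZE || partSize > MAX_PART_SIZE) {
            throw new IllegalArgumentException("part size out of range: " + partSize);
        }
        long count = (objectSize + partSize - 1) / partSize; // ceiling division
        if (count > MAX_PARTS) {
            throw new IllegalArgumentException("too many parts: " + count);
        }
        List<Part> parts = new ArrayList<>();
        long offset = 0;
        int number = 1;
        while (offset < objectSize) {
            long length = Math.min(partSize, objectSize - offset);
            parts.add(new Part(number++, offset, length));
            offset += length;
        }
        return parts;
    }

    public static void main(String[] args) {
        long size = 12 * GiB;
        System.out.println(needsMultipart(size));
        System.out.println(planParts(size, 5 * GiB));
    }
}
```

Each `Part` maps naturally onto one upload-part request (or one ranged GET on the download
side), so a first version that only ever uses a single request can adopt this later
without changing its callers.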

http://docs.aws.amazon.com/AmazonS3/latest/API/RESTObjectCOPY.html
Amazon S3 supports an efficient copy operation: if you specify the "x-amz-copy-source" header,
you don't need to upload the bytes, and S3 just adds some metadata that points to the same
set of bytes. Depending on which Amazon S3 Java library you use, it may or may not expose
this flexibility.
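For reference, a minimal sketch of what such a copy request looks like on the wire, built
with the JDK's `java.net.http.HttpRequest` (Java 11+): the destination is the request URI,
the source is the "x-amz-copy-source" header ("/bucket/key", URL-encoded), and the body is
empty. This is an illustration under stated assumptions: the class name and bucket names
are hypothetical, the default virtual-hosted endpoint is assumed, and AWS Signature V4
signing is deliberately omitted, so the request is not sendable as-is.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: builds (but does not sign or send) an S3 PUT Object - Copy request. */
public class S3CopyRequest {
    static HttpRequest buildCopy(String srcBucket, String srcKey,
                                 String dstBucket, String dstKey) {
        // Assumption: virtual-hosted-style URL on the default endpoint.
        URI dest = URI.create("https://" + dstBucket + ".s3.amazonaws.com/" + encodeKey(dstKey));
        // The source object is named by a header, so no object bytes are uploaded.
        String copySource = "/" + srcBucket + "/" + encodeKey(srcKey);
        return HttpRequest.newBuilder(dest)
                .header("x-amz-copy-source", copySource)
                .PUT(HttpRequest.BodyPublishers.noBody()) // empty body
                .build();
        // NOTE: a real request also needs SigV4 "Authorization" and related headers.
    }

    /** Percent-encodes each path segment of a key, keeping '/' separators. */
    static String encodeKey(String key) {
        StringBuilder sb = new StringBuilder();
        String[] segments = key.split("/", -1);
        for (int i = 0; i < segments.length; i++) {
            if (i > 0) {
                sb.append('/');
            }
            // URLEncoder does form encoding (space -> '+'); a literal '+' becomes %2B,
            // so replacing '+' afterwards only affects encoded spaces.
            sb.append(URLEncoder.encode(segments[i], StandardCharsets.UTF_8).replace("+", "%20"));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        HttpRequest req = buildCopy("src-bucket", "logs/part 1.txt", "dst-bucket", "copy/part-1.txt");
        System.out.println(req.method() + " " + req.uri());
        System.out.println(req.headers().firstValue("x-amz-copy-source").orElse(""));
    }
}
```

A single copy request of this form is itself capped at 5 GB, so larger objects would again
need the multipart path (copying part by part). In the AWS SDK for Java (v1),
`AmazonS3.copyObject(sourceBucket, sourceKey, destinationBucket, destinationKey)` issues this
request, including signing.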

> Add support for S3 as a Apache Beam FileSystem
> ----------------------------------------------
>
>                 Key: BEAM-2500
>                 URL: https://issues.apache.org/jira/browse/BEAM-2500
>             Project: Beam
>          Issue Type: Improvement
>          Components: sdk-java-extensions
>            Reporter: Luke Cwik
>            Priority: Minor
>         Attachments: hadoop_fs_patch.patch
>
>
> Note that this is for providing direct integration with S3 as an Apache Beam FileSystem.
> There is already support for using the Hadoop S3 connector by depending on the Hadoop
> File System module[1] and configuring HadoopFileSystemOptions[2] with an S3 configuration[3].
> 1: https://github.com/apache/beam/tree/master/sdks/java/io/hadoop-file-system
> 2: https://github.com/apache/beam/blob/master/sdks/java/io/hadoop-file-system/src/main/java/org/apache/beam/sdk/io/hdfs/HadoopFileSystemOptions.java#L53
> 3: https://wiki.apache.org/hadoop/AmazonS3



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
