beam-commits mailing list archives

From "Jacob Marble (JIRA)" <>
Subject [jira] [Commented] (BEAM-2500) Add support for S3 as a Apache Beam FileSystem
Date Thu, 14 Sep 2017 18:09:03 GMT


Jacob Marble commented on BEAM-2500:

Chamikara, thanks for your comment. I'll switch my implementation to multipart upload once I
have something working; so far I have only written the simple version, which is limited to
5 GB per object. I'll also give closer consideration to the credentials question after the
harder parts are complete. For now, I'm just using flags via PipelineOptions.

I have now completed enough of this to test it out, except for one problem. S3 requires the
content length before any data is written; otherwise the client buffers the entire content in
memory before writing. I have added contentLength to my S3CreateOptions, but how do I set that
value before S3FileSystem.create() is called?
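
For illustration only, here is a minimal sketch of how a multipart approach could sidestep the
content-length question: the channel buffers writes into fixed-size parts, so the length of each
part is known when it is handed off, and the total length is never needed up front. The names
PartUploader and PartBufferingChannel are hypothetical and are not part of Beam or the AWS SDK;
a real uploader would issue one S3 UploadPart request per call.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.ClosedChannelException;
import java.nio.channels.WritableByteChannel;
import java.util.Arrays;

/** Hypothetical callback; a real implementation would send one S3 UploadPart request. */
interface PartUploader {
    void uploadPart(int partNumber, byte[] data) throws IOException;
}

/** Buffers writes into fixed-size parts so each part's length is known before upload. */
final class PartBufferingChannel implements WritableByteChannel {
    private final ByteBuffer buffer;
    private final PartUploader uploader;
    private int nextPartNumber = 1; // S3 part numbers start at 1
    private boolean open = true;

    PartBufferingChannel(int partSize, PartUploader uploader) {
        this.buffer = ByteBuffer.allocate(partSize);
        this.uploader = uploader;
    }

    @Override
    public int write(ByteBuffer src) throws IOException {
        if (!open) {
            throw new ClosedChannelException();
        }
        int written = 0;
        while (src.hasRemaining()) {
            // Copy as many bytes as fit into the current part.
            int n = Math.min(src.remaining(), buffer.remaining());
            ByteBuffer slice = src.duplicate();
            slice.limit(slice.position() + n);
            buffer.put(slice);
            src.position(src.position() + n);
            written += n;
            if (!buffer.hasRemaining()) {
                flushPart(); // part is full: its length is exactly partSize
            }
        }
        return written;
    }

    /** Hands the buffered bytes to the uploader as one part and resets the buffer. */
    private void flushPart() throws IOException {
        buffer.flip();
        byte[] part = Arrays.copyOf(buffer.array(), buffer.limit());
        uploader.uploadPart(nextPartNumber++, part);
        buffer.clear();
    }

    @Override
    public boolean isOpen() {
        return open;
    }

    @Override
    public void close() throws IOException {
        if (!open) {
            return;
        }
        if (buffer.position() > 0) {
            flushPart(); // final, possibly shorter, part
        }
        open = false;
    }
}
```

Memory use is bounded by the part size rather than the object size. Note that S3 requires every
part except the last to be at least 5 MiB, and this sketch leaves out initiating and completing
the multipart upload, as well as the zero-byte case.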

> Add support for S3 as a Apache Beam FileSystem
> ----------------------------------------------
>                 Key: BEAM-2500
>                 URL:
>             Project: Beam
>          Issue Type: Improvement
>          Components: sdk-java-extensions
>            Reporter: Luke Cwik
>            Priority: Minor
>         Attachments: hadoop_fs_patch.patch
> Note that this is for providing direct integration with S3 as an Apache Beam FileSystem.
> There is already support for using the Hadoop S3 connector by depending on the Hadoop
> File System module[1] and configuring HadoopFileSystemOptions[2] with an S3 configuration[3].
> 1:
> 2:
> 3:
