beam-commits mailing list archives

From "Guillaume Balaine (JIRA)" <>
Subject [jira] [Commented] (BEAM-2500) Add support for S3 as a Apache Beam FileSystem
Date Fri, 11 Aug 2017 08:51:00 GMT


Guillaume Balaine commented on BEAM-2500:

Thanks, that's fine really. The only trouble was that I had to dig through some example code
to find it out, because no stack traces surface in Beam. It's just that resolving a ResourceId
with such a path from another folder gives you an incomplete URI, where the base path is
dropped, like:

(s3a://mybucket/myfolder/somefilename.fmt).resolve(somefilename-12:30-13:30.fmt) -> ResourceId{URI{somefilename-12:30-13:30.fmt}}

(s3a://mybucket/myfolder/somefilename.fmt).resolve(somefilename-12.30-13.30.fmt) -> ResourceId{URI{s3a://mybucket/myfolder/somefilename-12.30-13.30.fmt}}

So people need to be careful with their file naming policies in Beam.
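
The behaviour above can be reproduced with plain java.net.URI, assuming the ResourceId in question delegates to URI.resolve (which the URI{...} output suggests but this message does not confirm). The class and method names below are made up for illustration only:

```java
import java.net.URI;

public class ColonResolveDemo {
    // Resolves `child` against `base` the way java.net.URI does.
    public static String resolve(String base, String child) {
        return URI.create(base).resolve(child).toString();
    }

    public static void main(String[] args) {
        String base = "s3a://mybucket/myfolder/somefilename.fmt";

        // Everything before the first ':' ("somefilename-12") is a syntactically
        // valid URI scheme, so the child is parsed as an absolute, opaque URI
        // and resolve() returns it unchanged, without the base path.
        System.out.println(resolve(base, "somefilename-12:30-13:30.fmt"));
        // prints: somefilename-12:30-13:30.fmt

        // Without a colon the child is a relative path and replaces the last
        // segment of the base path.
        System.out.println(resolve(base, "somefilename-12.30-13.30.fmt"));
        // prints: s3a://mybucket/myfolder/somefilename-12.30-13.30.fmt
    }
}
```

No exception is thrown in the first case, which is consistent with there being no stack trace to look at.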

On another note, reads don't work because S3 input streams don't implement ByteBufferReadable,
as you mentioned, so I guess fixing that would be enough to resolve this issue.
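
As a rough sketch of one possible workaround, a reader can fall back to an ordinary byte[] read and copy the result into the ByteBuffer when the stream offers no direct ByteBuffer read. This uses only java.io and java.nio; the class and method names are hypothetical, and it is not claimed to be how Beam or Hadoop handle this:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class ByteBufferFallbackDemo {
    // Reads up to dst.remaining() bytes from `in` into `dst` through a
    // temporary array. Returns the number of bytes read, or -1 at end of stream.
    public static int readInto(InputStream in, ByteBuffer dst) throws IOException {
        if (!dst.hasRemaining()) {
            return 0;
        }
        byte[] tmp = new byte[dst.remaining()];
        int n = in.read(tmp, 0, tmp.length);
        if (n > 0) {
            dst.put(tmp, 0, n);
        }
        return n;
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new ByteArrayInputStream("hello".getBytes(StandardCharsets.US_ASCII));
        ByteBuffer buf = ByteBuffer.allocate(3);
        int n = readInto(in, buf);
        System.out.println(n + " bytes read, buffer position " + buf.position());
    }
}
```

The extra copy costs some throughput compared with a direct ByteBuffer read, but it works with any InputStream.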

> Add support for S3 as a Apache Beam FileSystem
> ----------------------------------------------
>                 Key: BEAM-2500
>                 URL:
>             Project: Beam
>          Issue Type: Improvement
>          Components: sdk-java-extensions
>            Reporter: Luke Cwik
>            Priority: Minor
> Note that this is for providing direct integration with S3 as an Apache Beam FileSystem.
> There is already support for using the Hadoop S3 connector by depending on the Hadoop
> File System module[1], configuring HadoopFileSystemOptions[2] with an S3 configuration[3].
> 1:
> 2:
> 3:

This message was sent by Atlassian JIRA