flink-dev mailing list archives

From Fabian Hueske <fhue...@gmail.com>
Subject Re: adding a new cloud filesystem
Date Thu, 18 Jan 2018 13:08:24 GMT
In fact, there are two S3FileSystemFactory classes, one for Hadoop and
another one for Presto.
In both cases an external file system class is wrapped in Flink's
HadoopFileSystem class [1] [2].

Best, Fabian

[1]
https://github.com/apache/flink/blob/master/flink-filesystems/flink-s3-fs-hadoop/src/main/java/org/apache/flink/fs/s3hadoop/S3FileSystemFactory.java#L132
[2]
https://github.com/apache/flink/blob/master/flink-filesystems/flink-s3-fs-presto/src/main/java/org/apache/flink/fs/s3presto/S3FileSystemFactory.java#L131
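
To make the scheme lookup behind the 'oci' error quoted below more concrete, here is a minimal, self-contained Java sketch. It is a simplified model under stated assumptions, not Flink's code: SchemeFactory, SchemeRegistry and OciFactory are hypothetical names, and create() returns a descriptive string where a real factory would return a file system object wrapping the external implementation.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a file system factory (not Flink's interface).
interface SchemeFactory {
    // The URI scheme this factory is responsible for, e.g. "oci".
    String getScheme();

    // A real factory would return a file system; this sketch returns a description.
    String create(URI uri);
}

// Hypothetical registry mapping URI schemes to factories.
final class SchemeRegistry {
    private final Map<String, SchemeFactory> factories = new HashMap<>();

    void register(SchemeFactory factory) {
        factories.put(factory.getScheme(), factory);
    }

    String open(URI uri) {
        SchemeFactory factory = factories.get(uri.getScheme());
        if (factory == null) {
            // Mirrors the kind of failure reported in the thread for 'oci'.
            throw new IllegalArgumentException(
                "Could not find a file system implementation for scheme '"
                    + uri.getScheme() + "'");
        }
        return factory.create(uri);
    }
}

// Hypothetical factory for the oci:// scheme.
final class OciFactory implements SchemeFactory {
    public String getScheme() {
        return "oci";
    }

    public String create(URI uri) {
        return "wrapped file system for " + uri;
    }
}

class SchemeRegistryDemo {
    public static void main(String[] args) {
        SchemeRegistry registry = new SchemeRegistry();
        URI uri = URI.create("oci://bucket/input.txt");
        try {
            registry.open(uri); // fails: no factory registered for 'oci' yet
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
        registry.register(new OciFactory());
        System.out.println(registry.open(uri)); // succeeds after registration
    }
}
```

In this model the lookup fails only because no factory has claimed the scheme. In Flink itself, according to the documentation page linked in the quoted thread, factories are discovered through Java's service loading mechanism rather than through explicit register() calls.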

2018-01-18 1:24 GMT+01:00 cw7k <cw7k@yahoo.com.invalid>:

>  Thanks. I'm looking at the s3 example and I can only find the
> S3FileSystemFactory but not the File System implementation (subclass
> of org.apache.flink.core.fs.FileSystem).
> Is that requirement still needed?
>
> On Wednesday, January 17, 2018, 3:59:47 PM PST, Fabian Hueske
> <fhueske@gmail.com> wrote:
>
>  Hi,
>
> please have a look at this doc page [1].
> It describes how to add new file system implementations and also how to
> configure them.
>
> Best, Fabian
>
> [1]
> https://ci.apache.org/projects/flink/flink-docs-release-1.4/ops/filesystems.html#adding-new-file-system-implementations
>
> 2018-01-18 0:32 GMT+01:00 cw7k <cw7k@yahoo.com.invalid>:
>
> >  Hi, I'm adding support for more cloud storage providers such as Google
> > (gcs://) and Oracle (oci://).
> > I have an oci:// test working based on the s3a:// test but when I try it
> > on an actual Flink job like WordCount, I get this message:
> > "org.apache.flink.core.fs.UnsupportedFileSystemSchemeException: Could not
> > find a file system implementation for scheme 'oci'. The scheme is not
> > directly supported by Flink and no Hadoop file system to support this
> > scheme could be loaded."
> > How do I register new schemes into the file system factory?  Thanks.
> >
> > On Tuesday, January 16, 2018, 5:27:31 PM PST, cw7k <cw7k@yahoo.com.INVALID>
> > wrote:
> >
> >  Hi, question on this page:
> > "You need to point Flink to a valid Hadoop configuration..."
> > https://ci.apache.org/projects/flink/flink-docs-release-1.4/ops/deployment/aws.html#s3-simple-storage-service
> > How do you point Flink to the Hadoop config?
> >
> > On Saturday, January 13, 2018, 4:56:15 AM PST, Till Rohrmann
> > <trohrmann@apache.org> wrote:
> >
> >  Hi,
> >
> > the flink-connector-filesystem module contains the BucketingSink, a
> > connector with which you can write your data to a file system. It provides
> > exactly-once processing guarantees and allows you to write data to
> > different buckets [1].
> >
> > The flink-filesystems module contains different file system
> > implementations (like mapr fs, hdfs or s3). If you want to use, for
> > example, the s3 file system, then there are the flink-s3-fs-hadoop and
> > flink-s3-fs-presto modules.
> >
> > So if you want to write your data to s3 using the BucketingSink, then you
> > have to add flink-connector-filesystem for the BucketingSink as well as an
> > s3 file system implementation (e.g. flink-s3-fs-hadoop or
> > flink-s3-fs-presto).
> >
> > Usually, there should be no need to change Flink's filesystem
> > implementations. If you want to add a new connector, then this would go to
> > flink-connectors or to Apache Bahir [2].
> >
> > [1]
> > https://ci.apache.org/projects/flink/flink-docs-master/dev/connectors/filesystem_sink.html
> >
> > [2]
> > https://ci.apache.org/projects/flink/flink-docs-master/dev/connectors/index.html#connectors-in-apache-bahir
> >
> > Cheers,
> > Till
> >
> > On Fri, Jan 12, 2018 at 7:22 PM, cw7k <cw7k@yahoo.com.invalid> wrote:
> >
> > > Hi, I'm trying to understand the difference between flink-filesystems
> > > and flink-connector-filesystem.  How is each intended to be used?
> > > If adding support for a different storage provider that supports HDFS,
> > > should additions be made to one or the other, or both?  Thanks.
> >
>
>
