flink-dev mailing list archives

From cw7k <c...@yahoo.com.INVALID>
Subject Re: adding a new cloud filesystem
Date Fri, 19 Jan 2018 00:43:37 GMT
OK, I have the factory working in the WordCount example. I had to move the factory code
and the META-INF services entry into the WordCount project.
For general Flink jobs, I assume the goal is to import the factory from the job itself
rather than copying the factory .java file into each project? If so, are there any
guidelines on how to do that?

On Thursday, January 18, 2018, 10:53:32 AM PST, cw7k <cw7k@yahoo.com.INVALID> wrote:
 
Hi, just a bit more info: I have a test function working using oci://, based on the S3 test:
https://github.com/apache/flink/blob/master/flink-filesystems/flink-s3-fs-hadoop/src/test/java/org/apache/flink/fs/s3hadoop/HadoopS3FileSystemITCase.java#L169
However, when I try to get the WordCount example's writeAsText call to write to my new filesystem:
https://github.com/apache/flink/blob/master/flink-examples/flink-examples-streaming/src/main/java/org/apache/flink/streaming/examples/wordcount/WordCount.java#L82

that's where I got the "Could not find a file system implementation" error mentioned earlier.  
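For reference, the scheme Flink dispatches on is simply the URI scheme of the output path, so the first thing worth verifying is that the path actually parses to the scheme you registered. A JDK-only check (the bucket and path are illustrative):

```java
import java.net.URI;

public class SchemeCheck {
    public static void main(String[] args) {
        // Flink selects a FileSystemFactory by the scheme of the output URI.
        URI uri = URI.create("oci://my-bucket/wordcount/output.txt");
        System.out.println(uri.getScheme()); // prints "oci"
    }
}
```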

On Thursday, January 18, 2018, 10:22:57 AM PST, cw7k <cw7k@yahoo.com.INVALID> wrote:
 
Thanks. I now have the three requirements fulfilled, but the scheme isn't being
loaded; I get this error:
"Caused by: org.apache.flink.core.fs.UnsupportedFileSystemSchemeException: Could not find
a file system implementation for scheme 'oci'. The scheme is not directly supported by Flink
and no Hadoop file system to support this scheme could be loaded."
What's the best way to debug the loading of the schemes/filesystems by the ServiceLoader? 
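One JDK-only way to see what the ServiceLoader actually finds is to dump both the provider-configuration files visible on the classpath and the instances it loads from them. A sketch using the JDK's own FileSystemProvider SPI as a stand-in; for Flink, the service-name string to substitute would be org.apache.flink.core.fs.FileSystemFactory:

```java
import java.net.URL;
import java.nio.file.spi.FileSystemProvider;
import java.util.Enumeration;
import java.util.ServiceLoader;

public class ServiceLoaderDebug {
    public static void main(String[] args) throws Exception {
        // 1) List every provider-configuration file on the classpath.
        //    For Flink, swap in "META-INF/services/org.apache.flink.core.fs.FileSystemFactory".
        String service = "META-INF/services/java.nio.file.spi.FileSystemProvider";
        Enumeration<URL> urls = ServiceLoaderDebug.class.getClassLoader().getResources(service);
        while (urls.hasMoreElements()) {
            System.out.println("config file: " + urls.nextElement());
        }

        // 2) Iterate what ServiceLoader actually instantiates from those files.
        for (FileSystemProvider p : ServiceLoader.load(FileSystemProvider.class)) {
            System.out.println(p.getScheme() + " -> " + p.getClass().getName());
        }
    }
}
```

If your factory's configuration file doesn't show up in step 1, the scheme can't be found, no matter how correct the factory class itself is.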
On Thursday, January 18, 2018, 5:09:10 AM PST, Fabian Hueske <fhueske@gmail.com> wrote:
 
In fact, there are two S3FileSystemFactory classes, one for Hadoop and
one for Presto. In both cases, an external file system class is wrapped
in Flink's HadoopFileSystem class [1][2].
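In both modules the factory is picked up through Java's ServiceLoader, i.e. a provider-configuration file on the classpath. A sketch of what that file would look like for a new scheme, assuming a hypothetical org.example.fs.oci.OciFileSystemFactory:

```
# src/main/resources/META-INF/services/org.apache.flink.core.fs.FileSystemFactory
org.example.fs.oci.OciFileSystemFactory
```

The file name must be the fully qualified service interface, and each non-comment line names a concrete factory class.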

Best, Fabian

[1]
https://github.com/apache/flink/blob/master/flink-filesystems/flink-s3-fs-hadoop/src/main/java/org/apache/flink/fs/s3hadoop/S3FileSystemFactory.java#L132
[2]
https://github.com/apache/flink/blob/master/flink-filesystems/flink-s3-fs-presto/src/main/java/org/apache/flink/fs/s3presto/S3FileSystemFactory.java#L131

2018-01-18 1:24 GMT+01:00 cw7k <cw7k@yahoo.com.invalid>:

>  Thanks. I'm looking at the S3 example, and I can only find the
> S3FileSystemFactory but not the FileSystem implementation (a subclass
> of org.apache.flink.core.fs.FileSystem). Is that requirement still
> needed?
>
> On Wednesday, January 17, 2018, 3:59:47 PM PST, Fabian Hueske
> <fhueske@gmail.com> wrote:
>
>  Hi,
>
> please have a look at this doc page [1].
> It describes how to add new file system implementations and also how to
> configure them.
>
> Best, Fabian
>
> [1]
> https://ci.apache.org/projects/flink/flink-docs-release-1.4/ops/filesystems.html#adding-new-file-system-implementations
>
> 2018-01-18 0:32 GMT+01:00 cw7k <cw7k@yahoo.com.invalid>:
>
> >  Hi, I'm adding support for more cloud storage providers such as Google
> > (gcs://) and Oracle (oci://).
> > I have an oci:// test working based on the s3a:// test but when I try it
> > on an actual Flink job like WordCount, I get this message:
> > "org.apache.flink.core.fs.UnsupportedFileSystemSchemeException: Could not
> > find a file system implementation for scheme 'oci'. The scheme is not
> > directly supported by Flink and no Hadoop file system to support this
> > scheme could be loaded."
> > How do I register new schemes with the file system factory? Thanks.
> >
> > On Tuesday, January 16, 2018, 5:27:31 PM PST, cw7k
> > <cw7k@yahoo.com.INVALID> wrote:
> >
> >  Hi, question on this page:
> > "You need to point Flink to a valid Hadoop configuration..."
> > https://ci.apache.org/projects/flink/flink-docs-release-1.4/ops/deployment/aws.html#s3-simple-storage-service
> > How do you point Flink to the Hadoop config?
> >
> > On Saturday, January 13, 2018, 4:56:15 AM PST, Till Rohrmann
> > <trohrmann@apache.org> wrote:
> >
> >  Hi,
> >
> > the flink-connector-filesystem module contains the BucketingSink, a
> > connector for writing your data to a file system. It provides
> > exactly-once processing guarantees and allows writing data to
> > different buckets [1].
> >
> > The flink-filesystem module contains different file system
> > implementations (like MapR FS, HDFS, or S3). If you want to use, for
> > example, the S3 file system, there are the flink-s3-fs-hadoop and
> > flink-s3-fs-presto modules.
> >
> > So if you want to write your data to S3 using the BucketingSink, you
> > have to add flink-connector-filesystem for the BucketingSink as well
> > as an S3 file system implementation (e.g. flink-s3-fs-hadoop or
> > flink-s3-fs-presto).
> >
> > Usually, there should be no need to change Flink's filesystem
> > implementations. If you want to add a new connector, it would go into
> > flink-connectors or Apache Bahir [2].
> >
> > [1]
> > https://ci.apache.org/projects/flink/flink-docs-master/dev/connectors/filesystem_sink.html
> >
> > [2]
> > https://ci.apache.org/projects/flink/flink-docs-master/dev/connectors/index.html#connectors-in-apache-bahir
> >
> > Cheers,
> > Till
> >
> > On Fri, Jan 12, 2018 at 7:22 PM, cw7k <cw7k@yahoo.com.invalid> wrote:
> >
> > > Hi, I'm trying to understand the difference between the
> > > flink-filesystem and flink-connector-filesystem. How is each intended
> > > to be used? If adding support for a different storage provider that
> > > supports HDFS, should additions be made to one, the other, or both?
> > > Thanks.
> >
>
>
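To tie Till's pointers together: a minimal sketch of the BucketingSink wiring he describes, against the Flink 1.4 streaming API. This will not compile without flink-connector-filesystem plus an S3 filesystem module (e.g. flink-s3-fs-hadoop) on the classpath, and the bucket path and job details are purely illustrative:

```java
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.fs.bucketing.BucketingSink;
import org.apache.flink.streaming.connectors.fs.bucketing.DateTimeBucketer;

public class BucketingSinkSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        DataStream<String> lines = env.fromElements("to be", "or not to be");

        // The sink path's scheme ("s3" here) selects the file system module;
        // flink-connector-filesystem provides the BucketingSink itself.
        BucketingSink<String> sink = new BucketingSink<>("s3://my-bucket/wordcount-out");
        sink.setBucketer(new DateTimeBucketer<>("yyyy-MM-dd--HH")); // one bucket per hour
        lines.addSink(sink);

        env.execute("bucketing-sink-sketch");
    }
}
```

The same wiring should apply to any scheme whose FileSystemFactory is registered, which is what the oci:// discussion above is working toward.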
     