flink-issues mailing list archives

From "Till Rohrmann (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-5789) Make Bucketing Sink independent of Hadoop's FileSystem
Date Tue, 07 Aug 2018 07:57:00 GMT

    [ https://issues.apache.org/jira/browse/FLINK-5789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16571227#comment-16571227 ]

Till Rohrmann commented on FLINK-5789:
--------------------------------------

Yes exactly [~mingleizhang].

> Make Bucketing Sink independent of Hadoop's FileSystem
> ------------------------------------------------------
>
>                 Key: FLINK-5789
>                 URL: https://issues.apache.org/jira/browse/FLINK-5789
>             Project: Flink
>          Issue Type: Bug
>          Components: Streaming Connectors
>    Affects Versions: 1.1.4, 1.2.0
>            Reporter: Stephan Ewen
>            Priority: Major
>
> The {{BucketingSink}} is hard-wired to Hadoop's FileSystem, bypassing Flink's file system abstraction.
> This causes several issues:
>   - The bucketing sink behaves differently from other file sinks with respect to configuration.
>   - File systems that are supported directly (not through Hadoop), like the MapR File System, do not work with the BucketingSink in the same way as other file systems.
>   - The previous point is all the more problematic given the effort to make Hadoop an optional dependency, and for other stacks (Mesos, Kubernetes, AWS, GCE, Azure) that ideally have no Hadoop dependency at all.
> We should port the {{BucketingSink}} to use Flink's FileSystem classes.
> To support the *truncate* functionality needed for the exactly-once semantics of the Bucketing Sink, we should extend Flink's FileSystem abstraction with the methods:
>   - {{boolean supportsTruncate()}}
>   - {{void truncate(Path, long)}}
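A minimal sketch of how the two proposed methods could be used for exactly-once recovery: on restore, an in-progress file is cut back to the length recorded in the last checkpoint, discarding bytes written afterwards. This is an illustration under stated assumptions, not Flink's API. `TruncatableFileSystem`, `LocalTruncatableFileSystem`, and `restoreToValidLength` are hypothetical names. The sketch uses `java.nio.file.Path` in place of Flink's `Path`, and it implements truncation for the local disk via `FileChannel#truncate` so that it runs without Flink or Hadoop.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class TruncateSketch {

    /** Hypothetical interface mirroring the two methods proposed in FLINK-5789. */
    interface TruncatableFileSystem {
        boolean supportsTruncate();

        void truncate(Path path, long length) throws IOException;
    }

    /** Hypothetical local-disk implementation backed by java.nio FileChannel#truncate. */
    static final class LocalTruncatableFileSystem implements TruncatableFileSystem {
        @Override
        public boolean supportsTruncate() {
            return true;
        }

        @Override
        public void truncate(Path path, long length) throws IOException {
            // FileChannel#truncate shrinks the file if it is longer than 'length'
            // and leaves it unchanged otherwise.
            try (FileChannel channel = FileChannel.open(path, StandardOpenOption.WRITE)) {
                channel.truncate(length);
            }
        }
    }

    /**
     * Cuts 'file' back to 'validLength' (the length stored in the checkpoint).
     * Returns false if the file system cannot truncate, so the caller must fall back.
     */
    static boolean restoreToValidLength(TruncatableFileSystem fs, Path file, long validLength)
            throws IOException {
        if (!fs.supportsTruncate()) {
            return false;
        }
        fs.truncate(file, validLength);
        return true;
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("part-0-0", ".in-progress");
        Files.write(file, "committed|".getBytes(StandardCharsets.UTF_8));
        long validLength = Files.size(file); // length recorded at checkpoint time
        // Bytes written after the checkpoint, which must be dropped on recovery.
        Files.write(file, "uncommitted".getBytes(StandardCharsets.UTF_8), StandardOpenOption.APPEND);

        boolean truncated =
                restoreToValidLength(new LocalTruncatableFileSystem(), file, validLength);
        System.out.println(truncated + " "
                + new String(Files.readAllBytes(file), StandardCharsets.UTF_8));
        Files.delete(file);
    }
}
```

Running `main` prints `true committed|`, because the bytes appended after the recorded length are discarded. When `supportsTruncate()` returns false, the sink needs another strategy, for example recording the valid length next to the file so that readers can ignore the trailing bytes.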



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
