flink-dev mailing list archives

From "Lasse Dalegaard (JIRA)" <j...@apache.org>
Subject [jira] [Created] (FLINK-3637) Change RollingSink Writer interface to allow wider range of outputs
Date Fri, 18 Mar 2016 13:46:33 GMT
Lasse Dalegaard created FLINK-3637:

             Summary: Change RollingSink Writer interface to allow wider range of outputs
                 Key: FLINK-3637
                 URL: https://issues.apache.org/jira/browse/FLINK-3637
             Project: Flink
          Issue Type: Improvement
          Components: Streaming Connectors
            Reporter: Lasse Dalegaard

Currently, the RollingSink Writer interface only works with an FSDataOutputStream, which
prevents it from being used with some existing libraries such as Apache ORC and Parquet.

To fix this, a new Writer interface can be created that receives FileSystem and Path objects
instead of an FSDataOutputStream.
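
A minimal sketch of what such an interface could look like follows. It is not taken from the Flink code base: the names PathWriter and LineWriter are hypothetical, and java.nio.file.FileSystem and java.nio.file.Path stand in for the Hadoop FileSystem and Path classes so that the example compiles without extra dependencies.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.FileSystem;
import java.nio.file.Path;

// Hypothetical contract: the sink hands over the file system and the target
// path, and the writer decides for itself how to create the output.
interface PathWriter<T> {
    void open(FileSystem fs, Path path) throws IOException;
    void write(T element) throws IOException;
    void close() throws IOException;
}

// Toy implementation that opens its own stream, the way a library-backed
// writer would open its own file instead of being given a stream.
class LineWriter implements PathWriter<String> {
    private BufferedWriter out;

    @Override
    public void open(FileSystem fs, Path path) throws IOException {
        // The writer, not the sink, creates the underlying stream.
        out = new BufferedWriter(new OutputStreamWriter(
                fs.provider().newOutputStream(path), StandardCharsets.UTF_8));
    }

    @Override
    public void write(String element) throws IOException {
        out.write(element);
        out.write('\n');
    }

    @Override
    public void close() throws IOException {
        out.close();
    }
}
```

With this shape, the sink only needs to pick a path for each part file; everything about how bytes reach that path stays inside the writer.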

To ensure exactly-once semantics, the Writer interface must also be extended so that the current
write offset can be retrieved at checkpointing time. For formats like ORC, this requires a
footer to be written before the offset is returned. Checkpointing already calls flush on
the writer, so either flush needs to return the current length of the output file, or
a new method has to be added for this.
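
The "new method" option could be sketched as below. This is an illustration under stated assumptions, not a proposed API: OffsetAwareWriter, flushAndGetOffset and FooterLineWriter are hypothetical names, and the "#footer" line is a toy stand-in for a real format footer such as the one ORC writes.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical contract for the checkpoint-time call.
interface OffsetAwareWriter<T> {
    void write(T element) throws IOException;

    // Finish whatever the format needs (for example a footer), flush, and
    // return the number of bytes that form a valid file at this point.
    long flushAndGetOffset() throws IOException;
}

// Toy format: one record per line, with a footer line appended on every
// checkpoint flush.
class FooterLineWriter implements OffsetAwareWriter<String> {
    private final OutputStream out;
    private long bytesWritten = 0;
    private long records = 0;

    FooterLineWriter(OutputStream out) {
        this.out = out;
    }

    @Override
    public void write(String element) throws IOException {
        emit(element + "\n");
        records++;
    }

    @Override
    public long flushAndGetOffset() throws IOException {
        // The footer has to be written before the offset is reported,
        // otherwise the reported length would cut the footer off.
        emit("#footer records=" + records + "\n");
        out.flush();
        return bytesWritten;
    }

    private void emit(String s) throws IOException {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        out.write(bytes);
        bytesWritten += bytes.length;
    }
}
```

On recovery, the sink could truncate the file to the stored offset; anything written after the last checkpoint is dropped, and what remains still ends in a footer.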

The existing Writer interface can be recreated with a wrapper on top of the new Writer interface.
The existing code that manages the FSDataOutputStream can then be moved into this new wrapper.
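
The wrapper idea can be sketched as an adapter. All names here are hypothetical: StreamWriter is a simplified stand-in for the existing stream-based Writer, PathWriter is an assumed shape for the new interface (with flush returning the valid length), and java.nio.file types and a plain OutputStream replace the Hadoop FileSystem, Path and FSDataOutputStream so the example compiles on its own.

```java
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.FileSystem;
import java.nio.file.Path;

// Simplified stand-in for the existing stream-based contract.
interface StreamWriter<T> {
    void open(OutputStream out) throws IOException;
    void write(T element) throws IOException;
    void flush() throws IOException;
    void close() throws IOException;
}

// Assumed shape of the new contract; flush reports the valid file length.
interface PathWriter<T> {
    void open(FileSystem fs, Path path) throws IOException;
    void write(T element) throws IOException;
    long flush() throws IOException;
    void close() throws IOException;
}

// Counts bytes so the adapter can report a write offset.
class CountingStream extends FilterOutputStream {
    long count = 0;

    CountingStream(OutputStream out) {
        super(out);
    }

    @Override
    public void write(int b) throws IOException {
        out.write(b);
        count++;
    }

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
        out.write(b, off, len);
        count += len;
    }
}

// The wrapper: stream management lives here instead of in the sink.
class StreamWriterAdapter<T> implements PathWriter<T> {
    private final StreamWriter<T> delegate;
    private CountingStream out;

    StreamWriterAdapter(StreamWriter<T> delegate) {
        this.delegate = delegate;
    }

    @Override
    public void open(FileSystem fs, Path path) throws IOException {
        out = new CountingStream(fs.provider().newOutputStream(path));
        delegate.open(out);
    }

    @Override
    public void write(T element) throws IOException {
        delegate.write(element);
    }

    @Override
    public long flush() throws IOException {
        delegate.flush();
        out.flush();
        return out.count;
    }

    @Override
    public void close() throws IOException {
        delegate.close();
        out.close();
    }
}

// Example of an old-style writer that only knows about streams.
class LineStreamWriter implements StreamWriter<String> {
    private OutputStream out;

    @Override
    public void open(OutputStream out) {
        this.out = out;
    }

    @Override
    public void write(String element) throws IOException {
        out.write((element + "\n").getBytes(StandardCharsets.UTF_8));
    }

    @Override
    public void flush() throws IOException {
        out.flush();
    }

    @Override
    public void close() throws IOException {
        out.close();
    }
}
```

An old-style writer would then be used as `new StreamWriterAdapter<>(new LineStreamWriter())`, so existing stream-based writers keep working while the sink itself only deals with the path-based interface.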

