flink-dev mailing list archives

From "Mohamed Amine ABDESSEMED (JIRA)" <j...@apache.org>
Subject [jira] [Created] (FLINK-2672) Add partitioned output format to HDFS RollingSink
Date Tue, 15 Sep 2015 08:42:46 GMT
Mohamed Amine ABDESSEMED created FLINK-2672:

             Summary: Add partitioned output format to HDFS RollingSink
                 Key: FLINK-2672
                 URL: https://issues.apache.org/jira/browse/FLINK-2672
             Project: Flink
          Issue Type: Improvement
          Components: Streaming Connectors
    Affects Versions: 0.10
            Reporter: Mohamed Amine ABDESSEMED
            Priority: Minor

An interesting use case of the HDFS sink is to dispatch data into multiple directories depending
on attributes present in the source data.
For example, for data with timestamp and status fields, we want to write it into
different directories using a pattern like: /somepath/%{timestamp}/%{status}
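
A minimal sketch of how such a pattern could be expanded for a single record, in plain Java.
This is not Flink code and not a proposed implementation: the class PartitionPathResolver,
the method resolve, the use of a Map for the record attributes, and the sample values are
all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Flink: expands %{name} placeholders
// in a path pattern using attribute values taken from a record.
final class PartitionPathResolver {

    // Matches placeholders of the form %{attributeName}.
    private static final Pattern PLACEHOLDER = Pattern.compile("%\\{([^}]+)\\}");

    private PartitionPathResolver() {}

    static String resolve(String pattern, Map<String, String> attributes) {
        Matcher m = PLACEHOLDER.matcher(pattern);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String name = m.group(1);
            String value = attributes.get(name);
            if (value == null) {
                // Fail loudly rather than write to a half-resolved directory.
                throw new IllegalArgumentException("No value for attribute: " + name);
            }
            // quoteReplacement stops '$' and '\' in a value from being
            // interpreted as regex group references.
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // Sample record; the attribute values are made up for this example.
        Map<String, String> record = new HashMap<>();
        record.put("timestamp", "2015-09-15");
        record.put("status", "OK");
        System.out.println(resolve("/somepath/%{timestamp}/%{status}", record));
    }
}
```

A sink supporting this feature would presumably evaluate the pattern per record and keep one
open bucket per resolved directory, which is where the bucketing logic comes in.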

The expected results are something like:

To support this functionality, the bucketing and checkpointing logic needs to be changed.

Note: For now, this can be done with the current version of the HDFS RollingSink by
splitting the data stream and attaching a separate HDFS sink to each resulting stream.
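
The workaround in the note can be pictured with the plain-Java stand-in below. It does not use
Flink classes: Event, DirectorySink and route are hypothetical names, and the per-status
directory layout is an assumption. It only shows the shape of the approach, one sink with a
fixed base path per attribute value known in advance.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java stand-in for the workaround; no Flink classes are used.
final class SplitStreamWorkaround {

    // Hypothetical record type with the two attributes from the example.
    static final class Event {
        final String timestamp;
        final String status;

        Event(String timestamp, String status) {
            this.timestamp = timestamp;
            this.status = status;
        }
    }

    // Stand-in for one sink instance bound to a fixed base directory.
    static final class DirectorySink {
        final String basePath;
        final List<Event> written = new ArrayList<>();

        DirectorySink(String basePath) {
            this.basePath = basePath;
        }

        void write(Event e) {
            written.add(e);
        }
    }

    // Creates one sink per status value known up front, then sends each
    // event to the sink of its status. Events with other statuses are dropped.
    static Map<String, DirectorySink> route(List<Event> events, List<String> knownStatuses) {
        Map<String, DirectorySink> sinks = new LinkedHashMap<>();
        for (String status : knownStatuses) {
            sinks.put(status, new DirectorySink("/somepath/" + status));
        }
        for (Event e : events) {
            DirectorySink sink = sinks.get(e.status);
            if (sink != null) { // plays the role of a per-branch filter
                sink.write(e);
            }
        }
        return sinks;
    }

    public static void main(String[] args) {
        // Sample events; the values are made up for this example.
        List<Event> events = Arrays.asList(
                new Event("2015-09-15", "OK"),
                new Event("2015-09-15", "FAILED"),
                new Event("2015-09-16", "OK"));
        Map<String, DirectorySink> sinks = route(events, Arrays.asList("OK", "FAILED"));
        for (DirectorySink sink : sinks.values()) {
            System.out.println(sink.basePath + " received " + sink.written.size() + " event(s)");
        }
    }
}
```

In this sketch the set of directories has to be fixed when the job is built, which is
presumably what a pattern-based sink would avoid.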

