flink-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-2583) Add Stream Sink For Rolling HDFS Files
Date Thu, 03 Sep 2015 09:29:45 GMT

    [ https://issues.apache.org/jira/browse/FLINK-2583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14728745#comment-14728745 ]

ASF GitHub Bot commented on FLINK-2583:

Github user rmetzger commented on a diff in the pull request:

    --- Diff: docs/apis/streaming_guide.md ---
    @@ -1836,6 +1837,110 @@ More about information about Elasticsearch can be found [here](https://elastic.c
     [Back to top](#top)
    +### HDFS
    +This connector provides a Sink that writes rolling files to HDFS. To use this connector, add the
    +following dependency to your project:
    +{% highlight xml %}
    +<dependency>
    +  <groupId>org.apache.flink</groupId>
    +  <artifactId>flink-connector-hdfs</artifactId>
    +  <version>{{site.version}}</version>
    +</dependency>
    +{% endhighlight %}
    +Note that the streaming connectors are currently not part of the binary
    +distribution. See the documentation on linking with these libraries
    +for information about how to package the program with them for
    +cluster execution.
    +#### HDFS Rolling File Sink
    +The rolling behaviour as well as the writing can be configured, but we will get to that later.
    +This is how you can create a default rolling sink:
    +<div class="codetabs" markdown="1">
    +<div data-lang="java" markdown="1">
    +{% highlight java %}
    +DataStream<String> input = ...;
    +input.addSink(new RollingHDFSSink<String>("/base/path"));
    +{% endhighlight %}
    +</div>
    +<div data-lang="scala" markdown="1">
    +{% highlight scala %}
    +val input: DataStream[String] = ...
    +input.addSink(new RollingHDFSSink("/base/path"))
    +{% endhighlight %}
    +</div>
    +</div>
    +The only required parameter is the base path in HDFS where the rolling files (buckets) will be
    +stored. The sink can be configured by specifying a custom bucketer, HDFS writer and batch size.
    +By default the rolling sink will use the pattern `"yyyy-MM-dd--HH"` to name the rolling buckets.
    --- End diff --
    Can you make it a bit more explicit that a new directory is created when the pattern changes?
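The review point can be illustrated with plain JDK code: the bucket directory name is derived by formatting the record's wall-clock time with the date pattern, so whenever the formatted value changes (with `"yyyy-MM-dd--HH"`, once per hour) the sink starts writing into a new directory. This is a self-contained sketch, not the PR's actual code; the `bucketFor` helper and the use of `SimpleDateFormat` are assumptions for illustration only.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

// Illustration only: two timestamps in different hours format to different
// bucket names, so a pattern-based sink would roll to a fresh directory.
public class BucketPatternDemo {
    // Hypothetical helper mimicking pattern-based bucket naming.
    static String bucketFor(long epochMillis) {
        SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd--HH");
        fmt.setTimeZone(TimeZone.getTimeZone("UTC"));
        return "/base/path/" + fmt.format(new Date(epochMillis));
    }

    public static void main(String[] args) {
        long t = 1441272585000L; // 2015-09-03 09:29:45 UTC (this mail's date)
        System.out.println(bucketFor(t));             // /base/path/2015-09-03--09
        System.out.println(bucketFor(t + 3600_000L)); // /base/path/2015-09-03--10
    }
}
```

An hour boundary, not the record contents, triggers the roll-over, which is why the docs benefit from stating explicitly that a pattern change means a new directory.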

> Add Stream Sink For Rolling HDFS Files
> --------------------------------------
>                 Key: FLINK-2583
>                 URL: https://issues.apache.org/jira/browse/FLINK-2583
>             Project: Flink
>          Issue Type: New Feature
>          Components: Streaming
>            Reporter: Aljoscha Krettek
>            Assignee: Aljoscha Krettek
>             Fix For: 0.10
> In addition to having configurable file-rolling behavior the Sink should also integrate
> with checkpointing to make it possible to have exactly-once semantics throughout the topology.
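One common way to get exactly-once file output from a checkpointing topology is the valid-length technique: record how many bytes of the part file are covered by the last checkpoint, and on restore truncate the file back to that length before replaying input, so replayed records are written exactly once. This is a hedged sketch of that general idea using only `java.io`/`java.nio`; the class name and flow are illustrative assumptions, not necessarily what this issue's implementation does.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Sketch: exactly-once file output via a checkpointed "valid length".
// A checkpoint records how many bytes are durable; after a failure the
// file is truncated back to that length before input is replayed.
public class ValidLengthSinkDemo {

    static String simulate() throws IOException {
        Path part = Files.createTempFile("part-", ".txt");

        // Two records are written, then a checkpoint records the valid length.
        Files.write(part, "a\nb\n".getBytes(StandardCharsets.UTF_8),
                StandardOpenOption.APPEND);
        long checkpointedLength = Files.size(part);

        // A record written after the checkpoint will be replayed after a failure.
        Files.write(part, "c\n".getBytes(StandardCharsets.UTF_8),
                StandardOpenOption.APPEND);

        // Restore: truncate back to the checkpointed length.
        try (FileOutputStream out = new FileOutputStream(part.toFile(), true)) {
            out.getChannel().truncate(checkpointedLength);
        }

        // Replaying the post-checkpoint record now appends it exactly once.
        Files.write(part, "c\n".getBytes(StandardCharsets.UTF_8),
                StandardOpenOption.APPEND);

        return new String(Files.readAllBytes(part), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        System.out.println(simulate()); // a, b, c each appear exactly once
    }
}
```

Without the truncation step, the replayed record would appear twice, giving only at-least-once output.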

This message was sent by Atlassian JIRA
