flink-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-2583) Add Stream Sink For Rolling HDFS Files
Date Thu, 03 Sep 2015 09:29:45 GMT

    [ https://issues.apache.org/jira/browse/FLINK-2583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14728745#comment-14728745 ]

ASF GitHub Bot commented on FLINK-2583:
---------------------------------------

Github user rmetzger commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1084#discussion_r38627730
  
    --- Diff: docs/apis/streaming_guide.md ---
    @@ -1836,6 +1837,110 @@ More about information about Elasticsearch can be found [here](https://elastic.c
     
     [Back to top](#top)
     
    +### HDFS
    +
    +This connector provides a Sink that writes rolling files to HDFS. To use this connector, add the
    +following dependency to your project:
    +
    +{% highlight xml %}
    +<dependency>
    +  <groupId>org.apache.flink</groupId>
    +  <artifactId>flink-connector-hdfs</artifactId>
    +  <version>{{site.version}}</version>
    +</dependency>
    +{% endhighlight %}
    +
    +Note that the streaming connectors are currently not part of the binary
    +distribution. See
    +[here](cluster_execution.html#linking-with-modules-not-contained-in-the-binary-distribution)
    +for information about how to package the program with the libraries for
    +cluster execution.
    +
    +#### HDFS Rolling File Sink
    +
    +The rolling behaviour as well as the writing can be configured but we will get to that later.
    +This is how you can create a default rolling sink:
    +
    +<div class="codetabs" markdown="1">
    +<div data-lang="java" markdown="1">
    +{% highlight java %}
    +DataStream<String> input = ...;
    +
    +input.addSink(new RollingHDFSSink<String>("/base/path"));
    +
    +{% endhighlight %}
    +</div>
    +<div data-lang="scala" markdown="1">
    +{% highlight scala %}
    +val input: DataStream[String] = ...
    +
    +input.addSink(new RollingHDFSSink("/base/path"))
    +
    +{% endhighlight %}
    +</div>
    +</div>
    +
    +The only required parameter is the base path in HDFS where the rolling files (buckets) will be
    +stored. The sink can be configured by specifying a custom bucketer, HDFS writer and batch size.
    +
    +By default the rolling sink will use the pattern `"yyyy-MM-dd--HH"` to name the rolling buckets.
    --- End diff --
    
    Can you make it a bit more explicit that a new directory is created when the pattern changes?
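
To illustrate the point, here is a minimal sketch of how a date pattern such as `"yyyy-MM-dd--HH"` maps timestamps to bucket directories. It is not the code from the pull request: the class `BucketPathSketch`, the helper `bucketPath`, and the choice of UTC are assumptions made only for this example. It uses nothing but the standard `java.time` API.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

public class BucketPathSketch {
    // Same pattern string that the documentation quotes as the default.
    // UTC is an assumption here so that the output is deterministic.
    private static final DateTimeFormatter BUCKET_FORMAT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd--HH").withZone(ZoneOffset.UTC);

    /** Hypothetical helper: the directory an element written at the given time would fall into. */
    static String bucketPath(String basePath, long timestampMillis) {
        return basePath + "/" + BUCKET_FORMAT.format(Instant.ofEpochMilli(timestampMillis));
    }

    public static void main(String[] args) {
        // Two timestamps within the same hour share one directory.
        System.out.println(bucketPath("/base/path", 0L));          // /base/path/1970-01-01--00
        System.out.println(bucketPath("/base/path", 3_599_999L));  // /base/path/1970-01-01--00
        // Crossing the hour boundary changes the formatted string, hence a new directory.
        System.out.println(bucketPath("/base/path", 3_600_000L));  // /base/path/1970-01-01--01
    }
}
```

With an hourly pattern, the formatted string changes once per hour, so under this sketch a new directory name appears once per hour; a finer or coarser pattern would change how often that happens.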


> Add Stream Sink For Rolling HDFS Files
> --------------------------------------
>
>                 Key: FLINK-2583
>                 URL: https://issues.apache.org/jira/browse/FLINK-2583
>             Project: Flink
>          Issue Type: New Feature
>          Components: Streaming
>            Reporter: Aljoscha Krettek
>            Assignee: Aljoscha Krettek
>             Fix For: 0.10
>
>
> In addition to having configurable file-rolling behavior the Sink should also integrate with checkpointing to make it possible to have exactly-once semantics throughout the topology.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
