flink-issues mailing list archives

From rmetzger <...@git.apache.org>
Subject [GitHub] flink pull request: [FLINK-2583] Add Stream Sink For Rolling HDFS ...
Date Thu, 03 Sep 2015 09:29:19 GMT
Github user rmetzger commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1084#discussion_r38627730
  
    --- Diff: docs/apis/streaming_guide.md ---
    @@ -1836,6 +1837,110 @@ More information about Elasticsearch can be found [here](https://elastic.co).
     
     [Back to top](#top)
     
    +### HDFS
    +
    +This connector provides a Sink that writes rolling files to HDFS. To use this connector, add the
    +following dependency to your project:
    +
    +{% highlight xml %}
    +<dependency>
    +  <groupId>org.apache.flink</groupId>
    +  <artifactId>flink-connector-hdfs</artifactId>
    +  <version>{{site.version}}</version>
    +</dependency>
    +{% endhighlight %}
    +
    +Note that the streaming connectors are currently not part of the binary
    +distribution. See
    +[here](cluster_execution.html#linking-with-modules-not-contained-in-the-binary-distribution)
    +for information about how to package the program with the libraries for
    +cluster execution.
    +
    +#### HDFS Rolling File Sink
    +
    +Both the rolling behaviour and the writing can be configured; we will get to that later.
    +This is how you can create a default rolling sink:
    +
    +<div class="codetabs" markdown="1">
    +<div data-lang="java" markdown="1">
    +{% highlight java %}
    +DataStream<String> input = ...;
    +
    +input.addSink(new RollingHDFSSink<String>("/base/path"));
    +
    +{% endhighlight %}
    +</div>
    +<div data-lang="scala" markdown="1">
    +{% highlight scala %}
    +val input: DataStream[String] = ...
    +
    +input.addSink(new RollingHDFSSink("/base/path"))
    +
    +{% endhighlight %}
    +</div>
    +</div>
    +
    +The only required parameter is the base path in HDFS where the rolling files (buckets) will be
    +stored. The sink can be configured by specifying a custom bucketer, HDFS writer, and batch size.
    +
    +By default the rolling sink will use the pattern `"yyyy-MM-dd--HH"` to name the rolling buckets.
    --- End diff --
    
    Can you make it a bit more explicit that a new directory is created when the pattern changes?
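    For illustration: the bucket directory name is derived from the current time via that date pattern, so a new directory appears whenever the formatted value changes (with the default pattern, once per hour). A minimal sketch using plain `SimpleDateFormat` — the sink's actual bucketing internals are not shown in this diff, so this is an assumption about the behaviour being documented:

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

public class BucketPatternDemo {
    public static void main(String[] args) {
        // Format timestamps with the sink's default bucket pattern.
        // Each distinct formatted value corresponds to a separate
        // bucket directory under the base path.
        SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd--HH");
        fmt.setTimeZone(TimeZone.getTimeZone("UTC"));

        Date t1 = new Date(0L);                 // 1970-01-01 00:00:00 UTC
        Date t2 = new Date(60L * 60L * 1000L);  // one hour later

        // prints /base/path/1970-01-01--00
        System.out.println("/base/path/" + fmt.format(t1));
        // prints /base/path/1970-01-01--01 -- a new bucket directory
        System.out.println("/base/path/" + fmt.format(t2));
    }
}
```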


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
