flink-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-2209) Document how to use TableAPI, Gelly and FlinkML, StreamingConnectors on a cluster
Date Mon, 15 Jun 2015 09:33:01 GMT

    [ https://issues.apache.org/jira/browse/FLINK-2209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14585704#comment-14585704
] 

ASF GitHub Bot commented on FLINK-2209:
---------------------------------------

Github user tillrohrmann commented on a diff in the pull request:

    https://github.com/apache/flink/pull/835#discussion_r32404022
  
    --- Diff: docs/apis/cluster_execution.md ---
    @@ -80,67 +80,73 @@ Note that the program contains custom user code and hence requires a JAR file wi
     the classes of the code attached. The constructor of the remote environment
     takes the path(s) to the JAR file(s).
     
    -## Remote Executor
    +## Linking with modules not contained in the binary distribution
     
    -Similar to the RemoteEnvironment, the RemoteExecutor lets you execute
    -Flink programs on a cluster directly. The remote executor accepts a
    -*Plan* object, which describes the program as a single executable unit.
    +The binary distribution contains jar packages in the `lib` folder that are automatically
    +provided to the classpath of your distributed programs. Almost all Flink classes are
    +located there, with a few exceptions such as the streaming connectors and some freshly
    +added modules. To run code that depends on these modules you need to make them accessible
    +at runtime, for which we suggest two options:
     
    -### Maven Dependency
    -
    -If you are developing your program in a Maven project, you have to add the
    -`flink-clients` module using this dependency:
    -
    -~~~xml
    -<dependency>
    -  <groupId>org.apache.flink</groupId>
    -  <artifactId>flink-clients</artifactId>
    -  <version>{{ site.version }}</version>
    -</dependency>
    -~~~
    -
    -### Example
    -
    -The following illustrates the use of the `RemoteExecutor` with the Scala API:
    -
    -~~~scala
    -def main(args: Array[String]) {
    -    val input = TextFile("hdfs://path/to/file")
    +1. Either copy the required jar files into the `lib` folder on all of your TaskManagers.
    +2. Or package them with your user code.
     
    -    val words = input flatMap { _.toLowerCase().split("""\W+""") filter { _ != "" } }
    -    val counts = words groupBy { x => x } count()
    +The latter option is recommended as it respects Flink's class loader management.
     
    -    val output = counts.write(wordsOutput, CsvOutputFormat())
    -  
    -    val plan = new ScalaPlan(Seq(output), "Word Count")
    -    val executor = new RemoteExecutor("strato-master", 7881, "/path/to/jarfile.jar")
    -    executor.executePlan(p);
    -}
    -~~~
    +### Packaging dependencies with your user code with Maven
     
    -The following illustrates the use of the `RemoteExecutor` with the Java API (as
    -an alternative to the RemoteEnvironment):
    +To provide these dependencies that are not included by Flink, we suggest two options with Maven.
     
    -~~~java
    -public static void main(String[] args) throws Exception {
    -    ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
    +1. The Maven assembly plugin builds a so-called fat jar containing all your dependencies.
    +It is easy to configure, but often overkill. See
    +[usage](http://maven.apache.org/plugins/maven-assembly-plugin/usage.html).
    +2. The Maven unpack plugin unpacks the relevant parts of the dependencies and
    +then packages them with your code.
     
    -    DataSet<String> data = env.readTextFile("hdfs://path/to/file");
    +To do the latter, for example for the streaming Kafka connector, `flink-connector-kafka`
    --- End diff --
    
    Wording of the first sentence. Maybe something like: "Using the latter approach in order to bundle the Kafka connector..."
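
For reference, the fat-jar option discussed in the diff can be sketched with the Maven assembly plugin. The plugin coordinates and the `jar-with-dependencies` descriptor are standard Maven; the execution id and phase shown here are illustrative, not taken from the pull request:

~~~xml
<!-- Sketch: build a fat jar with the Maven assembly plugin.
     The jar-with-dependencies descriptor bundles all compile-scope
     dependencies (e.g. flink-connector-kafka) into a single jar. -->
<build>
  <plugins>
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-assembly-plugin</artifactId>
      <configuration>
        <descriptorRefs>
          <descriptorRef>jar-with-dependencies</descriptorRef>
        </descriptorRefs>
      </configuration>
      <executions>
        <execution>
          <id>make-assembly</id>
          <phase>package</phase>
          <goals>
            <goal>single</goal>
          </goals>
        </execution>
      </executions>
    </plugin>
  </plugins>
</build>
~~~

Running `mvn package` with this configuration produces an additional `*-jar-with-dependencies.jar` artifact that can be submitted to the cluster.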


> Document how to use TableAPI, Gelly and FlinkML, StreamingConnectors on a cluster
> ---------------------------------------------------------------------------------
>
>                 Key: FLINK-2209
>                 URL: https://issues.apache.org/jira/browse/FLINK-2209
>             Project: Flink
>          Issue Type: Improvement
>            Reporter: Till Rohrmann
>            Assignee: Márton Balassi
>
> Currently the TableAPI, Gelly, FlinkML and StreamingConnectors are not part of the Flink
> dist module. Therefore they are not included in the binary distribution. As a consequence,
> if you want to use one of these libraries, the corresponding jar and all its dependencies
> have to be either manually put on the cluster or included in the user code jar.
> Usually a fat jar is built if one uses the quickstart archetypes. However, if one sets
> the project up manually this is not necessarily the case. Therefore, it should be well
> documented how to run programs using one of these libraries.
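
As a hedged sketch of the second option from the pull request (unpacking only the needed dependency classes into the user code jar), the Maven dependency plugin's `unpack-dependencies` goal could be configured roughly as follows; the execution id, phase, and the artifact filter are illustrative assumptions:

~~~xml
<!-- Sketch: unpack only the connector classes into the build
     output so they end up inside the user-code jar.
     The includeArtifactIds filter is an illustrative assumption. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-dependency-plugin</artifactId>
  <executions>
    <execution>
      <id>unpack-connectors</id>
      <phase>prepare-package</phase>
      <goals>
        <goal>unpack-dependencies</goal>
      </goals>
      <configuration>
        <includeArtifactIds>flink-connector-kafka</includeArtifactIds>
        <outputDirectory>${project.build.directory}/classes</outputDirectory>
      </configuration>
    </execution>
  </executions>
</plugin>
~~~

Unpacking into `target/classes` before the `package` phase means the regular `jar` goal picks the classes up, avoiding a full fat jar.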



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
