flink-issues mailing list archives

From tillrohrmann <...@git.apache.org>
Subject [GitHub] flink pull request: [FLINK-2209] Document linking with jars not in...
Date Mon, 15 Jun 2015 09:32:42 GMT
Github user tillrohrmann commented on a diff in the pull request:

    --- Diff: docs/apis/cluster_execution.md ---
    @@ -80,67 +80,73 @@ Note that the program contains custom user code and hence requires a JAR file with
     the classes of the code attached. The constructor of the remote environment
     takes the path(s) to the JAR file(s).
    -## Remote Executor
    +## Linking with modules not contained in the binary distribution
    -Similar to the RemoteEnvironment, the RemoteExecutor lets you execute
    -Flink programs on a cluster directly. The remote executor accepts a
    -*Plan* object, which describes the program as a single executable unit.
    +The binary distribution contains jar packages in the `lib` folder that are automatically
    +provided to the classpath of your distributed programs. Almost all of Flink's classes are
    +located there, with a few exceptions such as the streaming connectors and some freshly
    +added modules. To run code depending on these modules, you need to make them accessible
    +at runtime, for which we suggest two options:
    -### Maven Dependency
    -If you are developing your program in a Maven project, you have to add the
    -`flink-clients` module using this dependency:
    -  <groupId>org.apache.flink</groupId>
    -  <artifactId>flink-clients</artifactId>
    -  <version>{{ site.version }}</version>
    -### Example
    -The following illustrates the use of the `RemoteExecutor` with the Scala API:
    -def main(args: Array[String]) {
    -    val input = TextFile("hdfs://path/to/file")
    +1. Either copy the required jar files into the `lib` folder on all of your TaskManagers.
    +2. Or package them with your usercode.
    -    val words = input flatMap { _.toLowerCase().split("""\W+""") filter { _ != "" } }
    -    val counts = words groupBy { x => x } count()
    +The latter option is recommended as it respects the classloader management in Flink.
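    +For example, a program using the Kafka connector would declare the connector as a Maven
    +dependency in its `pom.xml`. A minimal sketch (the artifact name is the one used in the
    +Kafka example further below; the version placeholder follows the docs' convention):
    +  <dependency>
    +    <!-- illustrative coordinates; pick the module you actually need -->
    +    <groupId>org.apache.flink</groupId>
    +    <artifactId>flink-connector-kafka</artifactId>
    +    <version>{{ site.version }}</version>
    +  </dependency>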
    -    val output = counts.write(wordsOutput, CsvOutputFormat())
    -    val plan = new ScalaPlan(Seq(output), "Word Count")
    -    val executor = new RemoteExecutor("strato-master", 7881, "/path/to/jarfile.jar")
    -    executor.executePlan(plan)
    +### Packaging dependencies with your usercode using Maven
    -The following illustrates the use of the `RemoteExecutor` with the Java API (as
    -an alternative to the RemoteEnvironment):
    +To bundle dependencies that are not included in the Flink binary distribution, we suggest two options with Maven:
    -public static void main(String[] args) throws Exception {
    -    ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
    +1. The Maven assembly plugin builds a so-called fat jar containing all your dependencies.
    +It is easy to configure, but is overkill in many cases. See the first sketch after this list.
    +2. The Maven unpack plugin, for unpacking the relevant parts of the dependencies and
    +then packaging them with your code. See the second sketch after this list.
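    +A minimal sketch of the first option, using the standard `maven-assembly-plugin` with its
    +built-in `jar-with-dependencies` descriptor (plugin coordinates, phase, and goal are the
    +usual Maven ones, not taken from this document):
    +  <plugin>
    +    <groupId>org.apache.maven.plugins</groupId>
    +    <artifactId>maven-assembly-plugin</artifactId>
    +    <configuration>
    +      <descriptorRefs>
    +        <!-- pre-defined descriptor that bundles all dependencies into one fat jar -->
    +        <descriptorRef>jar-with-dependencies</descriptorRef>
    +      </descriptorRefs>
    +    </configuration>
    +    <executions>
    +      <execution>
    +        <id>make-assembly</id>
    +        <phase>package</phase>
    +        <goals>
    +          <goal>single</goal>
    +        </goals>
    +      </execution>
    +    </executions>
    +  </plugin>
    +And a sketch of the second option, using the `unpack` goal of the `maven-dependency-plugin`
    +(the plugin usually meant by "the maven unpack plugin"; the artifact and the include
    +pattern are illustrative):
    +  <plugin>
    +    <groupId>org.apache.maven.plugins</groupId>
    +    <artifactId>maven-dependency-plugin</artifactId>
    +    <executions>
    +      <execution>
    +        <id>unpack</id>
    +        <!-- unpack before the jar is built so the classes end up inside your jar -->
    +        <phase>prepare-package</phase>
    +        <goals>
    +          <goal>unpack</goal>
    +        </goals>
    +        <configuration>
    +          <artifactItems>
    +            <artifactItem>
    +              <groupId>org.apache.flink</groupId>
    +              <artifactId>flink-connector-kafka</artifactId>
    +              <version>{{ site.version }}</version>
    +              <type>jar</type>
    +              <overWrite>false</overWrite>
    +              <outputDirectory>${project.build.directory}/classes</outputDirectory>
    +              <!-- only unpack the connector's own classes -->
    +              <includes>org/apache/flink/**</includes>
    +            </artifactItem>
    +          </artifactItems>
    +        </configuration>
    +      </execution>
    +    </executions>
    +  </plugin>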
    -    DataSet<String> data = env.readTextFile("hdfs://path/to/file");
    +To do the latter, for example for the streaming Kafka connector, `flink-connector-kafka`
    --- End diff --
    Wording of the first sentence. Maybe something like: "Using the latter approach in order to bundle the Kafka connector..."

