flink-commits mailing list archives

From u..@apache.org
Subject [01/30] flink git commit: [docs] Change doc layout
Date Wed, 22 Apr 2015 14:16:58 GMT
Repository: flink
Updated Branches:
  refs/heads/master 6df1dd2cc -> f1ee90ccb


http://git-wip-us.apache.org/repos/asf/flink/blob/f1ee90cc/docs/web_client.md
----------------------------------------------------------------------
diff --git a/docs/web_client.md b/docs/web_client.md
deleted file mode 100644
index 34d7274..0000000
--- a/docs/web_client.md
+++ /dev/null
@@ -1,74 +0,0 @@
----
-title:  "Web Client"
----
-<!--
-Licensed to the Apache Software Foundation (ASF) under one
-or more contributor license agreements.  See the NOTICE file
-distributed with this work for additional information
-regarding copyright ownership.  The ASF licenses this file
-to you under the Apache License, Version 2.0 (the
-"License"); you may not use this file except in compliance
-with the License.  You may obtain a copy of the License at
-
-  http://www.apache.org/licenses/LICENSE-2.0
-
-Unless required by applicable law or agreed to in writing,
-software distributed under the License is distributed on an
-"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-KIND, either express or implied.  See the License for the
-specific language governing permissions and limitations
-under the License.
--->
-
-* This will be replaced by the TOC
-{:toc}
-
-Flink provides a web interface to upload jobs, inspect their execution plans, and execute them. The interface is a great tool to showcase programs, debug execution plans, or demonstrate the system as a whole.
-
-## Starting, Stopping, and Configuring the Web Interface
-
-Start the web interface by executing:
-
-    ./bin/start-webclient.sh
-
-and stop it by calling:
-
-    ./bin/stop-webclient.sh
-
-The web interface runs on port 8080 by default. To specify a custom port, set the ```webclient.port``` property in the *./conf/flink.yaml* configuration file. Jobs are submitted to the JobManager specified by ```jobmanager.rpc.address``` and ```jobmanager.rpc.port```. Please consult the [configuration](config.html#webclient) page for details and further configuration options.
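
For example, to run the web client on a different port, one could append the property to the configuration file named above and restart the client (a minimal sketch; the port value 8081 is only an illustration):

    # illustrative only: pick any free port
    echo "webclient.port: 8081" >> ./conf/flink.yaml
    ./bin/stop-webclient.sh
    ./bin/start-webclient.sh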
-
-## Using the Web Interface
-
-The web interface provides two views:
-
-1.  The **job view** to upload, preview, and submit Flink programs.
-2.  The **plan view** to analyze the optimized execution plans of Flink programs.
-
-### Job View
-
-The interface starts serving the job view. 
-
-You can **upload** a Flink program as a jar file. To **execute** an uploaded program:
-
-* select it from the job list on the left, 
-* enter the program arguments in the *"Arguments"* field (bottom left), and 
-* click on the *"Run Job"* button (bottom right).
-
-If the *"Show optimizer plan"* option is enabled (default), the *plan view* is displayed next; otherwise the job is directly submitted to the JobManager for execution.
-
-In case the jar's manifest file does not specify the program class, you can specify it before the argument list as:
-
-```
-assembler <assemblerClass> <programArgs...>
-```
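
For example, if the entry class were `org.myorg.WordCount` (a hypothetical name) and the program took an input and an output path, the *"Arguments"* field could contain:

```
assembler org.myorg.WordCount hdfs:///input.txt hdfs:///output.txt
```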
-
-### Plan View
-
-The plan view shows the optimized execution plan of the submitted program in the upper half of the page. The bottom part of the page displays detailed information about the currently selected plan operator including:
-
-* the chosen shipping strategies (local forward, hash partition, range partition, broadcast, ...),
-* the chosen local strategy (sort, hash join, merge join, ...),
-* inferred data properties (partitioning, grouping, sorting), and 
-* used optimizer estimates (data size, I/O and network costs, ...).
-
-To submit the job for execution, click again on the *"Run Job"* button in the bottom right.

http://git-wip-us.apache.org/repos/asf/flink/blob/f1ee90cc/docs/yarn_setup.md
----------------------------------------------------------------------
diff --git a/docs/yarn_setup.md b/docs/yarn_setup.md
deleted file mode 100644
index 7daf573..0000000
--- a/docs/yarn_setup.md
+++ /dev/null
@@ -1,264 +0,0 @@
----
-title:  "YARN Setup"
----
-<!--
-Licensed to the Apache Software Foundation (ASF) under one
-or more contributor license agreements.  See the NOTICE file
-distributed with this work for additional information
-regarding copyright ownership.  The ASF licenses this file
-to you under the Apache License, Version 2.0 (the
-"License"); you may not use this file except in compliance
-with the License.  You may obtain a copy of the License at
-
-  http://www.apache.org/licenses/LICENSE-2.0
-
-Unless required by applicable law or agreed to in writing,
-software distributed under the License is distributed on an
-"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-KIND, either express or implied.  See the License for the
-specific language governing permissions and limitations
-under the License.
--->
-
-* This will be replaced by the TOC
-{:toc}
-
-## Quickstart: Start a long-running Flink cluster on YARN
-
-Start a YARN session with 4 Task Managers (each with 4 GB of heap space):
-
-~~~bash
-wget {{ site.FLINK_WGET_URL_YARN_THIS }}
-tar xvzf flink-{{ site.FLINK_VERSION_THIS_HADOOP2 }}-bin-hadoop2.tgz
-cd flink-{{ site.FLINK_VERSION_THIS_HADOOP2 }}/
-./bin/yarn-session.sh -n 4 -jm 1024 -tm 4096
-~~~
-
-Specify the `-s` flag for the number of processing slots per Task Manager. We recommend setting the number of slots to the number of processors per machine.
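
For example, on machines with 8 processor cores, a session start could look like the following (the sizing values are illustrative, not a recommendation from this guide):

~~~bash
# 4 TaskManagers with 4 GB of heap each and 8 slots per TaskManager (one per core)
./bin/yarn-session.sh -n 4 -jm 1024 -tm 4096 -s 8
~~~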
-
-Once the session has been started, you can submit jobs to the cluster using the `./bin/flink` tool.
-
-## Quickstart: Run a Flink job on YARN
-
-~~~bash
-wget {{ site.FLINK_WGET_URL_YARN_THIS }}
-tar xvzf flink-{{ site.FLINK_VERSION_THIS_HADOOP2 }}-bin-hadoop2.tgz
-cd flink-{{ site.FLINK_VERSION_THIS_HADOOP2 }}/
-./bin/flink run -m yarn-cluster -yn 4 -yjm 1024 -ytm 4096 ./examples/flink-java-examples-{{ site.FLINK_VERSION_THIS_HADOOP2 }}-WordCount.jar
-~~~
-
-## Apache Flink on Hadoop YARN using a YARN Session
-
-Apache [Hadoop YARN](http://hadoop.apache.org/) is a cluster resource management framework. It allows running various distributed applications on top of a cluster. Flink runs on YARN next to other applications. Users do not have to set up or install anything if there is already a YARN setup.
-
-**Requirements**
-
-- Apache Hadoop 2.2
-- HDFS (Hadoop Distributed File System) (or another distributed file system supported by Hadoop)
-
-If you have trouble using the Flink YARN client, have a look at the [FAQ section](faq.html).
-
-### Start Flink Session
-
-Follow these instructions to learn how to launch a Flink Session within your YARN cluster.
-
-A session will start all required Flink services (JobManager and TaskManagers) so that you can submit programs to the cluster. Note that you can run multiple programs per session.
-
-#### Download Flink for YARN
-
-Download the YARN tgz package from the [download page]({{site.baseurl}}/downloads.html). It contains the required files.
-
-Extract the package using:
-
-~~~bash
-tar xvzf flink-{{ site.FLINK_VERSION_THIS_HADOOP2 }}-bin-hadoop2.tgz
-cd flink-{{site.FLINK_VERSION_THIS_HADOOP2 }}/
-~~~
-
-If you want to build the YARN .tgz file from source, follow the [build instructions](building.html). You can find the result of the build in `flink-dist/target/flink-{{ site.FLINK_VERSION_THIS_HADOOP2 }}-bin/flink-{{ site.FLINK_VERSION_THIS_HADOOP2 }}/` (*Note: The version might be different for you*).
-
-
-#### Start a Session
-
-Use the following command to start a session:
-
-~~~bash
-./bin/yarn-session.sh
-~~~
-
-This command will show you the following overview:
-
-~~~bash
-Usage:
-   Required
-     -n,--container <arg>   Number of YARN container to allocate (=Number of Task Managers)
-   Optional
-     -D <arg>                        Dynamic properties
-     -d,--detached                   Start detached
-     -jm,--jobManagerMemory <arg>    Memory for JobManager Container [in MB]
-     -q,--query                      Display available YARN resources (memory, cores)
-     -qu,--queue <arg>               Specify YARN queue.
-     -s,--slots <arg>                Number of slots per TaskManager
-     -tm,--taskManagerMemory <arg>   Memory per TaskManager Container [in MB]
-~~~
-
-Please note that the client requires the `YARN_CONF_DIR` or `HADOOP_CONF_DIR` environment variable to be set to read the YARN and HDFS configuration.
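
A typical invocation therefore first exports one of these variables; the path below is only an example and depends on your Hadoop installation:

~~~bash
# Point the client at the Hadoop/YARN configuration, then start the session
export HADOOP_CONF_DIR=/etc/hadoop/conf
./bin/yarn-session.sh -n 4 -tm 4096
~~~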
-
-**Example:** Issue the following command to allocate 10 Task Managers, with 8 GB of memory and 32 processing slots each:
-
-~~~bash
-./bin/yarn-session.sh -n 10 -tm 8192 -s 32
-~~~
-
-The system will use the configuration in `conf/flink-conf.yaml`. Please follow our [configuration guide](config.html) if you want to change something.
-
-Flink on YARN will overwrite the following configuration parameters: `jobmanager.rpc.address` (because the JobManager is always allocated on different machines), `taskmanager.tmp.dirs` (we are using the tmp directories given by YARN), and `parallelism.default` if the number of slots has been specified.
-
-If you do not want to change the configuration file to set configuration parameters, you can pass dynamic properties via the `-D` flag, for example: `-Dfs.overwrite-files=true -Dtaskmanager.network.numberOfBuffers=16368`.
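
Putting this together, a session start with dynamic properties could look like the following sketch (the container sizing is illustrative; the property values are the ones from the example above):

~~~bash
./bin/yarn-session.sh -n 4 -tm 4096 \
    -Dfs.overwrite-files=true \
    -Dtaskmanager.network.numberOfBuffers=16368
~~~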
-
-The example invocation starts 11 containers, since there is one additional container for the ApplicationMaster and Job Manager.
-
-Once Flink is deployed in your YARN cluster, it will show you the connection details of the Job Manager.
-
-Stop the YARN session by stopping the unix process (using CTRL+C) or by entering 'stop' into the client.
-
-#### Detached YARN session
-
-If you do not want to keep the Flink YARN client running all the time, it is also possible to start a *detached* YARN session.
-The parameter for that is called `-d` or `--detached`.
-
-In that case, the Flink YARN client will only submit Flink to the cluster and then close itself.
-Note that in this case it is not possible to stop the YARN session using Flink.
-
-Use the YARN utilities (`yarn application -kill <appId>`) to stop the YARN session.
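
A detached session start and a later manual shutdown could look like this sketch (the application id is whatever YARN reports for your session, e.g. via `yarn application -list`):

~~~bash
# Start a detached session; the client submits Flink and then exits
./bin/yarn-session.sh -n 4 -tm 4096 -d

# Later: stop the session through YARN
yarn application -kill <application id>
~~~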
-
-
-### Submit Job to Flink
-
-Use the following command to submit a Flink program to the YARN cluster:
-
-~~~bash
-./bin/flink
-~~~
-
-Please refer to the documentation of the [commandline client](cli.html).
-
-The command will show you a help menu like this:
-
-~~~bash
-[...]
-Action "run" compiles and runs a program.
-
-  Syntax: run [OPTIONS] <jar-file> <arguments>
-  "run" action arguments:
-     -c,--class <classname>           Class with the program entry point ("main"
-                                      method or "getPlan()" method. Only needed
-                                      if the JAR file does not specify the class
-                                      in its manifest.
-     -m,--jobmanager <host:port>      Address of the JobManager (master) to
-                                      which to connect. Use this flag to connect
-                                      to a different JobManager than the one
-                                      specified in the configuration.
-     -p,--parallelism <parallelism>   The parallelism with which to run the
-                                      program. Optional flag to override the
-                                      default value specified in the
-                                      configuration
-~~~
-
-Use the *run* action to submit a job to YARN. The client is able to determine the address of the JobManager. In the rare event of a problem, you can also pass the JobManager address using the `-m` argument. The JobManager address is visible in the YARN console.
-
-**Example**
-
-~~~bash
-wget -O apache-license-v2.txt http://www.apache.org/licenses/LICENSE-2.0.txt
-hadoop fs -copyFromLocal apache-license-v2.txt hdfs:/// ...
-./bin/flink run ./examples/flink-java-examples-{{ site.FLINK_VERSION_THIS_HADOOP2 }}-WordCount.jar \
-        hdfs:///..../apache-license-v2.txt hdfs:///.../wordcount-result.txt
-~~~
-
-If you see the following error, make sure that all TaskManagers have started:
-
-~~~bash
-Exception in thread "main" org.apache.flink.compiler.CompilerException:
-    Available instances could not be determined from job manager: Connection timed out.
-~~~
-
-You can check the number of TaskManagers in the JobManager web interface. The address of this interface is printed in the YARN session console.
-
-If the TaskManagers do not show up after a minute, you should investigate the issue using the log files.
-
-
-## Run a single Flink job on Hadoop YARN
-
-The documentation above describes how to start a Flink cluster within a Hadoop YARN environment.
-It is also possible to launch Flink within YARN only for executing a single job.
-
-Please note that the client then expects the `-yn` value to be set (number of TaskManagers).
-
-***Example:***
-
-~~~bash
-./bin/flink run -m yarn-cluster -yn 2 ./examples/flink-java-examples-{{ site.FLINK_VERSION_THIS_HADOOP2 }}-WordCount.jar
-~~~
-
-The command line options of the YARN session are also available with the `./bin/flink` tool. They are prefixed with `y` (short options) or `yarn` (long options).
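
Following that prefixing rule, for example, the session options `-s` and `-jm` become `-ys` and `-yjm` (the jar path and sizing values below are illustrative):

~~~bash
./bin/flink run -m yarn-cluster -yn 2 -ys 4 -yjm 1024 -ytm 4096 \
    ./path/to/your-program.jar
~~~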
-
-
-## Recovery behavior of Flink on YARN
-
-Flink's YARN client has the following configuration parameters to control how it behaves in case of container failures. These parameters can be set either in `conf/flink-conf.yaml` or when starting the YARN session using `-D` parameters (see the sketch after the list).
-
-- `yarn.reallocate-failed`: This parameter controls whether Flink should reallocate failed TaskManager containers. Default: true
-- `yarn.maximum-failed-containers`: The maximum number of failed containers the ApplicationMaster accepts until it fails the YARN session. Default: The number of initially requested TaskManagers (`-n`).
-- `yarn.application-attempts`: The number of ApplicationMaster (+ its TaskManager containers) attempts. If this value is set to 1 (default), the entire YARN session will fail when the ApplicationMaster fails. Higher values specify the number of restarts of the ApplicationMaster by YARN.
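
As an illustration, the sketch below passes two of these parameters as dynamic properties when starting a session (all values are examples only):

~~~bash
./bin/yarn-session.sh -n 4 -tm 4096 \
    -Dyarn.maximum-failed-containers=10 \
    -Dyarn.application-attempts=2
~~~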
-
-
-## Debugging a failed YARN session
-
-There are many reasons why a Flink YARN session deployment can fail: a misconfigured Hadoop setup (HDFS permissions, YARN configuration), version incompatibilities (such as running Flink with vanilla Hadoop dependencies on Cloudera Hadoop), or other errors.
-
-### Log Files
-
-In cases where the Flink YARN session fails during the deployment itself, users have to rely on the logging capabilities of Hadoop YARN. The most useful feature for that is the [YARN log aggregation](http://hortonworks.com/blog/simplifying-user-logs-management-and-access-in-yarn/).
-To enable it, users have to set the `yarn.log-aggregation-enable` property to `true` in the `yarn-site.xml` file.
-Once that is enabled, users can use the following command to retrieve all log files of a (failed) YARN session:
-
-~~~
-yarn logs -applicationId <application ID>
-~~~
-
-Note that it takes a few seconds after the session has finished until the logs show up.
-
-### YARN Client Console & Web Interfaces
-
-The Flink YARN client also prints error messages in the terminal if errors occur during runtime (for example if a TaskManager stops working after some time).
-
-In addition to that, there is the YARN Resource Manager web interface (by default on port 8088). The port of the Resource Manager web interface is determined by the `yarn.resourcemanager.webapp.address` configuration value.
-
-It provides access to log files for running YARN applications and shows diagnostics for failed apps.
-
-
-## Build YARN client for a specific Hadoop version
-
-Users of Hadoop distributions from companies like Hortonworks, Cloudera or MapR might have to build Flink against their specific versions of Hadoop (HDFS) and YARN. Please read the [build instructions](building.html) for more details.
-
-
-## Background / Internals
-
-This section briefly describes how Flink and YARN interact. 
-
-<img src="img/FlinkOnYarn.svg" class="img-responsive">
-
-The YARN client needs to access the Hadoop configuration to connect to the YARN resource manager and to HDFS. It determines the Hadoop configuration using the following strategy (a shell sketch follows the list):
-
-* Test if `YARN_CONF_DIR`, `HADOOP_CONF_DIR` or `HADOOP_CONF_PATH` are set (in that order). If one of these variables is set, it is used to read the configuration.
-* If the above strategy fails (this should not be the case in a correct YARN setup), the client falls back to the `HADOOP_HOME` environment variable. If it is set, the client tries to access `$HADOOP_HOME/etc/hadoop` (Hadoop 2) and `$HADOOP_HOME/conf` (Hadoop 1).
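
In shell terms, the lookup order roughly corresponds to the following sketch (a simplification for illustration, not the actual client code):

~~~bash
# Simplified illustration of how the client resolves the Hadoop configuration directory
if [ -n "$YARN_CONF_DIR" ]; then
    CONF_DIR="$YARN_CONF_DIR"
elif [ -n "$HADOOP_CONF_DIR" ]; then
    CONF_DIR="$HADOOP_CONF_DIR"
elif [ -n "$HADOOP_CONF_PATH" ]; then
    CONF_DIR="$HADOOP_CONF_PATH"
elif [ -n "$HADOOP_HOME" ]; then
    # Prefer the Hadoop 2 layout, fall back to the Hadoop 1 layout
    if [ -d "$HADOOP_HOME/etc/hadoop" ]; then
        CONF_DIR="$HADOOP_HOME/etc/hadoop"
    else
        CONF_DIR="$HADOOP_HOME/conf"
    fi
fi
echo "Using Hadoop configuration from: $CONF_DIR"
~~~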
-
-When starting a new Flink YARN session, the client first checks if the requested resources (containers and memory) are available. After that, it uploads a jar that contains Flink and the configuration to HDFS (step 1).
-
-The next step of the client is to request (step 2) a YARN container to start the *ApplicationMaster* (step 3). Since the client registered the configuration and jar file as a resource for the container, the NodeManager of YARN running on that particular machine will take care of preparing the container (e.g. downloading the files). Once that has finished, the *ApplicationMaster* (AM) is started.
-
-The *JobManager* and AM run in the same container. Once they have started successfully, the AM knows the address of the JobManager (its own host). It generates a new Flink configuration file for the TaskManagers (so that they can connect to the JobManager). The file is also uploaded to HDFS. Additionally, the *AM* container serves Flink's web interface. The ports Flink uses for its services are the standard ports configured by the user plus the application id as an offset. This allows users to execute multiple Flink YARN sessions in parallel.
-
-After that, the AM starts allocating the containers for Flink's TaskManagers, which will download the jar file and the modified configuration from HDFS. Once these steps are completed, Flink is set up and ready to accept jobs.
-

