Cluster Setup

This documentation provides instructions on how to run Flink in a fully distributed fashion on a static (but possibly heterogeneous) cluster.

The setup consists of two steps: first, installing and configuring Flink, and second, installing and configuring the Hadoop Distributed Filesystem (HDFS).

Preparing the Cluster

Software Requirements

Flink runs on all UNIX-like environments, e.g. Linux, Mac OS X, and Cygwin (for Windows), and expects the cluster to consist of one master node and one or more worker nodes. Before you start to set up the system, make sure you have the following software installed on each node:

  • Java 1.6.x or higher,
  • ssh (sshd must be running to use the Flink scripts that manage remote components)
If your cluster does not fulfill these software requirements, you will need to install or upgrade them.

For example, on Ubuntu Linux, type in the following commands to install Java and ssh:

sudo apt-get install ssh
sudo apt-get install openjdk-7-jre

You can check the correct installation of Java by issuing the following command:

java -version

The command should output something comparable to the following on every node of your cluster (depending on your Java version, there may be small differences):

java version "1.6.0_22"
Java(TM) SE Runtime Environment (build 1.6.0_22-b04)
Java HotSpot(TM) 64-Bit Server VM (build 17.1-b03, mixed mode)

To make sure the ssh daemon is running properly, you can use the command

ps aux | grep sshd

Something comparable to the following line should appear in the output of the command on every host of your cluster:

root       894  0.0  0.0  49260   320 ?        Ss   Jan09   0:13 /usr/sbin/sshd

Configuring Remote Access with ssh

In order to start/stop the remote processes, the master node requires access via ssh to the worker nodes. It is most convenient to use ssh's public key authentication for this. To set up public key authentication, log on to the master as the user who will later execute all the Flink components. The same user (i.e. a user with the same user name) must also exist on all worker nodes. For the remainder of this section we will refer to this user as flink. Once logged in to the master node as this user, generate a new public/private key pair. The following command will create a new public/private key pair in the .ssh directory inside the home directory of the user flink. See the ssh-keygen man page for more details. Note that the private key is not protected by a passphrase.

ssh-keygen -b 2048 -P '' -f ~/.ssh/id_rsa

Next, copy/append the content of the file .ssh/id_rsa.pub to your authorized_keys file. The content of the authorized_keys file defines which public keys are considered trustworthy during the public key authentication process. On most systems the appropriate command is

cat .ssh/id_rsa.pub >> .ssh/authorized_keys

On some Linux systems, the authorized keys file may also be expected by the ssh daemon under .ssh/authorized_keys2. In either case, you should make sure the file only contains those public keys which you consider trustworthy on each node of your cluster.

Finally, the authorized keys file must be copied to every worker node of your cluster. You can do this by repeatedly typing in

scp .ssh/authorized_keys <worker>:~/.ssh/

and replacing <worker> with the host name of the respective worker node. After having finished the copy process, you should be able to log on to each worker node from your master node via ssh without a password.
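If you keep the worker host names in a plain text file, this copy step can be scripted. A minimal sketch, assuming a hypothetical file workers.txt with one worker host name per line:

# Sketch: copy the authorized_keys file to every host listed in workers.txt
# (workers.txt is a hypothetical helper file, one host name per line)
while read worker; do
  scp .ssh/authorized_keys "$worker":~/.ssh/
done < workers.txt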

Setting JAVA_HOME on each Node

Flink requires the JAVA_HOME environment variable to be set on the master and all worker nodes and to point to the directory of your Java installation. You can set this variable in conf/flink-conf.yaml via the env.java.home key.

Alternatively, add the following line to your shell profile. If you use the bash shell (probably the most common shell), the shell profile is located in ~/.bashrc:

export JAVA_HOME=/path/to/java_home/

If your ssh daemon supports user environments, you can also add JAVA_HOME to ~/.ssh/environment. As super user root you can enable ssh user environments with the following commands:

echo "PermitUserEnvironment yes" >> /etc/ssh/sshd_config
/etc/init.d/ssh restart
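
With user environments enabled, the flink user can then record its Java home in ~/.ssh/environment, for example as follows (a sketch; substitute the actual path of your Java installation):

# Sketch: make JAVA_HOME visible to commands that are started via ssh
echo "JAVA_HOME=/path/to/java_home/" >> ~/.ssh/environment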

Hadoop Distributed Filesystem (HDFS) Setup

The Flink system currently uses the Hadoop Distributed Filesystem (HDFS) to read and write data in a distributed fashion.

Note that the following instructions are not an in-depth HDFS setup guide, but just a general overview of some required settings. Please consult one of the many installation guides available online for more detailed instructions.

Note that the following instructions are based on Hadoop 1.2 and might differ for Hadoop 2.

Downloading, Installing, and Configuring HDFS

Similar to the Flink system, HDFS runs in a distributed fashion. HDFS consists of a NameNode which manages the distributed file system's meta data. The actual data is stored by one or more DataNodes. For the remainder of this instruction we assume the HDFS NameNode component runs on the master node while all the worker nodes run an HDFS DataNode.

To start, log on to your master node and download Hadoop (which includes HDFS) from the Apache Hadoop releases page and extract the archive. Then change into the extracted directory and open the Hadoop environment configuration file for editing:

cd hadoop-*
vi conf/hadoop-env.sh

Uncomment and modify the following line in the file according to the path of your Java installation.

export JAVA_HOME=/path/to/java_home/

Save the changes and open the HDFS configuration file conf/hdfs-site.xml. HDFS offers multiple configuration parameters which affect the behavior of the distributed file system in various ways. The following excerpt shows a minimal configuration which is required to make HDFS work. More information on how to configure HDFS can be found in the HDFS User Guide.

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://MASTER:50040/</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>DATAPATH</value>
  </property>
</configuration>

Replace MASTER with the IP/host name of your master node which runs the NameNode. DATAPATH must be replaced with the path to the directory in which the actual HDFS data shall be stored on each worker node. Make sure that the flink user has sufficient permissions to read and write in that directory.

After having saved the HDFS configuration file, open the file conf/slaves and enter the IP/host name of those worker nodes which shall act as DataNodes. Each entry must be separated by a line break.

<worker 1>
<worker 2>
.
.
.
<worker n>

Initialize the HDFS by typing in the following command. Note that the command will delete all data which has been previously stored in the HDFS. However, since we have just installed a fresh HDFS, it should be safe to answer the confirmation with yes.

bin/hadoop namenode -format

Finally, we need to make sure that the Hadoop directory is available to all worker nodes which are intended to act as DataNodes and that all nodes find the directory under the same path. We recommend to use a shared network directory (e.g. an NFS share) for that. Alternatively, one can copy the directory to all nodes (with the disadvantage that all configuration and code updates need to be synced to all nodes).

Starting HDFS

To start HDFS, log on to the master and type in the following commands:

cd hadoop-*
bin/start-dfs.sh

If your HDFS setup is correct, you should be able to open the HDFS status website at http://MASTER:50070. In a matter of seconds, all DataNodes should appear as live nodes. For troubleshooting we would like to point you to the Hadoop Quick Start guide.
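
As an additional check, you can list the root of the fresh file system from the command line (assuming you are still inside the Hadoop directory):

# Sketch: verify that the NameNode answers file system requests
bin/hadoop fs -ls /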

Flink Setup

Go to the downloads page and get the ready-to-run package. Make sure to pick the Flink package matching your Hadoop version.

After downloading the latest release, copy the archive to your master node and extract it:

tar xzf flink-*.tgz
cd flink-*

Configuring the Cluster

After having extracted the system files, you need to configure Flink for the cluster by editing conf/flink-conf.yaml.

Set the jobmanager.rpc.address key to point to your master node, and define the maximum amount of main memory the JVMs are allowed to allocate on each node by setting the jobmanager.heap.mb and taskmanager.heap.mb keys. Finally, you must provide a list of all nodes in your cluster which shall be used as worker nodes. Therefore, similar to the HDFS configuration, edit the file conf/slaves and enter the IP/host name of each worker node; each of these nodes will later run a TaskManager.

Each entry must be separated by a new line, as in the following example:

192.168.0.100
192.168.0.101
.
.
.
192.168.0.150

The Flink directory must be available on every worker under the same path. Similarly as for HDFS, you can use a shared NFS directory, or copy the entire Flink directory to every worker node.
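
Putting the pieces together, the cluster-specific entries of conf/flink-conf.yaml might look as follows (a minimal sketch; the MASTER placeholder and the heap sizes are illustrative assumptions, not recommendations):

# Sketch: append minimal cluster settings to conf/flink-conf.yaml
# (replace MASTER and the heap sizes with values fitting your setup)
cat >> conf/flink-conf.yaml <<'EOF'
jobmanager.rpc.address: MASTER
jobmanager.heap.mb: 256
taskmanager.heap.mb: 512
EOF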

Configuring the Network Buffers

Network buffers are a critical resource for the communication layers. They are used to buffer records before transmission over a network, and to buffer incoming data before handing it to the application. As a rule of thumb, provide enough buffers that each logical network connection you expect to be active at the same time has a dedicated buffer.

Since the intra-node-parallelism is typically the number of cores, and more than 4 repartitioning or broadcasting channels are rarely active in parallel, it frequently boils down to #cores^2 * #machines * 4. To support for example a cluster of 20 8-core machines, you should use roughly 5000 network buffers for optimal throughput.
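
As a quick sanity check of that formula, the 20-machine, 8-core example works out as follows (a sketch using shell arithmetic):

# Worked example of the #cores^2 * #machines * 4 heuristic
echo $(( 8 * 8 * 20 * 4 ))   # prints 5120, i.e. roughly 5000 buffers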

In that example, the system would allocate roughly 300 MiBytes for network buffers. The number and size of the network buffers can be configured with the following parameters:

  • taskmanager.network.numberOfBuffers, and
  • taskmanager.network.bufferSizeInBytes.

Configuring Temporary I/O Directories

Although Flink aims to process as much data in main memory as possible, it is not uncommon that more data needs to be processed than memory is available. Flink's runtime is designed to write temporary data to disk to handle these situations.

The taskmanager.tmp.dirs parameter specifies a list of directories into which Flink writes temporary files. The paths of the directories need to be separated by ':' (colon character). Flink will concurrently write (or read) one temporary file to (from) each configured directory. This way, temporary I/O can be evenly distributed over multiple independent I/O devices such as hard disks to improve performance. To leverage fast I/O devices (e.g., SSD, RAID, NAS), it is possible to list a directory multiple times. If the parameter is not explicitly specified, Flink writes temporary data to the temporary directory of the operating system, such as /tmp in Linux systems.
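
For instance, to spread temporary I/O over two disks, the entry in conf/flink-conf.yaml could look like this (a sketch; the mount points /data1 and /data2 are hypothetical):

# Sketch: distribute temporary files over two (hypothetical) disks
cat >> conf/flink-conf.yaml <<'EOF'
taskmanager.tmp.dirs: /data1/tmp:/data2/tmp
EOF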

Please see the configuration page for details and additional configuration options.

Starting Flink

The following script starts a JobManager on the local node and connects via SSH to all worker nodes listed in the slaves file to start a TaskManager on each node. Now your Flink system is up and running. The JobManager running on the local node will accept jobs at the configured RPC port.

Assuming that you are on the master node and inside the Flink directory:

bin/start-cluster.sh
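
To shut the cluster down again, you can use the matching stop script that ships in the same bin directory (again assuming you are on the master node inside the Flink directory):

bin/stop-cluster.sh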
Coding Guidelines


The coding guidelines are now located on the project website.


Configuration

Overview

The default configuration parameters allow Flink to run out-of-the-box in single node setups.

All configuration is done in conf/flink-conf.yaml, which is expected to be a flat collection of YAML key/value pairs with format key: value.

The system and run scripts parse the config at startup time. Changes to the configuration file require restarting the Flink JobManager and TaskManagers.

Common Options

  • env.java.home: The path to the Java installation to use (DEFAULT: system's default Java installation, if found). Needs to be specified if the startup scripts fail to automatically resolve the java home directory. Can be specified to point to a specific java installation or version. If this option is not specified, the startup scripts also evaluate the $JAVA_HOME environment variable.
  • jobmanager.rpc.address: The IP address of the JobManager, which is the master/coordinator of the distributed system (DEFAULT: localhost).
  • jobmanager.rpc.port: The port number of the JobManager (DEFAULT: 6123).
  • jobmanager.heap.mb: JVM heap size (in megabytes) for the JobManager (DEFAULT: 256).
  • taskmanager.heap.mb: JVM heap size (in megabytes) for the TaskManagers, which are the parallel workers of the system. In contrast to Hadoop, Flink runs operators (e.g., join, aggregate) and user-defined functions (e.g., Map, Reduce, CoGroup) inside the TaskManager (including sorting/hashing/caching), so this value should be as large as possible (DEFAULT: 512). On YARN setups, this value is automatically configured to the size of the TaskManager's YARN container, minus a certain tolerance value.
  • taskmanager.numberOfTaskSlots: The number of parallel operator or UDF instances that a single TaskManager can run (DEFAULT: 1). If this value is larger than 1, a single TaskManager takes multiple instances of a function or operator. That way, the TaskManager can utilize multiple CPU cores, but at the same time, the available memory is divided between the different operator or function instances. This value is typically proportional to the number of physical CPU cores that the TaskManager's machine has (e.g., equal to the number of cores, or half the number of cores).
  • parallelization.degree.default: The default degree of parallelism to use for programs that have no degree of parallelism specified (DEFAULT: 1). For setups that have no concurrent jobs running, setting this value to NumTaskManagers * NumSlotsPerTaskManager will cause the system to use all available execution resources for the program's execution.
  • fs.hdfs.hadoopconf: The absolute path to the Hadoop File System's (HDFS) configuration directory (OPTIONAL VALUE). Specifying this value allows programs to reference HDFS files using short URIs (hdfs:///path/to/files, without including the address and port of the NameNode in the file URI). Without this option, HDFS files can be accessed, but require fully qualified URIs like hdfs://address:port/path/to/files. This option also causes file writers to pick up the HDFS's default values for block sizes and replication factors. Flink will look for the "core-site.xml" and "hdfs-site.xml" files in the specified directory.

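As a concrete illustration of these options, a starter conf/flink-conf.yaml for a small cluster might contain entries like the following (a sketch only; every value shown is an illustrative assumption, not a recommendation):

# Sketch: a starter conf/flink-conf.yaml with common options
# (all values below are illustrative examples)
cat > conf/flink-conf.yaml <<'EOF'
env.java.home: /usr/lib/jvm/java-7-openjdk-amd64
jobmanager.rpc.address: master.example.com
jobmanager.rpc.port: 6123
jobmanager.heap.mb: 256
taskmanager.heap.mb: 2048
taskmanager.numberOfTaskSlots: 4
parallelization.degree.default: 8
EOF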
Advanced Options

  • taskmanager.tmp.dirs: The directory for temporary files, or a list of directories separated by the system's directory delimiter (for example ':' (colon) on Linux/Unix). If multiple directories are specified, then the temporary files will be distributed across the directories in a round-robin fashion. The I/O manager component will spawn one reading and one writing thread per directory. A directory may be listed multiple times to have the I/O manager use multiple threads for it (for example if it is physically stored on a very fast disc or RAID) (DEFAULT: The system's tmp dir).
  • jobmanager.web.port: Port of the JobManager's web interface (DEFAULT: 8081).
  • fs.overwrite-files: Specifies whether file output writers should overwrite existing files by default. Set to true to overwrite by default, false otherwise (DEFAULT: false).
  • fs.output.always-create-directory: File writers running with a parallelism larger than one create a directory for the output file path and put the different result files (one per parallel writer task) into that directory. If this option is set to true, writers with a parallelism of 1 will also create a directory and place a single result file into it. If the option is set to false, the writer will create the file directly at the output path, without creating a containing directory (DEFAULT: false).
  • taskmanager.network.numberOfBuffers: The number of buffers available to the network stack. This number determines how many streaming data exchange channels a TaskManager can have at the same time and how well buffered the channels are. If a job is rejected or you get a warning that the system does not have enough buffers available, increase this value (DEFAULT: 2048).
  • taskmanager.memory.size: The amount of memory (in megabytes) that the task manager reserves on the JVM's heap space for sorting, hash tables, and caching of intermediate results. If unspecified (-1), the memory manager will take a fixed ratio of the heap memory available to the JVM, as specified by taskmanager.memory.fraction (DEFAULT: -1).
  • taskmanager.memory.fraction: The relative amount of memory that the task manager reserves for sorting, hash tables, and caching of intermediate results. For example, a value of 0.8 means that TaskManagers reserve 80% of the JVM's heap space for internal data buffers, leaving 20% of the JVM's heap space free for objects created by user-defined functions (DEFAULT: 0.7). This parameter is only evaluated if taskmanager.memory.size is not set.

Full Reference

HDFS

These parameters configure the default HDFS used by Flink. Setups that do not specify a HDFS configuration have to specify the full path to HDFS files (hdfs://address:port/path/to/files) and will work with default HDFS parameters (block size, replication factor).

  • fs.hdfs.hadoopconf: The absolute path to the Hadoop configuration directory. The system will look for the "core-site.xml" and "hdfs-site.xml" files in that directory (DEFAULT: null).
  • fs.hdfs.hdfsdefault: The absolute path of Hadoop's own configuration file "hdfs-default.xml" (DEFAULT: null).
  • fs.hdfs.hdfssite: The absolute path of Hadoop's own configuration file "hdfs-site.xml" (DEFAULT: null).

JobManager & TaskManager

The following parameters configure Flink's JobManager and TaskManagers.

  • jobmanager.rpc.address: The IP address of the JobManager, which is the master/coordinator of the distributed system (DEFAULT: localhost).
  • jobmanager.rpc.port: The port number of the JobManager (DEFAULT: 6123).
  • taskmanager.rpc.port: The task manager's IPC port (DEFAULT: 6122).
  • taskmanager.data.port: The task manager's port used for data exchange operations (DEFAULT: 6121).
  • jobmanager.heap.mb: JVM heap size (in megabytes) for the JobManager (DEFAULT: 256).
  • taskmanager.heap.mb: JVM heap size (in megabytes) for the TaskManagers, which are the parallel workers of the system. In contrast to Hadoop, Flink runs operators (e.g., join, aggregate) and user-defined functions (e.g., Map, Reduce, CoGroup) inside the TaskManager (including sorting/hashing/caching), so this value should be as large as possible (DEFAULT: 512). On YARN setups, this value is automatically configured to the size of the TaskManager's YARN container, minus a certain tolerance value.
  • taskmanager.numberOfTaskSlots: The number of parallel operator or UDF instances that a single TaskManager can run (DEFAULT: 1). If this value is larger than 1, a single TaskManager takes multiple instances of a function or operator. That way, the TaskManager can utilize multiple CPU cores, but at the same time, the available memory is divided between the different operator or function instances. This value is typically proportional to the number of physical CPU cores that the TaskManager's machine has (e.g., equal to the number of cores, or half the number of cores).
  • taskmanager.tmp.dirs: The directory for temporary files, or a list of directories separated by the system's directory delimiter (for example ':' (colon) on Linux/Unix). If multiple directories are specified, then the temporary files will be distributed across the directories in a round-robin fashion. The I/O manager component will spawn one reading and one writing thread per directory. A directory may be listed multiple times to have the I/O manager use multiple threads for it (for example if it is physically stored on a very fast disc or RAID) (DEFAULT: The system's tmp dir).
  • taskmanager.network.numberOfBuffers: The number of buffers available to the network stack. This number determines how many streaming data exchange channels a TaskManager can have at the same time and how well buffered the channels are. If a job is rejected or you get a warning that the system does not have enough buffers available, increase this value (DEFAULT: 2048).
  • taskmanager.network.bufferSizeInBytes: The size of the network buffers, in bytes (DEFAULT: 32768 (= 32 KiBytes)).
  • taskmanager.memory.size: The amount of memory (in megabytes) that the task manager reserves on the JVM's heap space for sorting, hash tables, and caching of intermediate results. If unspecified (-1), the memory manager will take a fixed ratio of the heap memory available to the JVM, as specified by taskmanager.memory.fraction (DEFAULT: -1).
  • taskmanager.memory.fraction: The relative amount of memory that the task manager reserves for sorting, hash tables, and caching of intermediate results. For example, a value of 0.8 means that TaskManagers reserve 80% of the JVM's heap space for internal data buffers, leaving 20% of the JVM's heap space free for objects created by user-defined functions (DEFAULT: 0.7). This parameter is only evaluated if taskmanager.memory.size is not set.
  • jobclient.polling.interval: The interval (in seconds) in which the client polls the JobManager for the status of its job (DEFAULT: 2).
  • taskmanager.runtime.max-fan: The maximal fan-in for external merge joins and fan-out for spilling hash tables. Limits the number of file handles per operator, but may cause intermediate merging/partitioning, if set too small (DEFAULT: 128).
  • taskmanager.runtime.sort-spilling-threshold: A sort operation starts spilling when this fraction of its memory budget is full (DEFAULT: 0.8).

JobManager Web Frontend

  • jobmanager.web.port: Port of the JobManager's web interface that displays the status of running jobs and execution time breakdowns of finished jobs (DEFAULT: 8081).
  • jobmanager.web.history: The number of latest jobs that the JobManager's web front-end retains in its history (DEFAULT: 5).

Webclient

These parameters configure the web interface that can be used to submit jobs and review the compiler's execution plans.

  • webclient.port: The port of the webclient server (DEFAULT: 8080).
  • webclient.tempdir: The temp directory for the web server, used for example for caching file fragments during file uploads (DEFAULT: The system's temp directory).
  • webclient.uploaddir: The directory into which the web server will store uploaded programs (DEFAULT: ${webclient.tempdir}/webclient-jobs/).
  • webclient.plandump: The directory into which the web server will dump temporary JSON files describing the execution plans (DEFAULT: ${webclient.tempdir}/webclient-plans/).

File Systems


    -
  • fs.overwrite-files: Specifies whether file output writers should overwrite +
  • fs.overwrite-files: Specifies whether file output writers should overwrite existing files by default. Set to true to overwrite by default, false otherwise. (DEFAULT: false)
  • -
  • fs.output.always-create-directory: File writers running with a parallelism +
  • fs.output.always-create-directory: File writers running with a parallelism larger than one create a directory for the output file path and put the different result files (one per parallel writer task) into that directory. If this option is set to true, writers with a parallelism of 1 will also create a directory @@ -406,38 +338,25 @@ writer will directly create the file dir creating a containing directory. (DEFAULT: false)
-

Compiler/Optimizer

+

Compiler/Optimizer

  • compiler.delimited-informat.max-line-samples: The maximum number of line samples taken by the compiler for delimited inputs. The samples are used to estimate the number of records. This value can be overridden for a specific input with the input format's parameters (DEFAULT: 10).
  • compiler.delimited-informat.min-line-samples: The minimum number of line samples taken by the compiler for delimited inputs. The samples are used to estimate the number of records. This value can be overridden for a specific input with the input format's parameters (DEFAULT: 2).
  • compiler.delimited-informat.max-sample-len: The maximal length of a line sample that the compiler takes for delimited inputs. If the length of a single sample exceeds this value (possibly because of misconfiguration of the parser), the sampling aborts. This value can be overridden for a specific input with the input format's parameters (DEFAULT: 2097152 (= 2 MiBytes)).

YARN

Please note that all ports used by Flink in a YARN session are offset by the YARN application ID to avoid duplicate port allocations when running multiple YARN sessions in parallel.

So if yarn.am.rpc.port is configured to 10245 and the session's application ID is application_1406629969999_0002, then the actual port being used is 10245 + 2 = 10247.

  • yarn.am.rpc.port: The port that is being opened by the Application Master (AM) to let the YARN client connect for an RPC service (DEFAULT: Port 10245).