From: "ASF GitHub Bot (JIRA)"
To: issues@flink.incubator.apache.org
Reply-To: dev@flink.incubator.apache.org
Date: Mon, 1 Dec 2014 18:09:14 +0000 (UTC)
Subject: [jira] [Commented] (FLINK-1157) Document TaskManager Slots

[ 
https://issues.apache.org/jira/browse/FLINK-1157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14230155#comment-14230155 ]

ASF GitHub Bot commented on FLINK-1157:
---------------------------------------

Github user uce commented on a diff in the pull request:

https://github.com/apache/incubator-flink/pull/246#discussion_r21106379

--- Diff: docs/config.md ---
@@ -266,3 +272,79 @@ So if `yarn.am.rpc.port` is configured to `10245` and the session's application
 - `yarn.am.rpc.port`: The port that is being opened by the Application Master (AM) to let the YARN client connect for an RPC service. (DEFAULT: Port 10245)
+
+
+## Background
+
+### Configuring the Network Buffers
+
+Network buffers are a critical resource for the communication layers. They are
+used to buffer records before transmission over a network, and to buffer
+incoming data before dissecting it into records and handing them to the
+application. A sufficient number of network buffers is critical to achieve a
+good throughput.
+
+In general, configure the task manager to have enough buffers that each logical
+network connection you expect to be open at the same time has a dedicated
+buffer. A logical network connection exists for each point-to-point exchange of
+data over the network, which typically happens at repartitioning or
+broadcasting steps. In those, each parallel task inside the TaskManager has to
+be able to talk to all other parallel tasks. Hence, the required number of
+buffers on a task manager is *total-degree-of-parallelism* (number of targets)
+\* *intra-node-parallelism* (number of sources in one task manager) \* *n*.
+Here, *n* is a constant that defines how many repartitioning/broadcasting steps
+you expect to be active at the same time.
+
+Since the *intra-node-parallelism* is typically the number of cores, and more
+than 4 repartitioning or broadcasting channels are rarely active in parallel, it
+frequently boils down to *\#cores^2* \* *\#machines* \* 4. To support, for
+example, a cluster of 20 8-core machines, you should use roughly 5000 network
+buffers for optimal throughput.
+
+Each network buffer is by default 64 KiBytes large. In the above example, the
+system would allocate roughly 300 MiBytes for network buffers.
+
+The number and size of network buffers can be configured with the following
+parameters:
+
+- `taskmanager.network.numberOfBuffers`, and
+- `taskmanager.network.bufferSizeInBytes`.
+
+### Configuring Temporary I/O Directories
+
+Although Flink aims to process as much data in main memory as possible,
+it is not uncommon that more data needs to be processed than fits into the
+available memory. Flink's runtime is designed to write temporary data to disk
+to handle these situations.
+
+The `taskmanager.tmp.dirs` parameter specifies a list of directories into which
+Flink writes temporary files. The paths of the directories need to be
+separated by ':' (colon character). Flink will concurrently write (or
+read) one temporary file to (from) each configured directory. This way,
+temporary I/O can be evenly distributed over multiple independent I/O devices
+such as hard disks to improve performance. To leverage fast I/O devices (e.g.,
+SSD, RAID, NAS), it is possible to specify a directory multiple times.
+
+If the `taskmanager.tmp.dirs` parameter is not explicitly specified,
+Flink writes temporary data to the temporary directory of the operating
+system, such as */tmp* in Linux systems.
+
+
+### Configuring TaskManager processing slots
+
+A processing slot allows Flink to execute a distributed DataSet transformation, such as a
+data source or a map-transformation.
+
+Each Flink TaskManager provides processing slots in the cluster.
+The number of slots is typically proportional to the number of available CPU cores __of each__ TaskManager.
+As a general recommendation, the number of available CPU cores is a good default for
+`taskmanager.numberOfTaskSlots`.
+
+When starting a Flink application, users can supply the default number of slots to use for that job.
+The corresponding command line option is called `-p` (for parallelism). In addition, it is possible
+to [set the number of slots in the programming APIs](programming_guide.html#parallel-execution) for
+the whole application and individual operators.
+
+Flink is currently scheduling an application to slots by "filling" them up.
+If the cluster has 20 machines with 2 slots each (40 slots in total) but the application is running
+with a parallelism of 20, only 10 machines are processing data.
--- End diff --

are processing => will process

> Document TaskManager Slots
> --------------------------
>
>                 Key: FLINK-1157
>                 URL: https://issues.apache.org/jira/browse/FLINK-1157
>             Project: Flink
>          Issue Type: Improvement
>          Components: Documentation
>            Reporter: Robert Metzger
>            Assignee: Robert Metzger
>
> Slots are not explained in the documentation.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
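
For anyone checking the numbers in the quoted diff, here is a minimal Python sketch of the buffer sizing rule and the slot "filling" example. It is only an illustration of the arithmetic described in the proposed docs, not Flink code: the function names and parameters are hypothetical, the 64 KiB buffer size and the factor of 4 are taken from the diff text, and the slot calculation assumes the scheduler fills every slot of one machine before moving to the next.

```python
import math


def required_network_buffers(cores_per_machine, machines, concurrent_exchanges=4):
    """Estimate buffers per TaskManager (hypothetical helper, not a Flink API).

    Follows the rule quoted in the diff:
    total-degree-of-parallelism * intra-node-parallelism * n,
    assuming one parallel task per core on every machine.
    """
    total_parallelism = cores_per_machine * machines   # number of targets
    intra_node_parallelism = cores_per_machine         # sources in one TaskManager
    return total_parallelism * intra_node_parallelism * concurrent_exchanges


def buffer_memory_mib(num_buffers, buffer_size_bytes=64 * 1024):
    """Memory taken by the buffers in MiB, assuming the 64 KiB default from the diff."""
    return num_buffers * buffer_size_bytes / (1024 * 1024)


def machines_processing_data(parallelism, slots_per_machine, machines):
    """Machines that receive work if slots are filled machine by machine (assumption)."""
    return min(machines, math.ceil(parallelism / slots_per_machine))


if __name__ == "__main__":
    buffers = required_network_buffers(cores_per_machine=8, machines=20)
    print(buffers)                               # 5120, i.e. "roughly 5000"
    print(buffer_memory_mib(buffers))            # 320.0, i.e. "roughly 300 MiBytes"
    print(machines_processing_data(20, 2, 20))   # 10
```

The exact results (5120 buffers, 320 MiB) are slightly above the rounded figures in the diff, which is consistent with the text describing them as rough estimates.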