Mailing-List: contact issues-help@flink.incubator.apache.org; run by ezmlm
Precedence: bulk
Reply-To: dev@flink.incubator.apache.org
Date: Mon, 1 Dec 2014 18:09:14 +0000 (UTC)
From: "ASF GitHub Bot (JIRA)" <jira@apache.org>
To: issues@flink.incubator.apache.org
Message-ID: <JIRA.12747710.1413202977000.50490.1417457354006@Atlassian.JIRA>
In-Reply-To: <JIRA.12747710.1413202977000@Atlassian.JIRA>
References: <JIRA.12747710.1413202977000@Atlassian.JIRA>
 <JIRA.12747710.1413202977264@arcas>
Subject: [jira] [Commented] (FLINK-1157) Document TaskManager Slots
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit


    [ https://issues.apache.org/jira/browse/FLINK-1157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14230155#comment-14230155 ] 

ASF GitHub Bot commented on FLINK-1157:
---------------------------------------

Github user uce commented on a diff in the pull request:

    https://github.com/apache/incubator-flink/pull/246#discussion_r21106379
  
    --- Diff: docs/config.md ---
    @@ -266,3 +272,79 @@ So if `yarn.am.rpc.port` is configured to `10245` and the session's application
     
     - `yarn.am.rpc.port`: The port that is being opened by the Application Master (AM) to 
     let the YARN client connect for an RPC service. (DEFAULT: Port 10245)
    +
    +
    +## Background
    +
    +### Configuring the Network Buffers
    +
    +Network buffers are a critical resource for the communication layers. They are
    +used to buffer records before transmission over a network, and to buffer
    +incoming data before dissecting it into records and handing them to the
    +application. A sufficient number of network buffers is critical for achieving
    +good throughput.
    +
    +In general, configure the task manager with enough buffers that each logical
    +network connection you expect to be open at the same time has a dedicated
    +buffer. A logical network connection exists for each point-to-point exchange of
    +data over the network, which typically happens at repartitioning- or
    +broadcasting steps. In those, each parallel task inside the TaskManager has to
    +be able to talk to all other parallel tasks. Hence, the required number of
    +buffers on a task manager is *total-degree-of-parallelism* (number of targets)
    +\* *intra-node-parallelism* (number of sources in one task manager) \* *n*.
    +Here, *n* is a constant that defines how many repartitioning-/broadcasting steps
    +you expect to be active at the same time.
    +
    +Since the *intra-node-parallelism* is typically the number of cores, and more
    +than 4 repartitioning or broadcasting channels are rarely active in parallel, it
    +frequently boils down to *\#cores^2* \* *\#machines* \* 4. To support, for
    +example, a cluster of 20 8-core machines, you should use roughly 5000 network
    +buffers for optimal throughput.
    +
    +Each network buffer is by default 64 KiBytes large. In the above example, the
    +system would allocate roughly 300 MiBytes for network buffers.
    +
    +The number and size of network buffers can be configured with the following
    +parameters:
    +
    +- `taskmanager.network.numberOfBuffers`, and
    +- `taskmanager.network.bufferSizeInBytes`.
    +
    +### Configuring Temporary I/O Directories
    +
    +Although Flink aims to process as much data in main memory as possible,
    +it is not uncommon that more data needs to be processed than fits into the
    +available memory. Flink's runtime is designed to write temporary data to disk
    +to handle these situations.
    +
    +The `taskmanager.tmp.dirs` parameter specifies a list of directories into which
    +Flink writes temporary files. The paths of the directories need to be
    +separated by ':' (colon character). Flink will concurrently write (or
    +read) one temporary file to (from) each configured directory. This way,
    +temporary I/O can be evenly distributed over multiple independent I/O devices
    +such as hard disks to improve performance. To leverage fast I/O devices (e.g.,
    +SSD, RAID, NAS), it is possible to specify a directory multiple times.
    +
    +If the `taskmanager.tmp.dirs` parameter is not explicitly specified,
    +Flink writes temporary data to the temporary directory of the operating
    +system, such as */tmp* on Linux systems.
    +
    +
    +### Configuring TaskManager Processing Slots
    +
    +A processing slot allows Flink to execute a distributed DataSet transformation, such as a
    +data source or a map-transformation.
    +
    +Each Flink TaskManager provides processing slots in the cluster. The number of slots
    +is typically proportional to the number of available CPU cores __of each__ TaskManager.
    +As a general recommendation, the number of available CPU cores is a good default for 
    +`taskmanager.numberOfTaskSlots`.
    +
    +When starting a Flink application, users can supply the default number of slots to use for that job.
    +The corresponding command-line option is `-p` (for parallelism). In addition, it is possible
    +to [set the number of slots in the programming APIs](programming_guide.html#parallel-execution) for 
    +the whole application and individual operators.
    +
    +Flink currently schedules an application to slots by "filling" them up.
    +If the cluster has 20 machines with 2 slots each (40 slots in total) but the application is running
    +with a parallelism of 20, only 10 machines are processing data.
    --- End diff --
    
    are processing => will process
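
For reference, the "filling" behaviour described in the last paragraph of the diff can be sketched as below. This is only a rough model, assuming the scheduler uses up all slots of one machine before moving to the next. The function name `machines_in_use` is hypothetical and not part of Flink.

```python
def machines_in_use(machines: int, slots_per_machine: int, parallelism: int) -> int:
    """Number of machines that receive work under the assumed 'fill up' strategy."""
    total_slots = machines * slots_per_machine
    if parallelism > total_slots:
        raise ValueError("parallelism exceeds the number of available slots")
    # Ceiling division: each machine is filled completely before the next one is used.
    return -(-parallelism // slots_per_machine)


# Example from the diff: 20 machines with 2 slots each, parallelism 20.
print(machines_in_use(20, 2, 20))  # 10
```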

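The buffer sizing rule in the "Configuring the Network Buffers" section of the diff can be checked with a few lines of arithmetic. This is a sketch under the assumptions stated in that section (intra-node parallelism equals the number of cores, 4 concurrent repartitioning/broadcasting steps, 64 KiB per buffer). The function names are hypothetical.

```python
def required_network_buffers(cores_per_machine: int, machines: int,
                             concurrent_steps: int = 4) -> int:
    """total-degree-of-parallelism * intra-node-parallelism * n, as in the text."""
    total_parallelism = cores_per_machine * machines   # number of targets
    intra_node_parallelism = cores_per_machine         # sources in one task manager
    return total_parallelism * intra_node_parallelism * concurrent_steps


def buffer_memory_mib(num_buffers: int, buffer_size_bytes: int = 64 * 1024) -> float:
    """Memory taken by the buffers, in MiB (64 KiB per buffer assumed, as in the text)."""
    return num_buffers * buffer_size_bytes / (1024 * 1024)


# Example from the text: 20 machines with 8 cores each.
buffers = required_network_buffers(cores_per_machine=8, machines=20)
print(buffers)                     # 5120
print(buffer_memory_mib(buffers))  # 320.0
```

The exact results (5120 buffers, 320 MiB) match the "roughly 5000" and "roughly 300 MiBytes" quoted in the section.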

> Document TaskManager Slots
> --------------------------
>
>                 Key: FLINK-1157
>                 URL: https://issues.apache.org/jira/browse/FLINK-1157
>             Project: Flink
>          Issue Type: Improvement
>          Components: Documentation
>            Reporter: Robert Metzger
>            Assignee: Robert Metzger
>
> Slots are not explained in the documentation.

