Date: Wed, 15 Jul 2015 09:46:04 +0000 (UTC)
From: "ASF GitHub Bot (JIRA)"
To: issues@flink.apache.org
Reply-To: dev@flink.apache.org
Subject: [jira] [Commented] (FLINK-2363) Add an end-to-end overview of program execution in Flink to the docs

    [ https://issues.apache.org/jira/browse/FLINK-2363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14627805#comment-14627805 ]

ASF GitHub Bot commented on FLINK-2363:
---------------------------------------

Github user fhueske commented on a diff in the pull request:

    https://github.com/apache/flink/pull/913#discussion_r34662306

--- Diff: docs/internals/through_stack.md ---
@@ -0,0 +1,181 @@
+---
+title: "From Program to Result: A Dive through Stack and Execution"
+---
+
+This page explains what happens when you execute a Flink program (batch or streaming).
+It covers the complete life cycle, from the client-side pre-flight phase, through the JobManager, to the TaskManagers.
+
+Please refer to the [Overview Page](../index.html) for a high-level overview of the processes and the stack.
+
+* This will be replaced by the TOC
+{:toc}
+
+## Batch API: DataSets to JobGraph
+
+## Streaming API: DataStreams to JobGraph
+
+## Client: Submitting a Job, Receiving Statistics & Results
+
+## JobManager: Receiving and Starting a Job
+
+parallel operator instance
+
+execution attempts
+
+## TaskManager: Running Tasks
+
+The *{% gh_link flink-runtime/src/main/scala/org/apache/flink/runtime/taskmanager/TaskManager.scala "TaskManager" %}*
+is the worker process in Flink. It runs the *{% gh_link flink-runtime/src/main/java/org/apache/flink/runtime/taskmanager/Task.java "Tasks" %}*,
+each of which executes one parallel instance of an operator.
+
+The TaskManager itself is an Akka actor and has many utility components, such as:
+
+ - The *{% gh_link flink-runtime/src/main/java/org/apache/flink/runtime/io/network/NetworkEnvironment.java "NetworkEnvironment" %}*, which takes care of all data exchanges (streamed and batched) between TaskManagers. Cached data sets in Flink are realized as cached streams, so the network environment is responsible for those as well.
+
+ - The *{% gh_link flink-runtime/src/main/java/org/apache/flink/runtime/memorymanager/MemoryManager.java "MemoryManager" %}*, which governs the memory for sorting, hashing, and in-operator data caching.
+
+ - The *{% gh_link flink-runtime/src/main/java/org/apache/flink/runtime/io/disk/iomanager/IOManager.java "I/O Manager" %}*, which provides access to disk for reading and writing data, for example when sort or hash operations spill data that exceeds their memory budget.
+
+ - The *{% gh_link flink-runtime/src/main/java/org/apache/flink/runtime/execution/librarycache/LibraryCacheManager.java "Library Cache" %}*, which gives access to the JAR files needed by tasks.
+
+When the TaskManager runs tasks, it does not know anything about the task's role in the dataflow. The TaskManager only knows the task itself and the streams that the task interacts with. Any form of cross-task coordination must go through the JobManager.
+
+The execution of a Task begins when the TaskManager receives the *SubmitTask* message. The message contains the *{% gh_link flink-runtime/src/main/java/org/apache/flink/runtime/deployment/TaskDeploymentDescriptor.java "TaskDeploymentDescriptor" %}*. This descriptor defines everything a task needs to know:
+
+ - The unique task {% gh_link flink-runtime/src/main/java/org/apache/flink/runtime/executiongraph/ExecutionAttemptID.java "Execution ID" %}.
+ - The name of the class with the task's executable code (batch operator, stream operator, iterative operator, ...).
+ - The IDs of the streams that the task reads. If not all streams are ready yet, some are set to "pending".
+ - The IDs of the streams that the task produces.
+ - The configuration of the executable code. This contains the actual operator type (mapper, join, stream window, ...) and the user code, as a serialized closure (MapFunction, JoinFunction, ...).
+ - The hashes of the libraries (JAR files) that the code needs.
+ - ...
+
+The TaskManager creates the *Task object* from the deployment descriptor, registers the task internally under its ID, and spawns a thread that will execute the task. After that, the TaskManager is done, and the Task runs by itself, reporting its status to the TaskManager as it progresses or fails.
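
To make the deployment descriptor described above more concrete, here is a minimal, illustrative sketch in Java. It is not Flink's actual `TaskDeploymentDescriptor` class; the class name, field names, and types are assumptions chosen only to mirror the bullet points listed in the diff.

```java
import java.util.List;
import java.util.UUID;

/**
 * Illustrative stand-in for the deployment descriptor described above.
 * NOT Flink's actual TaskDeploymentDescriptor -- only a sketch of the kind
 * of information a TaskManager receives with a SubmitTask message.
 */
public final class SimpleDeploymentDescriptor {

    private final UUID executionAttemptId;        // unique ID of this execution attempt
    private final String invokableClassName;      // batch, streaming, or iterative task class
    private final List<String> inputStreamIds;    // streams the task reads (some may be "pending")
    private final List<String> outputStreamIds;   // streams the task produces
    private final byte[] serializedTaskConfig;    // operator type + serialized user function
    private final List<String> requiredJarHashes; // hashes of JARs to fetch from the BLOB server

    public SimpleDeploymentDescriptor(UUID executionAttemptId,
                                      String invokableClassName,
                                      List<String> inputStreamIds,
                                      List<String> outputStreamIds,
                                      byte[] serializedTaskConfig,
                                      List<String> requiredJarHashes) {
        this.executionAttemptId = executionAttemptId;
        this.invokableClassName = invokableClassName;
        this.inputStreamIds = inputStreamIds;
        this.outputStreamIds = outputStreamIds;
        this.serializedTaskConfig = serializedTaskConfig;
        this.requiredJarHashes = requiredJarHashes;
    }

    public UUID getExecutionAttemptId() { return executionAttemptId; }
    public String getInvokableClassName() { return invokableClassName; }
    public List<String> getInputStreamIds() { return inputStreamIds; }
    public List<String> getOutputStreamIds() { return outputStreamIds; }
    public byte[] getSerializedTaskConfig() { return serializedTaskConfig; }
    public List<String> getRequiredJarHashes() { return requiredJarHashes; }
}
```

The real runtime uses typed IDs and serialized configuration objects rather than the plain strings and byte array shown here, but this shape is enough to follow the rest of the walkthrough.
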
+
+The Task executes in the following stages:
+
+ 1. It takes the hashes of the required JAR files and makes sure these files are in the library cache. If they are not there yet, the library cache downloads them from the JobManager's BLOB server. This operation may take a while if the JAR files are very large (many libraries). Note that this only needs to be done for the first Task of a program on each TaskManager; all successive Tasks should find the JAR files cached.
+
+ 2. It creates a classloader from these JAR files. This classloader is called the *user-code classloader* and is used to resolve all classes that can be user-defined.
+
+ 3. It registers its consumed and produced data streams (inputs and outputs) at the network environment. This reserves resources for the Task by creating the buffer pools that are used for data exchange.
+
+ 4. In case the task uses the state of a checkpoint (a streaming task that is restarted after a failure), it restores this state. This may involve fetching the state from remote storage, depending on where the state was stored.
+
+ 5. The Task switches to the *RUNNING* state and notifies the TaskManager of that progress. The TaskManager in turn sends an {% gh_link /flink-runtime/src/main/scala/org/apache/flink/runtime/messages/TaskMessages.scala#L142 "UpdateTaskExecutionState" %} actor message to the JobManager, to notify it of the progress.
+
+ 6. The Task invokes the executable code that was configured. This code is usually generic and only differentiates between coarse classes of operations (e.g., {% gh_link flink-runtime/src/main/java/org/apache/flink/runtime/operators/RegularPactTask.java "Plain Batch Task" %}, {% gh_link flink-staging/flink-streaming/flink-streaming-core/src/main/java/org/apache/flink/streaming/runtime/tasks/StreamTask.java "Streaming Operator" %}, or {% gh_link flink-runtime/src/main/java/org/apache/flink/runtime/iterative/task/IterationIntermediatePactTask.java "Iterative Batch Task" %}). Internally, these generic operators instantiate the specific operator code (e.g., map task, join task, window reducer, ...) and the user functions, and execute them.
+
+ 7. If the Task was deployed before all iof its inputs were available (early deployment), the Task receives updates on those newly available streams.

--- End diff --

    "iof" -> "of"
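
The seven stages described in the diff can be tied together in a single start-up routine. The following is a heavily simplified, illustrative sketch that reuses the `SimpleDeploymentDescriptor` from the previous snippet. It is not Flink's actual `Task` implementation: the collaborator interfaces are placeholders named after the components mentioned in the prose, and the `Runnable` cast stands in for the runtime's real invokable abstraction.

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.util.List;

/**
 * Illustrative sketch of the task start-up sequence described above.
 * NOT Flink's actual Task class; the nested interfaces are placeholders.
 */
public final class TaskLifecycleSketch {

    // Placeholder collaborators, named after the TaskManager components in the prose.
    interface LibraryCache { List<URL> ensureJarsAvailable(List<String> jarHashes) throws Exception; }
    interface NetworkEnvironment { void registerStreams(List<String> inputs, List<String> outputs); }
    interface StateBackend { void restoreIfPresent(String taskId) throws Exception; }
    interface TaskManagerGateway { void notifyRunning(String taskId); void notifyFailed(String taskId, Throwable t); }

    public static void runTask(SimpleDeploymentDescriptor tdd,
                               LibraryCache libraryCache,
                               NetworkEnvironment network,
                               StateBackend stateBackend,
                               TaskManagerGateway taskManager) {
        String taskId = tdd.getExecutionAttemptId().toString();
        try {
            // 1. Make sure the required JAR files are in the local library cache
            //    (downloaded from the JobManager's BLOB server if missing).
            List<URL> jars = libraryCache.ensureJarsAvailable(tdd.getRequiredJarHashes());

            // 2. Build the user-code classloader from those JAR files.
            ClassLoader userCodeClassLoader =
                    new URLClassLoader(jars.toArray(new URL[0]), TaskLifecycleSketch.class.getClassLoader());

            // 3. Register consumed and produced streams, reserving network buffer pools.
            network.registerStreams(tdd.getInputStreamIds(), tdd.getOutputStreamIds());

            // 4. Restore checkpointed state, if this task is recovering from a failure.
            stateBackend.restoreIfPresent(taskId);

            // 5. Switch to RUNNING and report progress; the TaskManager forwards
            //    this to the JobManager as an actor message.
            taskManager.notifyRunning(taskId);

            // 6. Load and invoke the configured executable code (the generic
            //    batch / streaming / iterative task class named in the descriptor).
            Class<?> invokableClass = Class.forName(tdd.getInvokableClassName(), true, userCodeClassLoader);
            Runnable invokable = (Runnable) invokableClass.getDeclaredConstructor().newInstance();
            invokable.run();

            // 7. (Not shown) While running, the task may receive updates about
            //    input streams that were still "pending" at deployment time.
        } catch (Throwable t) {
            taskManager.notifyFailed(taskId, t);
        }
    }
}
```
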
> Add an end-to-end overview of program execution in Flink to the docs
> --------------------------------------------------------------------
>
>         Key: FLINK-2363
>         URL: https://issues.apache.org/jira/browse/FLINK-2363
>     Project: Flink
>  Issue Type: Improvement
>  Components: Documentation
>    Reporter: Stephan Ewen
>

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)