flink-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From fhueske <...@git.apache.org>
Subject [GitHub] flink pull request: [FLINK-2363] [docs] First part of internals - ...
Date Wed, 15 Jul 2015 09:45:32 GMT
Github user fhueske commented on a diff in the pull request:

    --- Diff: docs/internals/through_stack.md ---
    @@ -0,0 +1,181 @@
    +title:  "From Program to Result: A Dive through Stack and Execution"
    +Licensed to the Apache Software Foundation (ASF) under one
    +or more contributor license agreements.  See the NOTICE file
    +distributed with this work for additional information
    +regarding copyright ownership.  The ASF licenses this file
    +to you under the Apache License, Version 2.0 (the
    +"License"); you may not use this file except in compliance
    +with the License.  You may obtain a copy of the License at
    +  http://www.apache.org/licenses/LICENSE-2.0
    +Unless required by applicable law or agreed to in writing,
    +software distributed under the License is distributed on an
    +KIND, either express or implied.  See the License for the
    +specific language governing permissions and limitations
    +under the License.
    +This page explains what happens when you execute a Flink program (streaming or streaming).
    +It covers the complete life cycle, from <emph>client-side pre-flight</emph>,
to <emph>JobManager</emph>,
    +and <emph>TaskManager</emph>.
    +Please refer to the [Overview Page](../index.html) for a high-level overview of the processes
and the stack.
    +* This will be replaced by the TOC
    +## Batch API: DataSets to JobGraph
    +## Streaming API: DataStreams to JobGraph
    +## Client: Submitting a Job, Receiving Statistics & Results
    +## JobManager: Receiving and Starting a Job
    +parallel operator instance
    +execution attempts
    +## TaskManager: Running Tasks
    +The *{% gh_link flink-runtime/src/main/scala/org/apache/flink/runtime/taskmanager/TaskManager.scala
"TaskManager" %}*
    +is the worker process in Flink. It runs the *{% gh_link flink-runtime/src/main/java/org/apache/flink/runtime/taskmanager/Task.java
"Tasks" %}*,
    +which execute a parallel operator instance.
    +The TaskManager itself is an Akka actor and has many utility components, such as:
    + - The *{% gh_link flink-runtime/src/main/java/org/apache/flink/runtime/io/network/NetworkEnvironment.java
"NetworkEnvironment" %}*, which takes care of all data exchanges (streamed and batched) between
TaskManagers. Cached data sets in Flink are also cached streams, to the network environment
is responsible for that as well.
    + - The *{% gh_link flink-runtime/src/main/java/org/apache/flink/runtime/memorymanager/MemoryManager.java
"MemoryManager" %}*, which governs the memory for sorting, hashing, and in-operator data caching.
    + - The *{% gh_link flink-runtime/src/main/java/org/apache/flink/runtime/io/disk/iomanager/IOManager.java
"I/O Manager" %}*, which governs the memory for sorting, hashing, and in-operator data caching.
    + - The *{% gh_link flink-runtime/src/main/java/org/apache/flink/runtime/execution/librarycache/LibraryCacheManager.java
"Library Cache" %}*, which gives access to JAR files needed by tasks.
    +When the TaskManager runs tasks, it does not know anything about the task's role in the
dataflow. The TaskManager only knows the tasks itself and streams that the task interacts
    +Any form of cross-task coordination must go through the JobManager.
    +The execution of a Task begins when the TaskManager receives the *SubmitTask* message.
The message contains the
    +*{% gh_link flink-runtime/src/main/java/org/apache/flink/runtime/deployment/TaskDeploymentDescriptor.java
"TaskDeploymentDescriptor" %}*. This descriptor defines everything
    +a task needs to know:
    + - The unique task {% gh_link flink-runtime/src/main/java/org/apache/flink/runtime/executiongraph/ExecutionAttemptID.java
"Execution ID" %}.
    + - The name of the executable code class of the task (batch operator, stream operator,
iterative operator, ...)
    + - The IDs of the streams that the task reads. If not all streams are ready yet, some
are set to "pending".
    + - The IDs of the streams that the task produces.
    + - The configuration of the executable code. This contains the actual operator type (mapper,
join, stream window, ...) and the user code, as a serialized closure (MapFunction, JoinFunction,
    + - The hashes if the libraries (JAR files) that the code needs.
    + - ...
    +The TaskManager creates the *Task object* from the deployment descriptor, registers the
task internally under its ID, and spawns a thread that will execute the task.
    +After that, the TaskManager is done, and the Task runs by itself, reporting its status
to the TaskManager as it progresses or fails.
    +The Task executes in the following stages:
    + 1. It takes the hashes of the required JAR files and makes sure these files are in the
library cache. If they are not yet there, the library cache downloads them from the JobManager's
BLOB server. This operation may take a while, if the JAR files are very large (many libraries).
Note that this only needs to be done for the first Task of a program on each TaskManager.
All successive Tasks should find the JAR files cached.
    + 2. It creates a classloader from these JAR files. The classloader is called the *user-code
classloader* and is used to resolve all classes that can be user-defined.
    + 3. It registers its consumed and produces data streams (input and output) at the network
environment. This reserves resources for the Task by creating the buffer pools that are used
for data exchange.
    + 4. In case the task uses state of a checkpoint (a streaming task that is restarted after
a failure), it restores this state. This may involve fetching the state from remote storage,
depending on where the state was stored.
    + 5. The Task switches to *"RUNNING"* and notifies the TaskManager of that progress. The
TaskManager in turn sends a {% gh_link /flink-runtime/src/main/scala/org/apache/flink/runtime/messages/TaskMessages.scala#L142
"UpdateTaskExecutionState" %} actor message to the JobManager, to notify it of the progress.
    + 6. The Task invokes the executable code that was configured. This code is usually generic
and only differentiates between coarse classes of operations (e.g., {% gh_link flink-runtime/src/main/java/org/apache/flink/runtime/operators/RegularPactTask.java
"Plain Batch Task" %} , {% gh_link flink-staging/flink-streaming/flink-streaming-core/src/main/java/org/apache/flink/streaming/runtime/tasks/StreamTask.java
"Streaming Operator" %}, or {% gh_link flink-runtime/src/main/java/org/apache/flink/runtime/iterative/task/IterationIntermediatePactTask.java
"Iterative Batch Task" %}). Internally, these generic operators instantiate the specific operator
code (e.g., map task, join task, window reducer, ...) and the user functions and executes
    + 7. If the Task was deployed before all iof its inputs were available (early deployment),
the Task receives updates on those newly available streams.
    --- End diff --
    "iof" -> "of"

If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.

View raw message