flink-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From u..@apache.org
Subject flink git commit: [docs] Add migration documentation
Date Mon, 06 Feb 2017 16:15:06 GMT
Repository: flink
Updated Branches:
  refs/heads/master c69859698 -> 3ef61efe5

[docs] Add migration documentation

This closes #3258.

Project: http://git-wip-us.apache.org/repos/asf/flink/repo
Commit: http://git-wip-us.apache.org/repos/asf/flink/commit/3ef61efe
Tree: http://git-wip-us.apache.org/repos/asf/flink/tree/3ef61efe
Diff: http://git-wip-us.apache.org/repos/asf/flink/diff/3ef61efe

Branch: refs/heads/master
Commit: 3ef61efe5dfcbef5e22d1490f8c99979b4165b68
Parents: c698596
Author: Stefan Richter <s.richter@data-artisans.com>
Authored: Mon Jan 23 13:26:35 2017 +0100
Committer: Ufuk Celebi <uce@apache.org>
Committed: Mon Feb 6 17:14:36 2017 +0100

 docs/ops/upgrading.md | 96 ++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 89 insertions(+), 7 deletions(-)

diff --git a/docs/ops/upgrading.md b/docs/ops/upgrading.md
index 8c42517..194d8af 100644
--- a/docs/ops/upgrading.md
+++ b/docs/ops/upgrading.md
@@ -109,23 +109,105 @@ When upgrading an application by changing its topology, a few things
need to be
 ## Upgrading the Flink Framework Version
-  - Either "in place" : Savepoint -> stop/cancel -> shutdown cluster -> start new
version -> start job 
-  - Another cluster variant : Savepoint -> resume in other cluster -> "flip switch"
-> shutdown old cluster
+This section describes the general way of upgrading Flink framework version from version
1.1.x to 1.2.x and migrating your
+jobs between the two versions.
+In a nutshell, this procedure consists of 2 fundamental steps:
+1. Take a savepoint in Flink 1.1.x for the jobs you want to migrate.
+2. Resume your jobs under Flink 1.2.x from the previously taken savepoints.
+Besides those two fundamental steps, some additional steps can be required that depend on
the way you want to change the
+Flink version. In this guide we differentiate two approaches to upgrade from Flink 1.1.x
to 1.2.x: **in-place** upgrade and 
+**shadow copy** upgrade.
+For **in-place** update, after taking savepoints, you need to:
+  1. Stop/cancel all running jobs.
+  2. Shutdown the cluster that runs Flink 1.1.x.
+  3. Upgrade Flink to 1.2.x. on the cluster.
+  4. Restart the cluster under the new version.
+For **shadow copy**, you need to:
+  1. Before resuming from the savepoint, setup a new installation of Flink 1.2.x besides
your old Flink 1.1.x installation.
+  2. Resume from the savepoints with the new Flink 1.2.x installation.
+  3. If everything runs ok, stop and shutdown the old Flink 1.1.x cluster.
+In the following, we will first present the preconditions for successful job migration and
then go into more detail 
+about the steps that we outlined before.
+### Preconditions
+Before starting the migration, please check that the jobs you are trying to migrate are following
+best practises for [savepoints]({{ site.baseurl }}/setup/savepoints.html). In particular,
we advise you to check that 
+explicit `uid`s were set for operators in your job. 
+This is a *soft* precondition, and restore *should* still work in case you forgot about assigning
+If you run into a case where this is not working, you can *manually* add the generated legacy
vertex ids from Flink 1.1 
+to your job using the `setUidHash(String hash)` call. For each operator (in operator chains:
only the head operator) you 
+must assign the 32 character hex string representing the hash that you can see in the web
ui or logs for the operator.
+Besides operator uids, there are currently three *hard* preconditions for job migration that
will make migration fail: 
+1. as mentioned in earlier release notes, we do not support migration for state in RocksDB
that was checkpointed using 
+`semi-asynchronous` mode. In case your old job was using this mode, you can still change
your job to use 
+`fully-asynchronous` mode before taking the savepoint that is used as the basis for the migration.
+2. The CEP operator is currently not supported for migration. If your job uses this operator
you can (curently) not 
+migrate it. We are planning to provide migration support for the CEP operator in a later
bugfix release.
+3. Another **important** precondition is that all the savepoint data must be accessible from
the new installation and 
+reside under the same absolute path. Please notice that the savepoint data is typically not
self contained in just the created 
+savepoint file. Additional files can be referenced from inside the savepoint file (e.g. the
output from state backend 
+snapshots)! There is currently no simple way to identify and move all data that belongs to
a savepoint.
+### STEP 1: Taking a savepoint in Flink 1.1.x.
+First major step in job migration is taking a savepoint of your job running in Flink 1.1.x.
You can do this with the
+$ bin/flink savepoint :jobId [:targetDirectory]
+For more details, please read the [savepoint documentation]({{ site.baseurl }}/setup/savepoints.html).
+### STEP 2: Updating your cluster to Flink 1.2.x.
+In this step, we update the framework version of the cluster. What this basically means is
replacing the content of
+the Flink installation with the new version. This step can depend on how you are running
Flink in your cluster (e.g. 
+standalone, on Mesos, ...).
+If you are unfamiliar with installing Flink in your cluster, please read the [deployment
and cluster setup documentation]({{ site.baseurl }}/setup/index.html).
+### STEP 3: Resuming the job under Flink 1.2.x from Flink 1.1.x savepoint.
+As the last step of job migration, you resume from the savepoint taken above on the updated
cluster. You can do
+this with the command:
+$ bin/flink run -s :savepointPath [:runArgs]
+Again, for more details, please take a look at the [savepoint documentation]({{ site.baseurl
 ## Compatibility Table
 Savepoints are compatible across Flink versions as indicated by the table below:
-| Created with \ Resumed With | 1.1.x | 1.2.x |
+| Created with \ Resumed with | 1.1.x | 1.2.x |
 | ---------------------------:|:-----:|:-----:|
 | 1.1.x                       |   X   |   X   |
 | 1.2.x                       |       |   X   |
-## Special Considerations for Upgrades from Flink 1.1.x to Flink 1.2.x
-  - The parallelism of the Savepoint in Flink 1.1.x becomes the maximum parallelism in Flink
-  - Increasing the parallelism for upgraded jobs is not possible out of the box.
+## Limitations and Special Considerations for Upgrades from Flink 1.1.x to Flink 1.2.x
+  - The maximum parallelism of a job that was migrated from Flink 1.1.x to 1.2.x is currently
fixed as the parallelism of 
+  the job. This means that the parallelism can not be increased after migration. This limitation
might be removed in a 
+  future bugfix release.

View raw message