From: uce@apache.org
To: commits@flink.apache.org
Subject: flink git commit: [docs] Add migration documentation
Date: Mon, 6 Feb 2017 16:15:06 +0000 (UTC)

Repository: flink
Updated Branches:
  refs/heads/master c69859698 -> 3ef61efe5

[docs] Add migration documentation

This closes #3258.
Project: http://git-wip-us.apache.org/repos/asf/flink/repo
Commit: http://git-wip-us.apache.org/repos/asf/flink/commit/3ef61efe
Tree: http://git-wip-us.apache.org/repos/asf/flink/tree/3ef61efe
Diff: http://git-wip-us.apache.org/repos/asf/flink/diff/3ef61efe

Branch: refs/heads/master
Commit: 3ef61efe5dfcbef5e22d1490f8c99979b4165b68
Parents: c698596
Author: Stefan Richter
Authored: Mon Jan 23 13:26:35 2017 +0100
Committer: Ufuk Celebi
Committed: Mon Feb 6 17:14:36 2017 +0100

----------------------------------------------------------------------
 docs/ops/upgrading.md | 96 ++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 89 insertions(+), 7 deletions(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/flink/blob/3ef61efe/docs/ops/upgrading.md
----------------------------------------------------------------------
diff --git a/docs/ops/upgrading.md b/docs/ops/upgrading.md
index 8c42517..194d8af 100644
--- a/docs/ops/upgrading.md
+++ b/docs/ops/upgrading.md
@@ -109,23 +109,105 @@ When upgrading an application by changing its topology, a few things need to be
 
 ## Upgrading the Flink Framework Version
 
-  - Either "in place" : Savepoint -> stop/cancel -> shutdown cluster -> start new version -> start job
-  - Another cluster variant : Savepoint -> resume in other cluster -> "flip switch" -> shutdown old cluster
+This section describes the general way of upgrading the Flink framework version from 1.1.x to 1.2.x and migrating your
+jobs between the two versions.
+
+In a nutshell, this procedure consists of two fundamental steps:
+
+1. Take a savepoint in Flink 1.1.x for the jobs you want to migrate.
+2. Resume your jobs under Flink 1.2.x from the previously taken savepoints.
+
+Besides those two fundamental steps, additional steps may be required, depending on how you want to change the
+Flink version.
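Sketched as CLI calls, the two fundamental steps look roughly as follows. The installation paths, job id, savepoint directory, and jar name are all hypothetical placeholders, and the commands are only printed (dry run) rather than executed:

```sh
#!/usr/bin/env bash
# Dry-run sketch of the two fundamental migration steps.
# All paths and ids below are hypothetical placeholders.

FLINK_11=/opt/flink-1.1.4/bin/flink   # old installation (assumed path)
FLINK_12=/opt/flink-1.2.0/bin/flink   # new installation (assumed path)
JOB_ID=5e20cb6b0f357591171dfcca2eea09de
TARGET_DIR=hdfs:///flink/savepoints   # must stay reachable under the same path

# Step 1: take a savepoint of the running job with Flink 1.1.x.
echo "$FLINK_11" savepoint "$JOB_ID" "$TARGET_DIR"

# Step 2: resume the job from that savepoint with Flink 1.2.x.
# (The concrete savepoint path is printed by step 1; shown as a placeholder.)
echo "$FLINK_12" run -s "$TARGET_DIR/savepoint-0000" my-job.jar
```

Note that both calls reference the same savepoint directory: the new installation must see the savepoint data under the same absolute path as the old one.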
+In this guide we differentiate between two approaches to upgrading from Flink 1.1.x to 1.2.x: **in-place** upgrade and
+**shadow copy** upgrade.
+
+For an **in-place** upgrade, after taking the savepoints, you need to:
+
+ 1. Stop/cancel all running jobs.
+ 2. Shut down the cluster that runs Flink 1.1.x.
+ 3. Upgrade Flink to 1.2.x on the cluster.
+ 4. Restart the cluster under the new version.
+
+For a **shadow copy** upgrade, you need to:
+
+ 1. Before resuming from the savepoint, set up a new installation of Flink 1.2.x alongside your old Flink 1.1.x installation.
+ 2. Resume from the savepoints with the new Flink 1.2.x installation.
+ 3. If everything runs OK, stop and shut down the old Flink 1.1.x cluster.
+
+In the following, we first present the preconditions for successful job migration and then go into more detail
+about the steps outlined above.
+
+### Preconditions
+
+Before starting the migration, please check that the jobs you are trying to migrate follow the
+best practices for [savepoints]({{ site.baseurl }}/setup/savepoints.html). In particular, we advise you to check that
+explicit `uid`s were set for the operators in your job.
+
+This is a *soft* precondition, and restoring *should* still work in case you forgot to assign `uid`s.
+If you run into a case where this does not work, you can *manually* add the generated legacy vertex ids from Flink 1.1
+to your job using the `setUidHash(String hash)` call. For each operator (in operator chains: only the head operator) you
+must assign the 32-character hex string representing the hash that you can see in the web UI or logs for the operator.
+
+Besides operator `uid`s, there are currently three *hard* preconditions for job migration; violating any of them will make migration fail:
+
+1. As mentioned in earlier release notes, we do not support migration for state in RocksDB that was checkpointed using
+`semi-asynchronous` mode.
+In case your old job was using this mode, you can still change your job to use
+`fully-asynchronous` mode before taking the savepoint that is used as the basis for the migration.
+
+2. The CEP operator is currently not supported for migration. If your job uses this operator, you can (currently) not
+migrate it. We are planning to provide migration support for the CEP operator in a later bugfix release.
+
+3. Another **important** precondition is that all the savepoint data must be accessible from the new installation and
+reside under the same absolute path. Please note that the savepoint data is typically not self-contained in just the created
+savepoint file. Additional files can be referenced from inside the savepoint file (e.g. the output from state backend
+snapshots)! There is currently no simple way to identify and move all data that belongs to a savepoint.
+
+### STEP 1: Taking a savepoint in Flink 1.1.x
+
+The first major step in job migration is taking a savepoint of your job running in Flink 1.1.x. You can do this with the
+command:
+
+```sh
+$ bin/flink savepoint :jobId [:targetDirectory]
+```
+
+For more details, please read the [savepoint documentation]({{ site.baseurl }}/setup/savepoints.html).
+
+### STEP 2: Updating your cluster to Flink 1.2.x
+
+In this step, we update the framework version of the cluster. This basically means replacing the content of
+the Flink installation with the new version. How to do this can depend on how you are running Flink in your cluster (e.g.
+standalone, on Mesos, ...).
+
+If you are unfamiliar with installing Flink in your cluster, please read the [deployment and cluster setup documentation]({{ site.baseurl }}/setup/index.html).
+
+### STEP 3: Resuming the job under Flink 1.2.x from the Flink 1.1.x savepoint
+
+As the last step of job migration, you resume from the savepoint taken above on the updated cluster.
+You can do this with the command:
+
+```sh
+$ bin/flink run -s :savepointPath [:runArgs]
+```
+
+Again, for more details, please take a look at the [savepoint documentation]({{ site.baseurl }}/setup/savepoints.html).
 
 ## Compatibility Table
 
 Savepoints are compatible across Flink versions as indicated by the table below:
 
-| Created with \ Resumed With | 1.1.x | 1.2.x |
+| Created with \ Resumed with | 1.1.x | 1.2.x |
 | ---------------------------:|:-----:|:-----:|
 | 1.1.x                       |   X   |   X   |
 | 1.2.x                       |       |   X   |
 
-## Special Considerations for Upgrades from Flink 1.1.x to Flink 1.2.x
-
- - The parallelism of the Savepoint in Flink 1.1.x becomes the maximum parallelism in Flink 1.2.x.
- - Increasing the parallelism for upgraded jobs is not possible out of the box.
+## Limitations and Special Considerations for Upgrades from Flink 1.1.x to Flink 1.2.x
+
+ - The maximum parallelism of a job that was migrated from Flink 1.1.x to 1.2.x is currently fixed to the parallelism of
+   the job. This means that the parallelism cannot be increased after migration. This limitation might be removed in a
+   future bugfix release.
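As a concrete illustration of this limitation, a job that ran with parallelism 4 under Flink 1.1.x has to be resumed with the same parallelism after migration. The savepoint path, parallelism value, and jar name below are hypothetical, and the command is only printed (dry run):

```sh
#!/usr/bin/env bash
# A job migrated from Flink 1.1.x keeps its old parallelism as the
# maximum; increasing it after migration is currently not possible.
# Savepoint path, parallelism, and jar name are hypothetical placeholders.

SAVEPOINT=hdfs:///flink/savepoints/savepoint-0000
OLD_PARALLELISM=4   # the parallelism the job had under Flink 1.1.x

# Resume with -p set to the old parallelism.
echo bin/flink run -s "$SAVEPOINT" -p "$OLD_PARALLELISM" my-job.jar
```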