Subject: Re: rebalance of streaming job after taskManager restart
From: Aljoscha Krettek
Date: Tue, 8 Mar 2016 16:03:36 +0100
To: user@flink.apache.org

Hi,

I think what you can do is make a savepoint of your program, then cancel it and restart it from the savepoint. This should make Flink redistribute it on all TaskManagers.
See https://ci.apache.org/projects/flink/flink-docs-master/apis/streaming/savepoints.html and https://ci.apache.org/projects/flink/flink-docs-master/apis/cli.html#savepoints for documentation about savepoints.

The steps to follow should be:

bin/flink savepoint

This will print a savepoint path that you will need later.

bin/flink cancel
bin/flink run -s …

The last command is your usual run command but with the additional “-s” parameter to continue from a savepoint.

I hope that helps.

Cheers,
Aljoscha

> On 08 Mar 2016, at 15:48, Maciek Próchniak wrote:
>
> Hi,
>
> we have a streaming job with parallelism 2 and two task managers. The job is occupying one slot on each task manager. When I stop manager2 the job is restarted and it runs on manager1 - occupying two of its slots.
> How can I trigger a restart (or other similar process) that will cause the job to be balanced among task managers?
>
> thanks,
> maciek
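
For reference, the three steps from the reply could be wrapped roughly as below. This is only a sketch under assumptions: the helper names (take_savepoint, cancel_job, resume_from) and the RUN switch are made up for illustration, and the job ID, savepoint path and jar are placeholders you would take from your own cluster. By default it only prints the commands, so nothing is executed until RUN=1 is set.

```shell
#!/bin/sh
# Sketch of the savepoint -> cancel -> run sequence described in the reply.
# Dry run by default: each command is printed instead of executed.
# Set RUN=1 to actually call bin/flink from the Flink installation directory.
flink() {
    if [ "${RUN:-0}" = "1" ]; then
        bin/flink "$@"
    else
        echo "bin/flink $*"
    fi
}

# Step 1: trigger a savepoint for the running job (argument: job ID).
# Flink prints the savepoint path, which is needed in step 3.
take_savepoint() { flink savepoint "$1"; }

# Step 2: cancel the running job (argument: job ID).
cancel_job() { flink cancel "$1"; }

# Step 3: the usual run command plus -s <savepoint path>.
# First argument: savepoint path; remaining arguments: your normal run arguments.
resume_from() {
    sp="$1"
    shift
    flink run -s "$sp" "$@"
}
```

After sourcing the file you would call take_savepoint with your job ID, note the path that Flink prints, then call cancel_job with the same job ID and finally resume_from with that path followed by your usual run arguments. The savepoint path is passed in by hand rather than parsed from the CLI output, because the exact output format is not given in this thread.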