Date: Thu, 1 Nov 2018 10:37:00 +0000 (UTC)
From: "Till Rohrmann (JIRA)"
To: issues@flink.apache.org
Reply-To: dev@flink.apache.org
Subject: [jira] [Closed] (FLINK-9635) Local recovery scheduling can cause spread out of tasks

     [ https://issues.apache.org/jira/browse/FLINK-9635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Till Rohrmann closed FLINK-9635.
--------------------------------
       Resolution: Fixed
    Fix Version/s: 1.6.3

Fixed in 1.6.3 via https://github.com/apache/flink/commit/04df02b4728d40b59417ccc8ee281ab3298b09da

> Local recovery scheduling can cause spread out of tasks
> -------------------------------------------------------
>
>                 Key: FLINK-9635
>                 URL: https://issues.apache.org/jira/browse/FLINK-9635
>             Project: Flink
>          Issue Type: Bug
>          Components: Distributed Coordination
>    Affects Versions: 1.5.0, 1.6.2
>            Reporter: Till Rohrmann
>            Assignee: Stefan Richter
>            Priority: Critical
>              Labels: pull-request-available
>             Fix For: 1.6.3, 1.7.0
>
>
> In order to make local recovery work, Flink's scheduling was changed such that it tries to reschedule each task to its previous location. In order not to occupy slots that have the state of other tasks cached, the strategy requests a new slot if the old slot, identified by the previous allocation id, is no longer present. This also applies to newly allocated slots, because the strategy does not distinguish between new and already used slots. As a result, every task can end up deployed to its own slot, for example if the {{SlotPool}} has released all slots in the meantime. The consequence could be that a job can no longer be executed after a failure because it needs more slots than before.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
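
The failure mode described in the issue can be illustrated with a toy model. This is only a sketch under stated assumptions: `ToySlotPool`, `schedule`, and the `reuse_fresh_slots` flag are hypothetical names invented for this example and are not Flink's `SlotPool` API. The model assumes a slot can host at most one task per operator (a simplified form of slot sharing). The `reuse_fresh_slots=True` branch is included only for contrast and does not claim to describe what the referenced commit does.

```python
import itertools


class ToySlotPool:
    """Hypothetical stand-in for a slot pool; not Flink's SlotPool API."""

    def __init__(self, allocation_ids=()):
        # allocation id -> operators that currently have a task in that slot
        self.slots = {aid: set() for aid in allocation_ids}
        self.fresh = set()  # ids allocated during this scheduling round
        self._counter = itertools.count(1)

    def request_new_slot(self):
        aid = f"new-{next(self._counter)}"
        self.slots[aid] = set()
        self.fresh.add(aid)
        return aid


def schedule(tasks, pool, reuse_fresh_slots):
    """Place (operator, previous_allocation_id) tasks; return the slot count."""
    used = set()
    for operator, previous in tasks:
        if previous in pool.slots and operator not in pool.slots[previous]:
            # Local recovery: go back to the slot that holds the cached state.
            target = previous
        else:
            target = None
            if reuse_fresh_slots:
                # Assumption for contrast: a slot allocated in this round has
                # no cached state of other tasks, so sharing it is harmless.
                for aid in sorted(pool.fresh):
                    if operator not in pool.slots[aid]:
                        target = aid
                        break
            if target is None:
                # Behaviour described in the issue: always ask for a new slot.
                target = pool.request_new_slot()
        pool.slots[target].add(operator)
        used.add(target)
    return len(used)


# Two operators with parallelism 2 that shared two slots before the failure.
TASKS = [("source", "a1"), ("map", "a1"), ("source", "a2"), ("map", "a2")]

slots_kept = schedule(TASKS, ToySlotPool(["a1", "a2"]), reuse_fresh_slots=False)
slots_released = schedule(TASKS, ToySlotPool(), reuse_fresh_slots=False)
slots_released_reuse = schedule(TASKS, ToySlotPool(), reuse_fresh_slots=True)
print(slots_kept, slots_released, slots_released_reuse)  # prints: 2 4 2
```

In this model the job needs two slots while its previous slots are still present, but four slots once the pool has released them, which matches the reported symptom of a job needing more slots after a failure than before.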