Date: Wed, 19 Apr 2017 08:27:41 +0000 (UTC)
From: "zhijiang (JIRA)"
To: dev@flink.apache.org
Reply-To: dev@flink.apache.org
Subject: [jira] [Created] (FLINK-6325) Refinement of slot reuse for task manager failure

zhijiang created FLINK-6325:
-------------------------------

             Summary: Refinement of slot reuse for task manager failure
                 Key: FLINK-6325
                 URL: https://issues.apache.org/jira/browse/FLINK-6325
             Project: Flink
          Issue Type: Improvement
          Components: JobManager
            Reporter: zhijiang
            Priority: Minor

After a task or TaskManager failure, the new execution attempt by default tries to take the slot used by the prior execution. This improves state recovery locality with the RocksDB backend, and it makes sense for the task failure scenario.

In the TaskManager failure scenario, however, the slot inside the failed TaskManager is released and cannot be reused. When an execution from that TaskManager is reset and allocates a slot from {{SlotPool}}, no slot matches by {{ResourceID}}, so the allocation falls back to matching any other available slot by {{ResourceProfile}}. As a result, the execution from the failed {{TaskManager}} occupies the slot of another parallel execution, and the executions that follow may no longer be able to reuse their previous slots. This hurts state recovery locality.

To solve this problem, we would like to request a new slot for re-deployment when the execution's prior location is unavailable, so that it no longer occupies slots that other live executions could reuse.
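
The difference between the current fallback and the proposed refinement can be illustrated with a small, self-contained sketch. This is not Flink code: `Slot`, `allocateCurrent`, `allocateRefined`, and the `aliveTaskManagers` set are hypothetical names invented for illustration, the resource profile is reduced to a plain string, and an empty result stands in for "request a new slot". It also assumes that an execution whose prior TaskManager is still alive keeps the existing matching behaviour.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Optional;
import java.util.Set;
import java.util.function.Predicate;

public class SlotReuseSketch {

    /** Hypothetical stand-in for a pooled slot: the TaskManager hosting it and its resource profile. */
    static final class Slot {
        final String resourceId;
        final String profile;

        Slot(String resourceId, String profile) {
            this.resourceId = resourceId;
            this.profile = profile;
        }
    }

    /** Removes and returns the first pooled slot that passes the filter, if any. */
    private static Optional<Slot> take(List<Slot> pool, Predicate<Slot> filter) {
        Iterator<Slot> it = pool.iterator();
        while (it.hasNext()) {
            Slot slot = it.next();
            if (filter.test(slot)) {
                it.remove();
                return Optional.of(slot);
            }
        }
        return Optional.empty();
    }

    /** Behaviour described in the issue: match by ResourceID first, then fall back to ResourceProfile. */
    static Optional<Slot> allocateCurrent(List<Slot> pool, String priorResourceId, String profile) {
        Optional<Slot> local = take(pool, s -> s.resourceId.equals(priorResourceId));
        return local.isPresent() ? local : take(pool, s -> s.profile.equals(profile));
    }

    /**
     * Proposed refinement (sketch): if the prior location is no longer alive, skip the
     * fallback and return empty, which the caller treats as "request a new slot".
     */
    static Optional<Slot> allocateRefined(List<Slot> pool, Set<String> aliveTaskManagers,
                                          String priorResourceId, String profile) {
        if (!aliveTaskManagers.contains(priorResourceId)) {
            return Optional.empty();
        }
        return allocateCurrent(pool, priorResourceId, profile);
    }

    public static void main(String[] args) {
        // tm-1 has failed; only the slot on tm-2 is left in the pool.
        Set<String> alive = new HashSet<>(Arrays.asList("tm-2"));

        // Current behaviour: execution A (previously on tm-1) restarts first.
        List<Slot> pool = new ArrayList<>(Arrays.asList(new Slot("tm-2", "default")));
        Slot takenByA = allocateCurrent(pool, "tm-1", "default").get();
        boolean bFindsSlot = allocateCurrent(pool, "tm-2", "default").isPresent();
        System.out.println("current: A takes slot on " + takenByA.resourceId
                + ", B finds a slot: " + bFindsSlot);

        // Refined behaviour: A asks for a new slot, B keeps its previous one.
        pool = new ArrayList<>(Arrays.asList(new Slot("tm-2", "default")));
        boolean aNeedsNewSlot = !allocateRefined(pool, alive, "tm-1", "default").isPresent();
        Slot takenByB = allocateRefined(pool, alive, "tm-2", "default").get();
        System.out.println("refined: A requests a new slot: " + aNeedsNewSlot
                + ", B reuses slot on " + takenByB.resourceId);
    }
}
```

In the first scenario the execution from the failed TaskManager takes the only remaining slot, so the execution that previously ran there loses its local state. In the second, that slot stays available for its previous owner.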