Date: Wed, 8 Feb 2017 19:39:42 +0000 (UTC)
From: "Stephan Ewen (JIRA)"
To: dev@flink.apache.org
Reply-To: dev@flink.apache.org
Subject: [jira] [Created] (FLINK-5747) Eager Scheduling should deploy all Tasks together

Stephan Ewen created FLINK-5747:
-----------------------------------

             Summary: Eager Scheduling should deploy all Tasks together
                 Key: FLINK-5747
                 URL: https://issues.apache.org/jira/browse/FLINK-5747
             Project: Flink
          Issue Type: Bug
          Components: JobManager
    Affects Versions: 1.2.0
            Reporter: Stephan Ewen
            Assignee: Stephan Ewen
             Fix For: 1.3.0

Currently, eager scheduling immediately triggers the scheduling for all vertices and their subtasks in topological order.

This has two problems:

  - This works only as long as resource acquisition is "synchronous". With dynamic resource acquisition in FLIP-6, the resources are returned as Futures, which may complete out of order. This results in out-of-order (not topological) scheduling of tasks, which does not work for streaming.

  - Deploying tasks that depend on other tasks before it is clear that those other tasks have resources as well leads to situations where many deploy/recovery cycles happen before enough resources are available to get the job running fully.

For eager scheduling, we should allocate all resources in one chunk and then deploy once we know that all are available.
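
The "allocate in one chunk, then deploy" idea can be sketched in plain Java with CompletableFuture. This is only an illustration under stated assumptions, not Flink's scheduler code: `Slot`, `scheduleEager` and the slot provider function are hypothetical names, and "deploying" is reduced to recording the task names in order. The point it shows is that deployment order follows the topological request order, not the order in which the slot futures happen to complete.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

class GangSchedulingSketch {

    // Hypothetical stand-in for an allocated slot; not a Flink class.
    static final class Slot {
        final String taskName;
        Slot(String taskName) { this.taskName = taskName; }
    }

    // Requests one slot per task (tasks are given in topological order) and
    // returns a future that completes with the deployment order only once
    // ALL slot futures have completed. If any request fails, the returned
    // future completes exceptionally and nothing is deployed.
    static CompletableFuture<List<String>> scheduleEager(
            List<String> tasksInTopologicalOrder,
            Function<String, CompletableFuture<Slot>> slotProvider) {

        List<CompletableFuture<Slot>> requests = new ArrayList<>();
        for (String task : tasksInTopologicalOrder) {
            requests.add(slotProvider.apply(task));
        }

        CompletableFuture<Void> allAllocated =
                CompletableFuture.allOf(requests.toArray(new CompletableFuture<?>[0]));

        return allAllocated.thenApply(ignored -> {
            List<String> deployed = new ArrayList<>();
            // Iterate in request (topological) order, not completion order.
            for (CompletableFuture<Slot> request : requests) {
                // join() does not block here: every future is already complete.
                deployed.add(request.join().taskName);
            }
            return deployed;
        });
    }

    public static void main(String[] args) {
        // Slot futures are completed by hand to simulate asynchronous allocation.
        Map<String, CompletableFuture<Slot>> pending = new LinkedHashMap<>();
        List<String> tasks = Arrays.asList("source", "map", "sink");

        CompletableFuture<List<String>> deployment = scheduleEager(
                tasks, task -> pending.computeIfAbsent(task, t -> new CompletableFuture<>()));

        // Slots arrive out of topological order.
        pending.get("sink").complete(new Slot("sink"));
        pending.get("source").complete(new Slot("source"));
        System.out.println(deployment.isDone()); // false: "map" has no slot yet

        pending.get("map").complete(new Slot("map"));
        System.out.println(deployment.join());   // [source, map, sink]
    }
}
```

A real scheduler would also have to release the slots it already holds when one request fails or times out; the sketch leaves that out.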

As a follow-up, the same should be done per pipelined component in lazy batch scheduling as well. That way we get lazy scheduling across blocking boundaries, and bulk (gang) scheduling in pipelined subgroups.

Note that this does not apply to fine-grained recovery, where individual tasks request replacement resources.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)