Date: Mon, 10 Jul 2017 17:47:00 +0000 (UTC)
From: "Yufei Gu (JIRA)"
To: yarn-issues@hadoop.apache.org
Subject: [jira] [Updated] (YARN-6793) Duplicated reservation in Fair Scheduler preemption

     [ https://issues.apache.org/jira/browse/YARN-6793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yufei Gu updated YARN-6793:
---------------------------
    Description:
There is a delay between when preemption happens and when the preempted containers are actually killed. If the resources released from a node before the containers are killed are not enough to satisfy the resource request the preemption is serving, the scheduler reserves on that node again. E.g., the scheduler reserves on node 1 for app 1. By default it takes 15s to kill the containers on node 1 that would fulfill that resource request. If resources were released from node 1 before the killing, the scheduler reserves again on node 1 for app 1. The second reservation may never be unreserved.
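For illustration only, here is a minimal, self-contained Java sketch of the kind of guard that would avoid the duplicate: before reserving on a node for an app, check whether that app already holds a reservation there. This is not actual FairScheduler code; the class and method names (ReservationGuard, tryReserve, unreserve) are hypothetical.

    import java.util.HashMap;
    import java.util.HashSet;
    import java.util.Map;
    import java.util.Set;

    // Hypothetical sketch, not the FairScheduler API: tracks which apps
    // hold a reservation on which node, and rejects a second reservation
    // by the same app on the same node.
    public class ReservationGuard {
      // nodeId -> set of appIds that currently hold a reservation there
      private final Map<String, Set<String>> reservations = new HashMap<>();

      // Returns true if a new reservation was recorded, false if the app
      // already had one on this node (the duplicate this issue describes).
      public synchronized boolean tryReserve(String nodeId, String appId) {
        Set<String> apps =
            reservations.computeIfAbsent(nodeId, n -> new HashSet<>());
        if (apps.contains(appId)) {
          // A reservation from the earlier preemption round is still
          // outstanding; reserving again would create a reservation that
          // may never be unreserved.
          return false;
        }
        apps.add(appId);
        return true;
      }

      public synchronized void unreserve(String nodeId, String appId) {
        Set<String> apps = reservations.get(nodeId);
        if (apps != null) {
          apps.remove(appId);
        }
      }

      public static void main(String[] args) {
        ReservationGuard guard = new ReservationGuard();
        // First preemption round reserves on node 1 for app 1.
        System.out.println(guard.tryReserve("node1", "app1")); // true
        // Resources are released before the ~15s kill happens and the
        // scheduler tries to reserve again: the guard rejects it.
        System.out.println(guard.tryReserve("node1", "app1")); // false
      }
    }

In the real scheduler such a check would live wherever the reservation is made for a preemption-triggered request; the sketch only shows the shape of the race.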
was:
There is a delay between when preemption happens and when the preempted containers are actually killed. If the resources released from the nodes that are supposed to be preempted at that time are not enough for the resource request, the scheduler reserves on that node again. E.g., the scheduler reserves on node 1 for app 1. By default it takes 15s to kill the containers on node 1 that would fulfill that resource request. If resources were released from node 1 before the killing, the scheduler reserves again on node 1 for app 1. The second reservation may never be unreserved.


> Duplicated reservation in Fair Scheduler preemption
> ----------------------------------------------------
>
>                 Key: YARN-6793
>                 URL: https://issues.apache.org/jira/browse/YARN-6793
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: fairscheduler
>    Affects Versions: 2.8.1, 3.0.0-alpha3
>            Reporter: Yufei Gu
>            Assignee: Yufei Gu
>            Priority: Critical
>
> There is a delay between when preemption happens and when the preempted containers are actually killed. If the resources released from a node before the containers are killed are not enough to satisfy the resource request the preemption is serving, the scheduler reserves on that node again.
> E.g., the scheduler reserves on node 1 for app 1. By default it takes 15s to kill the containers on node 1 that would fulfill that resource request. If resources were released from node 1 before the killing, the scheduler reserves again on node 1 for app 1. The second reservation may never be unreserved.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: yarn-issues-help@hadoop.apache.org