Date: Mon, 5 Feb 2018 20:11:02 +0000 (UTC)
From: "Yufei Gu (JIRA)"
To: yarn-issues@hadoop.apache.org
Subject: [jira] [Commented] (YARN-7655) avoid AM preemption caused by RRs for specific nodes or racks

    [ https://issues.apache.org/jira/browse/YARN-7655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16352866#comment-16352866 ]

Yufei Gu commented on YARN-7655:
--------------------------------

Hi [~Steven Rand], I made it work by using the following resource requests:
{code}
ResourceRequest nodeRequest = createResourceRequest(GB, node1.getHostName(), 1, 4, true);
ResourceRequest rackRequest = createResourceRequest(GB, node1.getRackName(), 1, 4, true);
ResourceRequest anyRequest = createResourceRequest(GB, ResourceRequest.ANY, 1, 4, true);
...
verifyPreemption(4, 4);
{code}
Does this sound good to you?
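For context, here is a minimal sketch of what the createResourceRequest helper above presumably builds, assuming the usual FairScheduler test-base signature (memory in MB, resource name, priority, container count, relax-locality flag). The newInstance construction and the single-vcore capability are assumptions, as is GB being 1024 MB; the real test-base implementation may differ.
{code}
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.api.records.ResourceRequest;

// Hedged sketch of the test helper; "name" is a hostname, a rack name, or
// ResourceRequest.ANY, matching the three requests in the comment above.
static ResourceRequest createResourceRequest(int memory, String name,
    int priority, int numContainers, boolean relaxLocality) {
  ResourceRequest request = ResourceRequest.newInstance(
      Priority.newInstance(priority),   // request priority within the app
      name,                             // locality: node, rack, or ANY
      Resource.newInstance(memory, 1),  // memory in MB; one vcore assumed
      numContainers);
  request.setRelaxLocality(relaxLocality); // false would pin the request to "name"
  return request;
}
{code}
Chained this way (node, rack, ANY, all with relaxLocality=true), the three requests ask for four 1 GB containers with the standard node-rack-ANY locality fallback, which is presumably what lets verifyPreemption(4, 4) exercise the preemption path.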
> avoid AM preemption caused by RRs for specific nodes or racks
> -------------------------------------------------------------
>
>                 Key: YARN-7655
>                 URL: https://issues.apache.org/jira/browse/YARN-7655
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: fairscheduler
>    Affects Versions: 3.0.0
>            Reporter: Steven Rand
>            Assignee: Steven Rand
>            Priority: Major
>         Attachments: YARN-7655-001.patch, YARN-7655-002.patch
>
> We frequently see AM preemptions when {{starvedApp.getStarvedResourceRequests()}} in {{FSPreemptionThread#identifyContainersToPreempt}} includes one or more RRs that request containers on a specific node. Since this causes us to consider only one node to preempt containers on, the really good work that was done in YARN-5830 doesn't save us from AM preemption. Even though there might be multiple nodes on which we could preempt enough non-AM containers to satisfy the app's starvation, we often wind up preempting one or more AM containers on the single node that we're considering.
> A proposed solution is that if we're going to preempt one or more AM containers for an RR that specifies a node or rack, we should instead expand the search space to consider all nodes (see the sketch below). That way we take advantage of YARN-5830, and only preempt AMs if there's no alternative. I've attached a patch with an initial implementation of this. We've been running it on a few clusters, and have seen AM preemptions drop from double-digit occurrences on many days to zero.
> Of course, the tradeoff is some loss of locality, since the starved app is less likely to be allocated resources at the most specific locality level that it asked for. My opinion is that this tradeoff is worth it, but I'm interested to hear what others think as well.
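The sketch referenced above: a rough illustration of the control flow the description proposes. This is not the attached patch; only {{FSPreemptionThread#identifyContainersToPreempt}} and the notion of starved resource requests come from the issue, and every type and helper below is a hypothetical stand-in.
{code}
import java.util.List;

// Illustrative-only sketch of the proposed search-space expansion; all names
// below except identifyContainersToPreempt are hypothetical stand-ins.
abstract class AmPreemptionSketch {
  interface Container { boolean isAmContainer(); }
  interface Request {
    boolean isNodeOrRackSpecific(); // true when the RR names a node or rack, not ANY
  }

  // Hypothetical helpers standing in for the scheduler's real lookups.
  abstract List<String> nodesMatching(Request rr);
  abstract List<String> allNodes();
  abstract List<Container> selectPreemptableContainers(Request rr, List<String> nodes);

  boolean includesAm(List<Container> containers) {
    return containers.stream().anyMatch(Container::isAmContainer);
  }

  List<Container> identifyContainersToPreempt(Request rr) {
    // First honor the locality level the starved app actually asked for.
    List<Container> candidates = selectPreemptableContainers(rr, nodesMatching(rr));
    if (rr.isNodeOrRackSpecific() && includesAm(candidates)) {
      // Widen the search to every node so the YARN-5830 selection logic can
      // find a node with enough preemptable non-AM containers.
      List<Container> widened = selectPreemptableContainers(rr, allNodes());
      if (!includesAm(widened)) {
        candidates = widened; // trade locality to avoid killing an AM
      }
    }
    return candidates;
  }
}
{code}
Note that the widened pass only replaces the locality-respecting choice when it actually avoids AM containers, which matches the description's goal of preempting AMs only when there is no alternative.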