Date: Sat, 11 Feb 2017 00:03:43 +0000 (UTC)
From: "ASF GitHub Bot (JIRA)"
To: yarn-issues@hadoop.apache.org
Subject: [jira] [Commented] (YARN-6163) FS Preemption is a trickle for severely starved applications

    [ https://issues.apache.org/jira/browse/YARN-6163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15862041#comment-15862041 ]

ASF GitHub Bot commented on YARN-6163:
--------------------------------------

Github user templedf commented on a diff in the pull request:

    https://github.com/apache/hadoop/pull/192#discussion_r100640868

    --- Diff: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSAppAttempt.java ---
    @@ -1106,6 +1111,97 @@ boolean isStarvedForFairShare() {
         return !Resources.isNone(fairshareStarvation);
       }

    +  /**
    +   * Helper method for {@link #getStarvedResourceRequests()}:
    +   * Given a map of visited {@link ResourceRequest}s, it checks if
    +   * {@link ResourceRequest} 'rr' has already been visited. The map is updated
    +   * to reflect visiting 'rr'.
    +   */
    +  private static boolean checkAndMarkRRVisited(
    +      Map<Priority, List<Resource>> visitedRRs, ResourceRequest rr) {
    +    Priority priority = rr.getPriority();
    +    Resource capability = rr.getCapability();
    +    if (visitedRRs.containsKey(priority)) {
    +      List<Resource> rrList = visitedRRs.get(priority);
    +      if (rrList.contains(capability)) {
    +        return true;
    +      } else {
    +        rrList.add(capability);
    +        return false;
    +      }
    +    } else {
    +      List<Resource> newRRList = new ArrayList<>();
    +      newRRList.add(capability);
    +      visitedRRs.put(priority, newRRList);
    +      return false;
    +    }
    +  }
    +
    +  /**
    +   * Fetch a list of RRs corresponding to the extent the app is starved
    +   * (fairshare and minshare). This method considers the number of containers
    +   * in a RR and also only one locality-level (the first encountered
    +   * resourceName).
    +   *
    +   * @return list of {@link ResourceRequest}s corresponding to the amount of
    +   * starvation.
    +   */
    +  List<ResourceRequest> getStarvedResourceRequests() {
    +    List<ResourceRequest> ret = new ArrayList<>();
    +    Map<Priority, List<Resource>> visitedRRs= new HashMap<>();
    +
    +    Resource pending = getStarvation();
    +    for (ResourceRequest rr : appSchedulingInfo.getAllResourceRequests()) {
    +      if (Resources.isNone(pending)) {
    +        break;
    +      }
    +      if (checkAndMarkRRVisited(visitedRRs, rr)) {
    +        continue;
    +      }
    +
    +      // Compute the number of containers of this capability that fit in the
    +      // pending amount
    +      int ratio = (int) Math.floor(
    --- End diff --

    This is super obtuse logic. Please document it thoroughly.


> FS Preemption is a trickle for severely starved applications
> ------------------------------------------------------------
>
>                 Key: YARN-6163
>                 URL: https://issues.apache.org/jira/browse/YARN-6163
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: fairscheduler
>    Affects Versions: 2.9.0
>            Reporter: Karthik Kambatla
>            Assignee: Karthik Kambatla
>         Attachments: yarn-6163-1.patch
>
>
> With current logic, only one RR is considered per each instance of marking an application starved. This marking happens only on the update call that runs every 500ms.
> Due to this, an application that is severely starved takes forever to reach fairshare based on preemptions.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: yarn-issues-help@hadoop.apache.org
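
Illustrative note on the logic under review: the quoted diff does two things per ResourceRequest. It skips any (priority, capability) pair already seen, and it starts computing how many containers of that capability fit in the pending starvation. The sketch below is not the patch's code. It assumes simplified stand-in types (an int for Priority, a String for the capability), swaps the patch's List for a Set, and uses a hypothetical memory-only containersThatFit helper, because the diff is cut off before the actual ratio computation is shown.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Stand-alone sketch; the types are simplified stand-ins, not the YARN classes. */
class StarvationSketch {

  /**
   * Returns true if (priority, capability) was seen before; otherwise records
   * it and returns false. Set.add reports whether the element was new, so the
   * check and the update happen in one call.
   */
  static boolean checkAndMarkVisited(
      Map<Integer, Set<String>> visited, int priority, String capability) {
    return !visited.computeIfAbsent(priority, p -> new HashSet<>())
        .add(capability);
  }

  /**
   * Hypothetical single-dimension (memory only) version of "how many
   * containers of this size fit in the pending amount", capped at the number
   * of containers the request asks for. The real patch works on Resource
   * objects; this helper is an assumption made for illustration.
   */
  static int containersThatFit(long pendingMb, long capabilityMb, int requested) {
    if (capabilityMb <= 0 || pendingMb <= 0) {
      return 0;
    }
    long fit = pendingMb / capabilityMb; // floors for non-negative operands
    return (int) Math.min(fit, requested);
  }

  public static void main(String[] args) {
    Map<Integer, Set<String>> visited = new HashMap<>();
    System.out.println(checkAndMarkVisited(visited, 1, "1024MB")); // false
    System.out.println(checkAndMarkVisited(visited, 1, "1024MB")); // true
    System.out.println(checkAndMarkVisited(visited, 2, "1024MB")); // false
    System.out.println(containersThatFit(5000, 1024, 10));         // 4
    System.out.println(containersThatFit(5000, 1024, 3));          // 3
  }
}
```

Keying the visited map by priority and then by capability is what limits the loop to one locality level per request, as the Javadoc in the diff describes: later ResourceRequests with the same priority and capability but a different resourceName are skipped.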