Date: Sat, 11 Feb 2017 18:09:41 +0000 (UTC)
From: "ASF GitHub Bot (JIRA)"
To: yarn-issues@hadoop.apache.org
Subject: [jira] [Commented] (YARN-6163) FS Preemption is a trickle for severely starved applications

[ https://issues.apache.org/jira/browse/YARN-6163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15862475#comment-15862475 ]

ASF GitHub Bot commented on YARN-6163:
--------------------------------------

Github user kambatla commented on a diff in the pull request:

https://github.com/apache/hadoop/pull/192#discussion_r100673288

--- Diff: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSAppAttempt.java ---
@@ -1106,6 +1111,97 @@ boolean isStarvedForFairShare() {
     return !Resources.isNone(fairshareStarvation);
   }

+  /**
+   * Helper method for {@link #getStarvedResourceRequests()}:
+   * Given a map of visited {@link ResourceRequest}s, it checks if
+   * {@link ResourceRequest} 'rr' has already been visited. The map is updated
+   * to reflect visiting 'rr'.
+   */
+  private static boolean checkAndMarkRRVisited(
+      Map<Priority, List<Resource>> visitedRRs, ResourceRequest rr) {
+    Priority priority = rr.getPriority();
+    Resource capability = rr.getCapability();
+    if (visitedRRs.containsKey(priority)) {
+      List<Resource> rrList = visitedRRs.get(priority);
+      if (rrList.contains(capability)) {
+        return true;
+      } else {
+        rrList.add(capability);
+        return false;
+      }
+    } else {
+      List<Resource> newRRList = new ArrayList<>();
+      newRRList.add(capability);
+      visitedRRs.put(priority, newRRList);
+      return false;
+    }
+  }
+
+  /**
+   * Fetch a list of RRs corresponding to the extent the app is starved
+   * (fairshare and minshare). This method considers the number of containers
+   * in a RR and also only one locality-level (the first encountered
+   * resourceName).
+   *
+   * @return list of {@link ResourceRequest}s corresponding to the amount of
+   * starvation.
+   */
+  List<ResourceRequest> getStarvedResourceRequests() {
+    List<ResourceRequest> ret = new ArrayList<>();
+    Map<Priority, List<Resource>> visitedRRs = new HashMap<>();
+
+    Resource pending = getStarvation();
+    for (ResourceRequest rr : appSchedulingInfo.getAllResourceRequests()) {
+      if (Resources.isNone(pending)) {
+        break;
+      }
+      if (checkAndMarkRRVisited(visitedRRs, rr)) {
+        continue;
+      }
+
+      // Compute the number of containers of this capability that fit in the
+      // pending amount
+      int ratio = (int) Math.floor(
+          Resources.ratio(scheduler.getResourceCalculator(),
+              pending, rr.getCapability()));
+      if (ratio == 0) {
+        continue;
+      }
+
+      // If the RR is only partially being satisfied, include only the
+      // partial number of containers.
+      if (ratio < rr.getNumContainers()) {
+        rr = ResourceRequest.newInstance(
+            rr.getPriority(), rr.getResourceName(), rr.getCapability(), ratio);
+      }
+      ret.add(rr);
+      Resources.subtractFromNonNegative(pending,
+          Resources.multiply(rr.getCapability(), ratio));
+    }
+
+    return ret;
+  }
+
+  /**
+   * Notify this app that preemption has been triggered to make room for
+   * outstanding demand. The app should not be considered starved until after
+   * the specified delay.
+   *
+   * @param delayBeforeNextStarvationCheck duration to wait
+   */
+  void preemptionTriggered(long delayBeforeNextStarvationCheck) {
+    nextStarvationCheck =
+        scheduler.getClock().getTime() + delayBeforeNextStarvationCheck;
+  }
+
+  /**
+   * Whether this app's starvation should be considered.
+   */
+  boolean shouldCheckForStarvation() {
+    long now = scheduler.getClock().getTime();
+    return now > nextStarvationCheck;
--- End diff --

The choice between > and >= shouldn't really matter. Updated to >=.

> FS Preemption is a trickle for severely starved applications
> ------------------------------------------------------------
>
>                 Key: YARN-6163
>                 URL: https://issues.apache.org/jira/browse/YARN-6163
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: fairscheduler
>    Affects Versions: 2.9.0
>            Reporter: Karthik Kambatla
>            Assignee: Karthik Kambatla
>         Attachments: yarn-6163-1.patch
>
>
> With the current logic, only one RR is considered each time an application is marked starved. This marking happens only on the update call, which runs every 500ms. As a result, a severely starved application takes a very long time to reach its fairshare through preemption.
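The selection loop in getStarvedResourceRequests() can be modeled outside YARN as a greedy pass over the requests. The sketch below is a simplified, hypothetical model, not the patch itself: it assumes resources are memory only (in MB), replaces ResourceRequest with a made-up Req record, and replaces the calculator-based Resources.ratio with integer division. It keeps the diff's dedup key (priority, capability), so only the first locality level seen for a key counts. One intentional difference: it subtracts min(ratio, numContainers) containers from the pending amount. As posted, the diff multiplies by ratio even when a request asks for fewer containers than would fit, which over-counts; in the example below, that would use up the whole 8 GB on the first request (3 containers taken, 4 subtracted) and never reach the priority-2 request.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical, simplified model of starved-request selection.
 * Resources are reduced to memory in MB; Req is NOT a YARN type.
 */
public class StarvedRequestSketch {

  /** Hypothetical stand-in for a ResourceRequest. */
  record Req(int priority, long capabilityMb, int numContainers, String resourceName) {}

  /**
   * Walks the requests in order and returns those that fit in the starvation
   * amount, trimming the container count of a partially satisfied request.
   */
  static List<Req> starvedRequests(List<Req> all, long starvationMb) {
    List<Req> ret = new ArrayList<>();
    // Dedup key as in the diff: (priority, capability). The first locality
    // level seen for a key wins; later requests with the same key are skipped.
    Map<Integer, Set<Long>> visited = new HashMap<>();
    long pending = starvationMb;

    for (Req rr : all) {
      if (pending <= 0) {
        break; // starvation fully covered
      }
      if (!visited.computeIfAbsent(rr.priority(), p -> new HashSet<>())
          .add(rr.capabilityMb())) {
        continue; // already visited at another locality level
      }
      // How many containers of this size fit in what is still pending.
      long fit = pending / rr.capabilityMb();
      if (fit == 0) {
        continue; // this container size is larger than what remains
      }
      // Take at most what the request actually asks for.
      int taken = (int) Math.min(fit, rr.numContainers());
      ret.add(taken < rr.numContainers()
          ? new Req(rr.priority(), rr.capabilityMb(), taken, rr.resourceName())
          : rr);
      // Subtract only the containers actually taken.
      pending -= taken * rr.capabilityMb();
    }
    return ret;
  }

  public static void main(String[] args) {
    List<Req> all = List.of(
        new Req(1, 2048, 3, "node1"),
        new Req(1, 2048, 3, "*"),   // same priority and size: skipped
        new Req(2, 1024, 10, "*"));
    // 8 GB starvation: 3 x 2 GB from the first request, then 2 x 1 GB.
    System.out.println(starvedRequests(all, 8192));
  }
}
```

Because the loop stops as soon as pending reaches zero, a single pass can return several requests, which is what lets one starvation check cover more than one RR.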
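The post-preemption delay implemented by preemptionTriggered() and shouldCheckForStarvation() can be sketched the same way. In this hypothetical model, a LongSupplier stands in for scheduler.getClock(), and the 500 ms delay is an arbitrary example value, not a configured default. It also shows why > versus >= barely matters: the two comparisons disagree only at the single instant where the clock equals nextStarvationCheck.

```java
import java.util.function.LongSupplier;

/**
 * Hypothetical model of the post-preemption delay: once preemption is
 * triggered, the app is not re-checked for starvation until the delay passes.
 */
public class StarvationGateSketch {
  private final LongSupplier clock;  // stand-in for scheduler.getClock().getTime()
  private long nextStarvationCheck;  // 0 means "check right away"

  StarvationGateSketch(LongSupplier clock) {
    this.clock = clock;
  }

  void preemptionTriggered(long delayBeforeNextStarvationCheck) {
    nextStarvationCheck = clock.getAsLong() + delayBeforeNextStarvationCheck;
  }

  /** Uses >=; with > the only difference is the boundary tick itself. */
  boolean shouldCheckForStarvation() {
    return clock.getAsLong() >= nextStarvationCheck;
  }

  public static void main(String[] args) {
    long[] now = {1_000};  // mutable fake clock, in ms
    StarvationGateSketch gate = new StarvationGateSketch(() -> now[0]);
    gate.preemptionTriggered(500);  // next check allowed at t = 1500
    now[0] = 1_499;
    System.out.println(gate.shouldCheckForStarvation());  // false
    now[0] = 1_500;
    System.out.println(gate.shouldCheckForStarvation());  // true
  }
}
```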