From: "Konstantinos Karanasos (JIRA)"
To: yarn-issues@hadoop.apache.org
Date: Thu, 23 Mar 2017 18:35:41 +0000 (UTC)
Subject: [jira] [Commented] (YARN-6344) Rethinking OFF_SWITCH locality in CapacityScheduler

    [ https://issues.apache.org/jira/browse/YARN-6344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15938973#comment-15938973 ]

Konstantinos Karanasos commented on YARN-6344:
----------------------------------------------

Thank you for the input, [~sunilg] and [~leftnoteasy].

[~sunilg], indeed we could eventually express the parameter as a percentage of the cluster nodes: for example, 0.25 would mean that we wait to hear from 25% of the cluster nodes before relaxing locality. But we would then have to change the node-locality-delay to be a percentage too, in order to be consistent.
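To make the percentage semantics concrete, here is a minimal sketch (the class and method names are hypothetical, not from the actual patch):

{code:java}
// Hypothetical sketch: reading the delay either as a fraction of the cluster
// (values in (0, 1]) or as an absolute number of missed scheduling
// opportunities (values > 1, today's semantics). Illustrative only.
public final class LocalityDelaySketch {

  /**
   * Number of missed scheduling opportunities to tolerate before relaxing
   * locality. A value of 0.25 means: wait to hear from 25% of the cluster
   * nodes; a value such as 40 keeps today's absolute semantics.
   */
  static int localityThreshold(double delay, int clusterNodes) {
    if (delay > 0 && delay <= 1.0) {
      return (int) Math.ceil(delay * clusterNodes); // percentage semantics
    }
    return (int) delay; // absolute semantics, as today
  }

  public static void main(String[] args) {
    System.out.println(localityThreshold(0.25, 400)); // 100 opportunities
    System.out.println(localityThreshold(40, 400));   // 40 opportunities
    // Note the ambiguity at exactly 1: "the whole cluster" or "one missed
    // opportunity"? This is why mixing percentage and absolute semantics
    // across the two delay parameters would be confusing.
  }
}
{code}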
Changing the semantics like that, however, would also mean that everyone has to update their cluster configuration. So, as Wangda says, maybe it is better to keep it as an absolute value for now?

[~leftnoteasy], I like your idea of treating the rack-locality-delay as "additional opportunities on top of the node-locality-delay ones". I will update the patch.


> Rethinking OFF_SWITCH locality in CapacityScheduler
> ---------------------------------------------------
>
>                 Key: YARN-6344
>                 URL: https://issues.apache.org/jira/browse/YARN-6344
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacityscheduler
>            Reporter: Konstantinos Karanasos
>            Assignee: Konstantinos Karanasos
>         Attachments: YARN-6344.001.patch
>
>
> When relaxing locality from node to rack, the {{node-locality-delay}} parameter is used: once the scheduling opportunities for a scheduler key exceed the value of this parameter, we relax locality and try to assign the container to a node in the corresponding rack.
> On the other hand, when relaxing locality to off-switch (i.e., assigning the container anywhere in the cluster), we use a {{localityWaitFactor}}, which is computed as the number of outstanding requests for a specific scheduler key divided by the size of the cluster.
> For applications that request containers in big batches (e.g., traditional MR jobs), and for relatively small clusters, the localityWaitFactor does not affect relaxing locality much.
> However, for applications that request containers in small batches, this factor takes a very small value, which leads to assigning off-switch containers too soon. The effect is even more pronounced in big clusters.
> For example, if an application requests only one container per request, locality will be relaxed after a single missed scheduling opportunity.
> The purpose of this JIRA is to rethink the way we relax locality for off-switch assignments.
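For reference, here is a minimal sketch of the off-switch relaxation described in the issue above, reconstructed from the description (simplified; not a verbatim copy of the CapacityScheduler code, and the class and method names are illustrative):

{code:java}
// Sketch of the off-switch delay per the issue description: the wait factor
// is the outstanding requests for a scheduler key divided by cluster size,
// capped at 1, and it scales the missed-opportunity threshold.
public final class OffSwitchDelaySketch {

  /** Outstanding requests for the scheduler key over cluster size, capped at 1. */
  static float localityWaitFactor(int outstandingRequests, int clusterNodes) {
    return Math.min((float) outstandingRequests / clusterNodes, 1.0f);
  }

  /** Off-switch is allowed once missed opportunities exceed the scaled threshold. */
  static boolean canRelaxToOffSwitch(long missedOpportunities,
      int outstandingRequests, int clusterNodes) {
    float threshold =
        outstandingRequests * localityWaitFactor(outstandingRequests, clusterNodes);
    return missedOpportunities > threshold;
  }

  public static void main(String[] args) {
    // Big batch, small cluster: 500 outstanding requests, 100 nodes.
    // The factor caps at 1.0, threshold = 500, so off-switch waits many misses.
    System.out.println(canRelaxToOffSwitch(50, 500, 100));  // false
    // Small batch, big cluster: 1 request, 1000 nodes.
    // factor = 0.001, threshold = 0.001: a single miss already relaxes.
    System.out.println(canRelaxToOffSwitch(1, 1, 1000));    // true
  }
}
{code}

Under these assumptions the sketch reproduces the description's example: a single outstanding request on a 1000-node cluster yields a threshold of 0.001, so the very first missed scheduling opportunity already relaxes locality to off-switch.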