Date: Thu, 25 Jul 2013 18:01:55 +0000 (UTC)
From: "Sandy Ryza (JIRA)"
To: yarn-issues@hadoop.apache.org
Subject: [jira] [Commented] (YARN-972) Allow requests and scheduling for fractional virtual cores

    [ https://issues.apache.org/jira/browse/YARN-972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13719844#comment-13719844 ]

Sandy Ryza commented on YARN-972:
---------------------------------

[~t.st.clair], when you say the "real scheduler" you mean the OS scheduler? How does it burden it? MapReduce has long been using multiple tasks per core, which makes sense because, as you say, often CPU is not the bottleneck. To make sure we're on the same page, we are not talking about pinning tasks to cores.
I agree that we should add in network and I/O bandwidth as resources as well, but would doing so solve the need for finer-grained CPU-requests?

> Allow requests and scheduling for fractional virtual cores
> ----------------------------------------------------------
>
> Key: YARN-972
> URL: https://issues.apache.org/jira/browse/YARN-972
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: api, scheduler
> Affects Versions: 2.0.5-alpha
> Reporter: Sandy Ryza
> Assignee: Sandy Ryza
>
> As this idea sparked a fair amount of discussion on YARN-2, I'd like to go deeper into the reasoning.
> Currently the virtual core abstraction hides two orthogonal goals. The first is that a cluster might have heterogeneous hardware and that the processing power of different makes of cores can vary wildly. The second is that different (combinations of) workloads can require different levels of granularity. E.g. one admin might want every task on their cluster to use at least a core, while another might want applications to be able to request quarters of cores. The former would configure a single vcore per core. The latter would configure four vcores per core.
> I don't think that the abstraction is a good way of handling the second goal. Having virtual cores refer to different magnitudes of processing power on different clusters will make the difficult problem of deciding how many cores to request for a job even more confusing.
> Can we not handle this with dynamic oversubscription?
> Dynamic oversubscription, i.e. adjusting the number of cores offered by a machine based on measured CPU-consumption, should work as a complement to fine-granularity scheduling. Dynamic oversubscription is never going to be perfect, as the amount of CPU a process consumes can vary widely over its lifetime.
> A task that first loads a bunch of data over the network and then performs complex computations on it will suffer if additional CPU-heavy tasks are scheduled on the same node because its initial CPU-utilization was low. To guard against this, we will need to be conservative with how we dynamically oversubscribe. If a user wants to explicitly hint to the scheduler that their task will not use much CPU, the scheduler should be able to take this into account.
> On YARN-2, there are concerns that including floating point arithmetic in the scheduler will slow it down. I question this assumption, and it is perhaps worth debating, but I think we can sidestep the issue by multiplying CPU-quantities inside the scheduler by a decently sized number like 1000 and keep doing the computations on integers.
> The relevant APIs are marked as evolving, so there's no need for the change to delay 2.1.0-beta.
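
The integer-scaling idea from the issue description (multiply CPU quantities by a factor such as 1000 and keep all scheduler arithmetic on integers) can be illustrated with a minimal sketch. This is not YARN code and not a proposed patch: the names `MILLIS_PER_CORE`, `to_millicores` and `fits` are hypothetical, and the only detail taken from the description is the scale factor of 1000.

```python
from decimal import Decimal
from typing import Iterable

# Scale factor suggested in the issue description; everything else here is
# a hypothetical illustration, not part of any YARN API.
MILLIS_PER_CORE = 1000


def to_millicores(cores: str) -> int:
    """Convert a decimal core request such as "0.25" into integer milli-cores.

    The conversion happens once, at the API boundary, so the scheduler
    itself never touches floating point.
    """
    scaled = Decimal(cores) * MILLIS_PER_CORE
    if scaled != scaled.to_integral_value():
        # Requests finer than 1/1000 of a core cannot be represented exactly.
        raise ValueError(f"request {cores!r} is finer than 1/{MILLIS_PER_CORE} core")
    return int(scaled)


def fits(capacity_millis: int, allocated_millis: Iterable[int], request_millis: int) -> bool:
    """Integer-only check: does the request fit in the node's remaining CPU?"""
    return sum(allocated_millis) + request_millis <= capacity_millis


# A 4-core node with three running containers and one new quarter-core request.
node_capacity = to_millicores("4")
running = [to_millicores("0.25"), to_millicores("0.25"), to_millicores("1.5")]
can_place = fits(node_capacity, running, to_millicores("0.25"))
```

With this representation a quarter-core request is simply the integer 250, so comparisons and sums inside the scheduler stay exact and avoid the floating-point concern raised on YARN-2.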