Date: Fri, 18 Jul 2014 21:54:13 +0000 (UTC)
From: "Wei Yan (JIRA)"
To: yarn-issues@hadoop.apache.org
Subject: [jira] [Updated] (YARN-810) Support CGroup ceiling enforcement on CPU

     [ https://issues.apache.org/jira/browse/YARN-810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wei Yan updated YARN-810:
-------------------------
    Attachment: YARN-810.patch

Upload a patch for review.
(1) Add a configuration field cpu_enforce_ceiling_enabled to the ApplicationSubmissionContext. Each application can set this field to true (the default is false) if it wants CPU ceiling enforcement.
(2) The RM notifies the NM of the containers with cpu_enforce_ceiling_enabled through the heartbeat. The heartbeat response message contains a list of containerIds which are launched at the current node and have the ceiling enabled.
(3) The CgroupsLCEResourcesHandler sets cpu.cfs_period_us and cpu.cfs_quota_us for containers with the ceiling enabled (see the sketch below).
(4) Update the distributed shell example to include the cpu_enforce_ceiling_enabled configuration, so we can test this feature using distributedshell.
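To make step (3) concrete, here is a minimal sketch of the kind of write the cgroups handler has to perform for each ceiling-enabled container. It is illustrative only: the class and method names are hypothetical and are not the ones used in YARN-810.patch; the cgroup path matches the layout shown in the testing transcript below, and the quota formula follows the proposal at the bottom of the issue description.

{noformat}
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper: hard-caps a container at (vcores / vcore-to-pcore ratio) cores
// by writing the CFS period and quota into the container's cgroup.
public class CpuCeilingWriter {

  // Period proposed in the issue description: 1 second, in microseconds.
  private static final long CFS_PERIOD_US = 1000000L;

  // Cgroup hierarchy used in the testing transcript below.
  private static final String CGROUP_CPU_ROOT = "/cgroup/cpu/hadoop-yarn";

  // Example: applyCeiling("container_1371141151815_0004_01_000002", 2, 4)
  // writes quota=500000 and period=1000000, i.e. a 50% ceiling.
  public static void applyCeiling(String containerId, int requestedVcores,
      int vcoreToPcoreRatio) throws IOException {
    long quotaUs = (long) ((double) requestedVcores / vcoreToPcoreRatio * CFS_PERIOD_US);
    Path containerDir = Paths.get(CGROUP_CPU_ROOT, containerId);
    overwrite(containerDir.resolve("cpu.cfs_period_us"), Long.toString(CFS_PERIOD_US));
    overwrite(containerDir.resolve("cpu.cfs_quota_us"), Long.toString(quotaUs));
  }

  // Cgroup control files are overwritten, not appended to.
  private static void overwrite(Path file, String value) throws IOException {
    Files.write(file, value.getBytes(StandardCharsets.UTF_8));
  }
}
{noformat}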
> Support CGroup ceiling enforcement on CPU
> -----------------------------------------
>
>                 Key: YARN-810
>                 URL: https://issues.apache.org/jira/browse/YARN-810
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 2.1.0-beta, 2.0.5-alpha
>            Reporter: Chris Riccomini
>            Assignee: Sandy Ryza
>         Attachments: YARN-810.patch
>
> Problem statement:
> YARN currently lets you define an NM's pcore count, and a pcore:vcore ratio. Containers are then allowed to request vcores between the minimum and maximum defined in the yarn-site.xml.
> In the case where a single-threaded container requests 1 vcore, with a pcore:vcore ratio of 1:4, the container is still allowed to use up to 100% of the core it's using, provided that no other container is also using it. This happens even though the only guarantee that YARN/CGroups makes is that the container will get "at least" 1/4th of the core.
> If a second container then comes along, the second container can take resources from the first, provided that the first container is still getting at least its fair share (1/4th).
> There are certain cases where this is desirable. There are also certain cases where it might be desirable to have a hard limit on CPU usage, and not allow the process to go above the specified resource requirement, even if it's available.
> Here's an RFC that describes the problem in more detail:
> http://lwn.net/Articles/336127/
> Solution:
> As it happens, when CFS is used in combination with CGroups, you can enforce a ceiling using two files in cgroups:
> {noformat}
> cpu.cfs_quota_us
> cpu.cfs_period_us
> {noformat}
> The usage of these two files is documented in more detail here:
> https://access.redhat.com/site/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Resource_Management_Guide/sec-cpu.html
> Testing:
> I have tested YARN CGroups using the 2.0.5-alpha implementation. By default, it behaves as described above (it is a soft cap, and allows containers to use more than they asked for). I then tested CFS CPU quotas manually with YARN.
> First, you can see that CFS is in use in the CGroup, based on the file names:
> {noformat}
> [criccomi@eat1-qa464 ~]$ sudo -u app ls -l /cgroup/cpu/hadoop-yarn/
> total 0
> -r--r--r-- 1 app app 0 Jun 13 16:46 cgroup.procs
> drwxr-xr-x 2 app app 0 Jun 13 17:08 container_1371141151815_0004_01_000002
> -rw-r--r-- 1 app app 0 Jun 13 16:46 cpu.cfs_period_us
> -rw-r--r-- 1 app app 0 Jun 13 16:46 cpu.cfs_quota_us
> -rw-r--r-- 1 app app 0 Jun 13 16:46 cpu.rt_period_us
> -rw-r--r-- 1 app app 0 Jun 13 16:46 cpu.rt_runtime_us
> -rw-r--r-- 1 app app 0 Jun 13 16:46 cpu.shares
> -r--r--r-- 1 app app 0 Jun 13 16:46 cpu.stat
> -rw-r--r-- 1 app app 0 Jun 13 16:46 notify_on_release
> -rw-r--r-- 1 app app 0 Jun 13 16:46 tasks
> [criccomi@eat1-qa464 ~]$ sudo -u app cat /cgroup/cpu/hadoop-yarn/cpu.cfs_period_us
> 100000
> [criccomi@eat1-qa464 ~]$ sudo -u app cat /cgroup/cpu/hadoop-yarn/cpu.cfs_quota_us
> -1
> {noformat}
> Oddly, it appears that the cfs_period_us is set to .1s, not 1s.
> We can place hard limits on processes. I have process 4370 running YARN container container_1371141151815_0003_01_000003 on a host. By default, it's running at ~300% CPU usage.
> {noformat}
> CPU
> 4370 criccomi 20 0 1157m 551m 14m S 240.3 0.8 87:10.91 ...
> {noformat}
> When I set the CFS quota:
> {noformat}
> echo 1000 > /cgroup/cpu/hadoop-yarn/container_1371141151815_0003_01_000003/cpu.cfs_quota_us
> CPU
> 4370 criccomi 20 0 1157m 563m 14m S 1.0 0.8 90:08.39 ...
> {noformat}
> It drops to 1% usage, and you can see the box has room to spare:
> {noformat}
> Cpu(s): 2.4%us, 1.0%sy, 0.0%ni, 92.2%id, 4.2%wa, 0.0%hi, 0.1%si, 0.0%st
> {noformat}
> Turning the quota back to -1:
> {noformat}
> echo -1 > /cgroup/cpu/hadoop-yarn/container_1371141151815_0003_01_000003/cpu.cfs_quota_us
> {noformat}
> Burns the cores again:
> {noformat}
> Cpu(s): 11.1%us, 1.7%sy, 0.0%ni, 83.9%id, 3.1%wa, 0.0%hi, 0.2%si, 0.0%st
> CPU
> 4370 criccomi 20 0 1157m 563m 14m S 253.9 0.8 89:32.31 ...
> {noformat}
> On my dev box, I was testing CGroups by running a python process eight times, to burn through all the cores, since it was behaving as described above (giving extra CPU to the process, even with a cpu.shares limit). Toggling cfs_quota_us seems to enforce a hard limit.
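As an aside on the numbers in that transcript: the ceiling CFS enforces is simply cfs_quota_us divided by cfs_period_us, and a quota of -1 means no ceiling. A tiny, self-contained illustration (not part of any patch) using the values observed above:

{noformat}
public class CfsCeilingMath {

  // Returns the CPU ceiling as a percentage of one core, or -1 for "no ceiling"
  // (CFS treats a quota of -1 as unlimited).
  static double ceilingPercent(long cfsQuotaUs, long cfsPeriodUs) {
    return cfsQuotaUs < 0 ? -1 : 100.0 * cfsQuotaUs / cfsPeriodUs;
  }

  public static void main(String[] args) {
    // quota 1000us against the default 100000us period -> 1.0, the ~1% seen in top
    System.out.println(ceilingPercent(1000, 100000));
    // quota -1 -> no ceiling, hence the ~240-250% usage seen before and after
    System.out.println(ceilingPercent(-1, 100000));
  }
}
{noformat}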
> Implementation:
> What do you guys think about introducing a variable to YarnConfiguration:
> bq. yarn.nodemanager.linux-container.executor.cgroups.cpu-ceiling-enforcement
> The default would be false. Setting it to true would cause YARN's LCE to set:
> {noformat}
> cpu.cfs_quota_us=(container-request-vcores/nm-vcore-to-pcore-ratio) * 1000000
> cpu.cfs_period_us=1000000
> {noformat}
> For example, if a container asks for 2 vcores, and the vcore:pcore ratio is 4, you'd get:
> {noformat}
> cpu.cfs_quota_us=(2/4) * 1000000 = 500000
> cpu.cfs_period_us=1000000
> {noformat}
> This would cause CFS to cap the process at 50% of clock cycles.
> What do you guys think?
> 1. Does this seem like a reasonable request? We have some use cases for it.
> 2. It's unclear to me how cpu.shares interacts with cpu.cfs_*. I think the ceiling is hard, no matter what shares is set to. I assume shares only comes into play if the CFS quota has not been reached, and the process begins competing with others for CPU resources.
> 3. Should this be an LCE config (yarn.nodemanager.linux-container-executor), or should it be a generic scheduler config (yarn.scheduler.enforce-ceiling-vcores)?
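Purely to make the alternative in question 3 concrete, here is a rough sketch of how an NM-wide flag like the one proposed above could be read. The property-name constant is copied from the proposal, the class and method are hypothetical, and the patch attached to this issue takes the per-application cpu_enforce_ceiling_enabled route instead.

{noformat}
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class CpuCeilingConfig {

  // Property name copied verbatim from the proposal above; not an official YARN key.
  public static final String NM_CPU_CEILING_ENFORCEMENT =
      "yarn.nodemanager.linux-container.executor.cgroups.cpu-ceiling-enforcement";

  // Proposed default: no ceiling enforcement unless explicitly enabled.
  public static final boolean DEFAULT_NM_CPU_CEILING_ENFORCEMENT = false;

  // True if the LCE should write cpu.cfs_quota_us/cpu.cfs_period_us for containers.
  public static boolean isCeilingEnforced(YarnConfiguration conf) {
    return conf.getBoolean(NM_CPU_CEILING_ENFORCEMENT, DEFAULT_NM_CPU_CEILING_ENFORCEMENT);
  }
}
{noformat}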