Return-Path:
X-Original-To: apmail-incubator-mesos-dev-archive@minotaur.apache.org
Delivered-To: apmail-incubator-mesos-dev-archive@minotaur.apache.org
Received: from mail.apache.org (hermes.apache.org [140.211.11.3])
    by minotaur.apache.org (Postfix) with SMTP id 9E496E372
    for ; Thu, 14 Feb 2013 01:10:31 +0000 (UTC)
Received: (qmail 99033 invoked by uid 500); 14 Feb 2013 01:10:31 -0000
Delivered-To: apmail-incubator-mesos-dev-archive@incubator.apache.org
Received: (qmail 99008 invoked by uid 500); 14 Feb 2013 01:10:31 -0000
Mailing-List: contact mesos-dev-help@incubator.apache.org; run by ezmlm
Precedence: bulk
List-Help:
List-Unsubscribe:
List-Post:
List-Id:
Reply-To: mesos-dev@incubator.apache.org
Delivered-To: mailing list mesos-dev@incubator.apache.org
Received: (qmail 98996 invoked by uid 99); 14 Feb 2013 01:10:31 -0000
Received: from reviews-vm.apache.org (HELO reviews.apache.org) (140.211.11.40)
    by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 14 Feb 2013 01:10:31 +0000
Received: from reviews.apache.org (localhost [127.0.0.1])
    by reviews.apache.org (Postfix) with ESMTP id C4FA51C7555;
    Thu, 14 Feb 2013 01:10:23 +0000 (UTC)
Content-Type: multipart/alternative; boundary="===============7307917349213422760=="
MIME-Version: 1.0
Subject: Re: Review Request: Resource Monitoring 8: Added cgroups::stat primitive and implemented cgroup resource collection.
From: "Ben Mahler"
To: "Benjamin Hindman" , "Vinod Kone"
Cc: "Ben Mahler" , "David Mackey" , "mesos"
Date: Thu, 14 Feb 2013 01:10:23 -0000
Message-ID: <20130214011023.21380.92169@reviews.apache.org>
X-ReviewBoard-URL: https://reviews.apache.org
Auto-Submitted: auto-generated
Sender: "Ben Mahler"
X-ReviewGroup: mesos
X-ReviewRequest-URL: https://reviews.apache.org/r/9145/
X-Sender: "Ben Mahler"
References: <20130214010329.21521.99775@reviews.apache.org>
In-Reply-To: <20130214010329.21521.99775@reviews.apache.org>
Reply-To: "Ben Mahler"

--===============7307917349213422760==
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable

> On Feb. 14, 2013, 1:03 a.m., David Mackey wrote:
> > Just a comment on what cpuacct.stat includes and doesn't include.
> >
> > user is user time + nice time
> > system is system time + irq time + softirq time
> >
> > cpuacct.stat does not include idle, iowait, guest, guest nice and steal time. IDLE and IOWAIT are essentially global stats that are ill defined in a cgroup context and the others only really apply to virtual cpus.
> >
> > So, yes, looks good.

Thanks for the info!

What about cpuacct.usage, and cpuacct.usage_percpu, do you know what those include? Why would they be different?


- Ben


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/9145/#review16544
-----------------------------------------------------------


On Feb. 13, 2013, 9:48 p.m., Ben Mahler wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/9145/
> -----------------------------------------------------------
>
> (Updated Feb. 13, 2013, 9:48 p.m.)
>
>
> Review request for mesos, Benjamin Hindman and Vinod Kone.
>
>
> Description
> -------
>
> This implements resource collection for the cgroups isolation module.
>
> From the redhat documentation:
> https://access.redhat.com/knowledge/docs/en-US/Red_Hat_Enterprise_Linux/6/html/Resource_Management_Guide/sec-cpuacct.html
>
> // cpuacct.usage
> // reports the total CPU time (in nanoseconds) consumed by all tasks in this cgroup (including tasks lower in the hierarchy).
> I don't like this control because it can be reset back to zero!
>
> // cpuacct.stat
> // reports the user and system CPU time consumed by all tasks in this cgroup (including tasks lower in the hierarchy) in the following way:
> // user - CPU time consumed by tasks in user mode.
> // system - CPU time consumed by tasks in system (kernel) mode.
> // CPU time is reported in the units defined by the USER_HZ variable.
> Since USER_HZ is typically 100, the granularity here is only 10 ms.
>
> // cpuacct.usage_percpu
> // reports the CPU time (in nanoseconds) consumed on each CPU by all tasks in this cgroup (including tasks lower in the hierarchy).
> I don't like this control because it can be reset back to zero!
>
> I've used cpuacct.stat since AFAICT it can't be reset to 0.
> However cpuacct.stat has somewhat low granularity, see the testing comments below.
>
>
> This addresses bug MESOS-324.
>     https://issues.apache.org/jira/browse/MESOS-324
>
>
> Diffs
> -----
>
>   src/linux/cgroups.hpp 1f701f3bbbe06ddf84768c68b529aba4659c19be
>   src/linux/cgroups.cpp 03b31e7309b9dd65f00d3b0da2abb81ddaaeea43
>   src/slave/cgroups_isolation_module.cpp 63cefc33cf34eebb82db5d8448b751be8652fa36
>   src/tests/cgroups_tests.cpp b219906374764e91f1a5268469ae92dd0fe08e53
>
> Diff: https://reviews.apache.org/r/9145/diff/
>
>
> Testing
> -------
>
> Added tests for cgroups::stat.
>
> End to end testing using the webui.
>
> NOTES for cpuacct.stat:
> $ cat /cgroup/mesos/framework_201302132039-2081170186-5050-60933-0001_executor_default_tag_3e1f5310-c873-42cb-9aa4-4ee4c2b9feb8/cpuacct.usage
> 4672471833
> --> 4672471833ns = 4.67 seconds
>
> $ cat /cgroup/mesos/framework_201302132039-2081170186-5050-60933-0001_executor_default_tag_3e1f5310-c873-42cb-9aa4-4ee4c2b9feb8/cpuacct.usage_percpu
> 831220060 463800214 319016010 184325849 840595741 441855678 294660045 160799890 240361561 197829862 130045719 56978804 227972655 193743493 98604097 70557562
> --> 831220060+463800214+319016010+184325849+840595741+441855678+294660045+160799890+240361561+197829862+130045719+56978804+227972655+193743493+98604097+70557562 = 4752367240ns = 4.75 seconds
>
> $ cat /cgroup/mesos/framework_201302132039-2081170186-5050-60933-0001_executor_default_tag_3e1f5310-c873-42cb-9aa4-4ee4c2b9feb8/cpuacct.stat
> user 111
> system 246
> --> 1.11 + 2.46 = 3.57 seconds (assuming USER_HZ = 100)
>
> So since cpuacct.stat reveals only the user + system times, we see slightly lower times than in cpuacct.usage and cpuacct.usage_percpu, where the total time is displayed. I'm guessing they may be including other cpu times?
> E.g. steal, guest
>
> I think user + system is a good measurement.
>
>
> Thanks,
>
> Ben Mahler
>
>

--===============7307917349213422760==--
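
The arithmetic in the testing notes above can be reproduced with a short script. This is only a sketch, not part of the review request: the sample values are copied from the notes, USER_HZ = 100 is the "typical" value mentioned in the description (an assumption; a real host may differ), and the helper names parse_cpuacct_stat and ticks_to_seconds are hypothetical.

```python
# Reproduces the unit conversions from the testing notes.
# Assumption: USER_HZ is 100, as the description says is typical.
# On a real Linux host it could be queried with os.sysconf("SC_CLK_TCK").
NS_PER_SEC = 1_000_000_000
USER_HZ = 100

# Sample values copied from the notes.
usage_ns = 4672471833
usage_percpu_ns = [
    831220060, 463800214, 319016010, 184325849,
    840595741, 441855678, 294660045, 160799890,
    240361561, 197829862, 130045719, 56978804,
    227972655, 193743493, 98604097, 70557562,
]
stat_text = "user 111\nsystem 246\n"


def parse_cpuacct_stat(text):
    """Parse 'key value' lines from cpuacct.stat into a dict of tick counts."""
    ticks = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            ticks[parts[0]] = int(parts[1])
    return ticks


def ticks_to_seconds(ticks, user_hz=USER_HZ):
    """Convert USER_HZ ticks to seconds."""
    return ticks / user_hz


ticks = parse_cpuacct_stat(stat_text)
usage_s = usage_ns / NS_PER_SEC
percpu_total_ns = sum(usage_percpu_ns)
percpu_s = percpu_total_ns / NS_PER_SEC
stat_s = ticks_to_seconds(ticks["user"] + ticks["system"])

print(f"cpuacct.usage         {usage_s:.2f} s")
print(f"cpuacct.usage_percpu  {percpu_s:.2f} s")
print(f"cpuacct.stat          {stat_s:.2f} s")
```

With one tick equal to 1 / USER_HZ seconds, cpuacct.stat can only resolve 10 ms steps at USER_HZ = 100, which is the granularity concern raised in the description.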