Subject: Re: Review Request 60748: Prototype using cgroups for monitoring Thermos Process resource consumption (CPU and memory)
From: Aurora ReviewBot
To: Santhosh Kumar Shanmugham, David McLaughlin, Stephan Erb, Zameer Manji
Cc: Aurora ReviewBot, Reza Motamedi, Aurora
Date: Mon, 10 Jul 2017 18:36:33 -0000
Message-ID: <20170710183633.6126.97985@reviews-vm2.apache.org>
In-Reply-To: <20170710183002.33458.23050@reviews-vm2.apache.org>
References: <20170710183002.33458.23050@reviews-vm2.apache.org>
X-ReviewRequest-URL: https://reviews.apache.org/r/60748/
X-ReviewRequest-Repository: aurora
X-ReviewBoard-Diff-For: src/main/python/apache/thermos/core/cgroup.py
X-ReviewBoard-Diff-For: src/main/python/apache/thermos/monitoring/process_collector_cgroup.py
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary="===============6437925276405573842=="

--===============6437925276405573842==
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 7bit

-----------------------------------------------------------
This is an automatically generated e-mail.
To reply, visit:
https://reviews.apache.org/r/60748/#review180080
-----------------------------------------------------------

Master (a922b05) is red with this patch.
  ./build-support/jenkins/build.sh

@@ -29,7 +29,6 @@
 import subprocess
 import sys
 import time
-
 from abc import abstractmethod
 from copy import deepcopy
ERROR: /home/jenkins/jenkins-slave/workspace/AuroraBot/src/main/python/apache/thermos/monitoring/process_collector_cgroup.py Imports are incorrectly sorted.
--- /home/jenkins/jenkins-slave/workspace/AuroraBot/src/main/python/apache/thermos/monitoring/process_collector_cgroup.py:before 2017-07-10 18:31:05.912538
+++ /home/jenkins/jenkins-slave/workspace/AuroraBot/src/main/python/apache/thermos/monitoring/process_collector_cgroup.py:after 2017-07-10 18:36:31.156897
@@ -14,13 +14,14 @@
 """ Sample resource consumption statistics for processes using psutil """
+import traceback
 from operator import attrgetter
 from time import time
-import traceback
 from twitter.common import log
 from apache.thermos.core.cgroup import ControlGroupHelper
+
 from .process import ProcessSample
ERROR: /home/jenkins/jenkins-slave/workspace/AuroraBot/src/main/python/apache/thermos/monitoring/resource.py Imports are incorrectly sorted.
--- /home/jenkins/jenkins-slave/workspace/AuroraBot/src/main/python/apache/thermos/monitoring/resource.py:before 2017-07-10 18:31:05.912538
+++ /home/jenkins/jenkins-slave/workspace/AuroraBot/src/main/python/apache/thermos/monitoring/resource.py:after 2017-07-10 18:36:31.171885
@@ -43,8 +43,8 @@
 from .disk import DiskCollector
 from .process import ProcessSample
+from .process_collector_cgroup import ProcessCollector
 from .process_collector_psutil import ProcessTreeCollector
-from .process_collector_cgroup import ProcessCollector
 class ResourceMonitorBase(Interface):

I will refresh this build result if you post a review containing "@ReviewBot retry"

- Aurora ReviewBot


On July 10, 2017, 6:30 p.m., Reza Motamedi wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/60748/
> -----------------------------------------------------------
>
> (Updated July 10, 2017, 6:30 p.m.)
>
>
> Review request for Aurora, David McLaughlin, Santhosh Kumar Shanmugham, Stephan Erb, and Zameer Manji.
>
>
> Repository: aurora
>
>
> Description
> -------
>
> # Prototype using cgroups for monitoring Thermos Process resource consumption (CPU and memory)
> The idea behind this prototype is to use kernel cgroups instead of per-PID monitoring of Thermos Tasks and Processes.
> This [document](https://docs.google.com/a/twitter.com/document/d/16JFIqY2ftvNNXxYf6jQwO6EXPajCKp7kPJRAQSsaPko/edit?usp=sharing) describes more about the problem that this prototype tries to solve.
>
> __Note:__ Since I am piggybacking on the cgroup clean-up implemented in Mesos, if Mesos's memory and CPU isolation are not enabled, I will not create cgroups and will simply revert to using the old monitoring scheme.
>
> # Notes on Performance:
>
> I used `top -p -bc -n 10 | grep 'python'` to monitor the cpu usage of thermos on my vagrant. I had 7 Tasks each with 3 Processes.
>
> Stock Thermos Observer
> ```
> 21641 root 20 0 1351200 44448 4088 S 6.6 1.4 0:35.69 python2.7 /home/vagrant/aurora/dist/thermos_observer.pex --ip=192.168.33.7 --port=1338 --log_to_disk=NONE --log_to_stderr=google:INFO
> 21641 root 20 0 1351200 44448 4088 S 2.7 1.4 0:35.77 python2.7 /home/vagrant/aurora/dist/thermos_observer.pex --ip=192.168.33.7 --port=1338 --log_to_disk=NONE --log_to_stderr=google:INFO
> 21641 root 20 0 1351200 44448 4088 S 3.3 1.4 0:35.87 python2.7 /home/vagrant/aurora/dist/thermos_observer.pex --ip=192.168.33.7 --port=1338 --log_to_disk=NONE --log_to_stderr=google:INFO
> 21641 root 20 0 1351200 44448 4088 S 2.3 1.4 0:35.94 python2.7 /home/vagrant/aurora/dist/thermos_observer.pex --ip=192.168.33.7 --port=1338 --log_to_disk=NONE --log_to_stderr=google:INFO
> 21641 root 20 0 1351200 44448 4088 S 4.3 1.4 0:36.07 python2.7 /home/vagrant/aurora/dist/thermos_observer.pex --ip=192.168.33.7 --port=1338 --log_to_disk=NONE --log_to_stderr=google:INFO
> 21641 root 20 0 1351200 44448 4088 S 3.6 1.4 0:36.18 python2.7 /home/vagrant/aurora/dist/thermos_observer.pex --ip=192.168.33.7 --port=1338 --log_to_disk=NONE --log_to_stderr=google:INFO
> 21641 root 20 0 1351204 44616 4088 S 11.6 1.4 0:36.53 python2.7 /home/vagrant/aurora/dist/thermos_observer.pex --ip=192.168.33.7 --port=1338 --log_to_disk=NONE --log_to_stderr=google:INFO
> 21641 root 20 0 1351200 44552 4088 S 39.6 1.4 0:37.72 python2.7 /home/vagrant/aurora/dist/thermos_observer.pex --ip=192.168.33.7 --port=1338 --log_to_disk=NONE --log_to_stderr=google:INFO
> 21641 root 20 0 1351200 44552 4088 S 2.7 1.4 0:37.80 python2.7 /home/vagrant/aurora/dist/thermos_observer.pex --ip=192.168.33.7 --port=1338 --log_to_disk=NONE --log_to_stderr=google:INFO
> 21641 root 20 0 1351200 44552 4088 S 7.6 1.4 0:38.03 python2.7 /home/vagrant/aurora/dist/thermos_observer.pex --ip=192.168.33.7 --port=1338 --log_to_disk=NONE --log_to_stderr=google:INFO
> ```
>
> Thermos Observer using CGROUP monitoring
> ```
> 15203 root 20 0 1367828 45344 4088 S 6.6 1.5 0:55.37 python2.7 /home/vagrant/aurora/dist/thermos_observer.pex --ip=192.168.33.7 --port=1338 --log_to_disk=DEBUG --log_to_stderr=google:INFO
> 15203 root 20 0 1367828 45344 4088 S 2.0 1.5 0:55.43 python2.7 /home/vagrant/aurora/dist/thermos_observer.pex --ip=192.168.33.7 --port=1338 --log_to_disk=DEBUG --log_to_stderr=google:INFO
> 15203 root 20 0 1351436 45308 4088 S 4.3 1.5 0:55.56 python2.7 /home/vagrant/aurora/dist/thermos_observer.pex --ip=192.168.33.7 --port=1338 --log_to_disk=DEBUG --log_to_stderr=google:INFO
> 15203 root 20 0 1351436 45308 4088 S 2.3 1.5 0:55.63 python2.7 /home/vagrant/aurora/dist/thermos_observer.pex --ip=192.168.33.7 --port=1338 --log_to_disk=DEBUG --log_to_stderr=google:INFO
> 15203 root 20 0 1351436 45308 4088 S 2.0 1.5 0:55.69 python2.7 /home/vagrant/aurora/dist/thermos_observer.pex --ip=192.168.33.7 --port=1338 --log_to_disk=DEBUG --log_to_stderr=google:INFO
> 15203 root 20 0 1351436 45308 4088 S 3.3 1.5 0:55.79 python2.7 /home/vagrant/aurora/dist/thermos_observer.pex --ip=192.168.33.7 --port=1338 --log_to_disk=DEBUG --log_to_stderr=google:INFO
> 15203 root 20 0 1351436 45308 4088 S 2.3 1.5 0:55.86 python2.7 /home/vagrant/aurora/dist/thermos_observer.pex --ip=192.168.33.7 --port=1338 --log_to_disk=DEBUG --log_to_stderr=google:INFO
> 15203 root 20 0 1351436 45308 4088 S 1.0 1.5 0:55.89 python2.7 /home/vagrant/aurora/dist/thermos_observer.pex --ip=192.168.33.7 --port=1338 --log_to_disk=DEBUG --log_to_stderr=google:INFO
> 15203 root 20 0 1351436 45308 4088 S 2.3 1.5 0:55.96 python2.7 /home/vagrant/aurora/dist/thermos_observer.pex --ip=192.168.33.7 --port=1338 --log_to_disk=DEBUG --log_to_stderr=google:INFO
> 15203 root 20 0 1351436 45308 4088 S 3.3 1.5 0:56.06 python2.7 /home/vagrant/aurora/dist/thermos_observer.pex --ip=192.168.33.7 --port=1338 --log_to_disk=DEBUG --log_to_stderr=google:INFO
> ```
>
>
> Diffs
> -----
>
>   examples/vagrant/mesos_config/etc_mesos-slave/isolation 1a7028ffc70116b104ef3ad22b7388f637707a0f
>   src/main/python/apache/aurora/executor/thermos_task_runner.py 8f88af4c24ddc603fa12587741af56a6c711e420
>   src/main/python/apache/thermos/core/cgroup.py PRE-CREATION
>   src/main/python/apache/thermos/core/process.py 4a4678ff39c84cb87836aca19365c5b2aabc4fa4
>   src/main/python/apache/thermos/monitoring/process_collector_cgroup.py PRE-CREATION
>   src/main/python/apache/thermos/monitoring/resource.py 434666696e600a0e6c19edd986c86575539976f2
>   src/main/python/apache/thermos/observer/http/templates/task.tpl f3e06985eb3c05572aa4389d97da575b1179f616
>
>
> Diff: https://reviews.apache.org/r/60748/diff/1/
>
>
> Testing
> -------
>
> This patch is mostly a prototype. Note that I had to enable Mesos's cpu and memory isolation.
>
> Current tests pass. I first want to see how the community feels generally about this approach, and then I will add additional tests.
>
>
> Thanks,
>
> Reza Motamedi
>
>

--===============6437925276405573842==--