incubator-mesos-dev mailing list archives

From "Sam Whitlock"
Subject Re: Review Request: (work in progress!) adding functionality to monitoring resource usage for each executor on the slave
Date Sun, 04 Mar 2012 20:40:08 GMT

This is an automatically generated e-mail. To reply, visit:

(Updated 2012-03-04 20:40:08.696183)

Review request for mesos, Benjamin Hindman and Charles Reiss.


Diff from parent:
- Reduced the logging level for failing to read info from a container from ERROR to INFO, because
such a failure is not necessarily an error.
- Added command-line arg -f for the frequency of usage info scraping (1/f is the time passed to
the delay call).
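
As a rough illustration of the 1/f relationship described above, the conversion could look like
the sketch below. The helper name intervalFromFrequency is hypothetical and not taken from the
patch, and the sketch assumes f is given in samples per second and that non-positive or
non-finite values should be rejected.

```cpp
#include <cassert>
#include <chrono>
#include <cmath>
#include <stdexcept>

// Hypothetical helper (not from the patch): converts the -f value, assumed to
// be in samples per second, into the interval to wait between two samples.
std::chrono::duration<double> intervalFromFrequency(double frequency)
{
  // Reject zero, negative, NaN and infinite values before dividing.
  if (!std::isfinite(frequency) || frequency <= 0.0) {
    throw std::invalid_argument("sampling frequency must be positive and finite");
  }

  const std::chrono::duration<double> interval(1.0 / frequency);

  // Postcondition: a positive, finite frequency yields a positive interval.
  assert(interval.count() > 0.0);
  return interval;
}
```

With this sketch, -f 2 would correspond to a half-second interval and -f 1 to the one-second
interval used so far.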


This mega-patch is intended to represent the partial completion of the slave monitoring functionality.
It is not intended to be committed. Changes based on comments in this review will be reflected
in future reviews that are smaller and more modular.

Proc utils is included in this patch, but is already under review here:

The relevant design doc can be found here:

The following items are ones where specific feedback is requested:

* A better mechanism is needed to control the rate at which the slave asks each executor for
its UsageMessage. This is currently hard-coded to be at 1 second intervals, but could potentially
be read as a command-line option or from a config file. Is there a better or different way
to pass in this value?
* Currently, UsageMessages are passed from a ResourceMonitor to the Slave using the Future
construct, and used as containers that hold a snapshot of the latest usage. This is to prevent
unnecessary marshalling and extra data structures, since messages will eventually be sent
in the standard dispatch style from the slave to the master. Is it fine that we are using
Protobuf messages in this way?
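
To make the second question more concrete, here is a minimal single-threaded sketch of a message
object being handed over through a future and kept as the latest snapshot. Everything in it is an
assumption for illustration: the Future template is a simplified stand-in for the real Future
construct, UsageSnapshot is a plain struct standing in for the UsageMessage protobuf, and the
ResourceMonitor methods are hypothetical rather than the ones in the patch.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <string>

// Simplified, single-threaded stand-in for a future (illustration only).
// Copies of a Future share one state, so a value set through any copy is
// visible through all of them.
template <typename T>
class Future
{
public:
  Future() : state_(std::make_shared<State>()) {}

  bool isReady() const { return state_->ready; }

  // Must only be called once a value has been set.
  const T& get() const
  {
    assert(state_->ready);
    return state_->value;
  }

  // Stores (or overwrites) the value and marks the future as ready.
  void set(const T& value)
  {
    state_->value = value;
    state_->ready = true;
  }

private:
  struct State
  {
    bool ready = false;
    T value{};
  };

  std::shared_ptr<State> state_;
};

// Stand-in for the UsageMessage protobuf; the fields are illustrative.
struct UsageSnapshot
{
  std::string executorId;
  double cpuSeconds;
  std::uint64_t memoryBytes;
};

// Hypothetical monitor: the slave holds a Future and reads the snapshot
// from it once the monitor has recorded one.
class ResourceMonitor
{
public:
  Future<UsageSnapshot> usage() const { return latest_; }

  void record(const UsageSnapshot& snapshot) { latest_.set(snapshot); }

private:
  Future<UsageSnapshot> latest_;
};
```

In this sketch the slave would call usage() once, keep the returned handle, and read the message
from it directly, without copying the data into a separate structure first.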

There are several changes that are not yet implemented in this patch. These changes are as
follows:
* Sufficient test cases have not yet been written for any component (resource monitor, lxc
collector, and process collector).
* Code has not been cleaned up to adhere to all style recommendations.
* Process collector code needs to be updated to prevent CPU usage spikes when monitored sub-processes
* Code to send UsageMessages from the slave to the master has not yet been written.

This addresses bug MESOS-38.

Diffs (updated)

  src/monitoring/process_stats.hpp PRE-CREATION 
  src/monitoring/resource_collector.hpp PRE-CREATION 
  src/slave/http.cpp f03815d 
  src/slave/isolation_module.hpp c896908 
  src/slave/isolation_module.cpp 5b7b4a2 
  src/slave/lxc_isolation_module.hpp b7beefe 
  src/slave/lxc_isolation_module.cpp d544625 
  src/slave/main.cpp ac780c4 
  src/slave/process_based_isolation_module.hpp f6f9554 
  src/slave/process_based_isolation_module.cpp 100b1e3 
  src/slave/resource_monitor.hpp PRE-CREATION 
  src/slave/resource_monitor.cpp PRE-CREATION 
  src/slave/slave.hpp b1a07e9 
  src/slave/slave.cpp ce8fda5 
  src/tests/ 6f51be4 
  src/tests/proc_utils_tests.cpp PRE-CREATION 
  src/tests/process_resource_collector_tests.cpp PRE-CREATION 
  src/tests/resource_monitor_tests.cpp PRE-CREATION 
  src/monitoring/process_resource_collector.cpp PRE-CREATION 
  src/monitoring/process_resource_collector.hpp PRE-CREATION 
  src/monitoring/linux/proc_utils.cpp PRE-CREATION 
  src/monitoring/linux/proc_utils.hpp PRE-CREATION 
  src/monitoring/linux/proc_resource_collector.cpp PRE-CREATION 
  src/monitoring/linux/proc_resource_collector.hpp PRE-CREATION 
  src/monitoring/linux/lxc_resource_collector.cpp PRE-CREATION 
  src/master/master.hpp 53551b0 
  src/master/master.cpp 1d3961e 
  src/messages/messages.proto 11a2c41 
  src/monitoring/linux/lxc_resource_collector.hpp PRE-CREATION 
  src/master/allocator.hpp 1ac435b 
  src/master/http.cpp 591433a 
  src/ 1137a3e 



Test cases:
* A test case exercising the basic monitoring code with a mocked-out collector.
* The first of several tests for the process resource monitor, with the proc-based collecting
mocked out.
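
For readers unfamiliar with the approach, the shape of a test with a mocked-out collector is
roughly as sketched below. This is not the test code from the patch: the ResourceCollector
interface, the hand-written MockCollector, and the Monitor class are all hypothetical and only
show how a canned collector can stand in for the real proc- or lxc-based one, including the case
where reading usage fails without that being an error.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative usage record; not the UsageMessage from the patch.
struct Usage
{
  double cpuSeconds;
  std::uint64_t memoryBytes;
};

// Hypothetical collector interface that the monitor depends on.
class ResourceCollector
{
public:
  virtual ~ResourceCollector() {}

  // Fills in 'usage' and returns true, or returns false when usage
  // information cannot be read (for example, the executor has exited).
  virtual bool collect(Usage* usage) = 0;
};

// Hand-written mock: returns a canned result and counts its calls.
class MockCollector : public ResourceCollector
{
public:
  MockCollector(bool succeed, const Usage& canned)
    : calls(0), succeed_(succeed), canned_(canned) {}

  bool collect(Usage* usage) override
  {
    ++calls;
    if (!succeed_) {
      return false;
    }
    *usage = canned_;
    return true;
  }

  int calls;

private:
  bool succeed_;
  Usage canned_;
};

// Hypothetical monitor under test: keeps the most recent successful sample.
class Monitor
{
public:
  explicit Monitor(ResourceCollector* collector)
    : collector_(collector), hasUsage_(false), latest_() {}

  // Takes one sample; a failed read leaves the previous sample untouched.
  bool sample()
  {
    assert(collector_ != nullptr);
    Usage usage;
    if (!collector_->collect(&usage)) {
      return false;
    }
    latest_ = usage;
    hasUsage_ = true;
    return true;
  }

  bool hasUsage() const { return hasUsage_; }
  const Usage& latest() const { return latest_; }

private:
  ResourceCollector* collector_;
  bool hasUsage_;
  Usage latest_;
};
```

A test then constructs the Monitor with a MockCollector, calls sample(), and checks both the
recorded usage and the number of collector calls.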

Some ad-hoc testing with log statements was also done to confirm that the monitoring works
end-to-end from both the container-based and process-based isolation modules.


