mesos-dev mailing list archives

From " (Commented) (JIRA)" <>
Subject [jira] [Commented] (MESOS-38) Executor resource monitoring and local reporting of usage stats
Date Wed, 07 Mar 2012 20:38:58 GMT

] commented on MESOS-38:

bq.  On 2012-03-05 23:04:17, Charles Reiss wrote:
bq.  > src/slave/resource_monitor.cpp, line 67
bq.  > <>
bq.  >
bq.  >     I assume that you've discussed calling this 'mem_usage' and 'cpu_usage' rather
bq.  >     than 'mem' and 'cpus' with Ben? Can you explain the reasoning briefly?
bq.  Sam Whitlock wrote:
bq.      I sort of made this choice unilaterally, based on its similarity to the naming used for cgroups.
bq.      I don't really care what it is called and would certainly be willing to change it
bq.      if that would make it more consistent with the rest of Mesos.
bq.  Sam Whitlock wrote:
bq.      I guess the choice to go with *_usage is because mem and cpus are used elsewhere in
bq.      Mesos for things that are not reported usage, and I wanted to draw a distinction.

I was assuming that usage measurements would always be kept in separate fields from requests.
Is there a use case where that would not be true?

Unless there's a reason to believe it would be deceptive, I'd prefer to name the usage measurements
the same as the requested resource they are supposed to measure.

- Charles

This is an automatically generated e-mail. To reply, visit:

On 2012-03-06 04:54:32, Sam Whitlock wrote:
bq.  (Updated 2012-03-06 04:54:32)
bq.  Review request for mesos, Benjamin Hindman and Charles Reiss.
bq.  Summary
bq.  -------
bq.  This mega-patch is intended to represent the partial completion of the slave monitoring
functionality. It is not intended to be committed. Changes based on comments in this review
will be reflected in future reviews that are smaller and more modular.
bq.  Proc utils is included in this patch, but is already under review here:
bq.  The relevant design doc can be found here:
bq.  The following items are ones where specific feedback is requested:
bq.  * A better mechanism is needed to control the rate at which the slave asks each executor
for its UsageMessage. This is currently hard-coded to be at 1 second intervals, but could
potentially be read as a command-line option or from a config file. Is there a better or different
way to pass in this value?
bq.  * Currently, UsageMessages are passed from a ResourceMonitor to the Slave using the Future
construct, and used as containers that hold a snapshot of the latest usage. This is to prevent
unnecessary marshalling and extra data structures, since messages will eventually be sent
in the standard dispatch style from the slave to the master. Is it fine that we are using
Protobuf messages in this way?
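As an illustration of the first question above, here is a minimal sketch of reading the polling interval from a command-line flag while keeping the current 1-second value as the default. The flag name `--usage_interval_secs` and the function `parseUsageInterval` are hypothetical and do not come from the patch; Mesos's own configuration machinery is deliberately not used here.

```cpp
#include <cassert>
#include <chrono>
#include <string>
#include <vector>

// Hypothetical flag name, chosen only for this sketch.
static const std::string kFlag = "--usage_interval_secs=";

// Scans `args` for the flag and stores the interval in `*interval`.
// When the flag is absent, the 1-second default is stored.
// Returns false if the flag is present but is not a positive integer
// of at most six digits; `*interval` should not be used in that case.
bool parseUsageInterval(const std::vector<std::string>& args,
                        std::chrono::seconds* interval)
{
  assert(interval != nullptr);
  *interval = std::chrono::seconds(1);

  for (const std::string& arg : args) {
    // Skip arguments that do not start with the flag prefix.
    if (arg.compare(0, kFlag.size(), kFlag) != 0) continue;

    const std::string value = arg.substr(kFlag.size());
    if (value.empty() || value.size() > 6) return false;
    for (char c : value) {
      if (c < '0' || c > '9') return false;
    }

    const long secs = std::stol(value);
    if (secs <= 0) return false;
    *interval = std::chrono::seconds(secs);
  }
  return true;
}
```

A caller would pass the slave's argument list and hand the resulting interval to whatever schedules the usage requests; a config-file source could fill the same `std::chrono::seconds` value.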
bq.  There are several changes that are not yet implemented in this patch. These changes are
as follows:
bq.  * Sufficient test cases have not yet been written for any component (resource monitor,
lxc collector, and process collector).
bq.  * Code has not been cleaned up to adhere to all style recommendations.
bq.  * Process collector code needs to be updated to prevent CPU usage spikes when monitored
sub-processes die.
bq.  * Code to send UsageMessages from the slave to the master.
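The review does not say what causes the CPU usage spikes when sub-processes die. One way to keep an aggregate robust to processes appearing and disappearing, sketched below under that caveat, is to difference cumulative CPU counters per PID instead of differencing one summed total. The type `CpuSample` and the function `cpuTicksBetween` are hypothetical names, not taken from the patch.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// One sample: cumulative CPU ticks (user + system) keyed by PID.
typedef std::map<int, uint64_t> CpuSample;

// Ticks consumed between two samples, computed per PID.
// Assumes `prev` was a complete sample of the monitored processes;
// with an empty `prev`, every PID is treated as newly started.
uint64_t cpuTicksBetween(const CpuSample& prev, const CpuSample& curr)
{
  uint64_t total = 0;
  for (const auto& entry : curr) {
    const auto it = prev.find(entry.first);
    if (it == prev.end()) {
      // PID not seen before: count everything it has accrued so far.
      total += entry.second;
    } else if (entry.second >= it->second) {
      // PID present in both samples: count only the increase.
      total += entry.second - it->second;
    } else {
      // Counter went backwards, so the PID was probably reused.
      total += entry.second;
    }
  }
  // PIDs found only in `prev` have exited. Whatever they consumed
  // after their last sample is not counted, so the result can
  // undercount but never jumps because a process went away.
  return total;
}
```

The trade-off is a small undercount for processes that exit between samples, in exchange for a total that never goes negative or double counts.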
bq.  This addresses bug MESOS-38.
bq.  Diffs
bq.  -----
bq.    src/ 1137a3e 
bq.    src/master/allocator.hpp 1ac435b 
bq.    src/master/http.cpp 591433a 
bq.    src/master/master.hpp 53551b0 
bq.    src/master/master.cpp 1d3961e 
bq.    src/messages/messages.proto 11a2c41 
bq.    src/monitoring/linux/lxc_resource_collector.hpp PRE-CREATION 
bq.    src/monitoring/linux/lxc_resource_collector.cpp PRE-CREATION 
bq.    src/monitoring/linux/proc_resource_collector.hpp PRE-CREATION 
bq.    src/monitoring/linux/proc_resource_collector.cpp PRE-CREATION 
bq.    src/monitoring/linux/proc_utils.hpp PRE-CREATION 
bq.    src/monitoring/linux/proc_utils.cpp PRE-CREATION 
bq.    src/monitoring/process_resource_collector.hpp PRE-CREATION 
bq.    src/monitoring/process_resource_collector.cpp PRE-CREATION 
bq.    src/monitoring/process_stats.hpp PRE-CREATION 
bq.    src/monitoring/resource_collector.hpp PRE-CREATION 
bq.    src/slave/http.cpp f03815d 
bq.    src/slave/isolation_module.hpp c896908 
bq.    src/slave/isolation_module.cpp 5b7b4a2 
bq.    src/slave/lxc_isolation_module.hpp b7beefe 
bq.    src/slave/lxc_isolation_module.cpp d544625 
bq.    src/slave/main.cpp ac780c4 
bq.    src/slave/process_based_isolation_module.hpp f6f9554 
bq.    src/slave/process_based_isolation_module.cpp 100b1e3 
bq.    src/slave/resource_monitor.hpp PRE-CREATION 
bq.    src/slave/resource_monitor.cpp PRE-CREATION 
bq.    src/slave/slave.hpp b1a07e9 
bq.    src/slave/slave.cpp ce8fda5 
bq.    src/tests/ 6f51be4 
bq.    src/tests/proc_utils_tests.cpp PRE-CREATION 
bq.    src/tests/process_resource_collector_tests.cpp PRE-CREATION 
bq.    src/tests/resource_monitor_tests.cpp PRE-CREATION 
bq.  Diff:
bq.  Testing
bq.  -------
bq.  Test cases:
bq.  * A test case exercising the basic monitoring code with a mocked-out collector.
bq.  * The first of several tests for the process resource monitor, with the proc-based collecting
mocked out.
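To make the first test case concrete, the sketch below shows the general shape of exercising monitoring code against a mocked-out collector. All names here (`Usage`, `Collector`, `FakeCollector`, `Monitor`) are hypothetical stand-ins; the actual interfaces in `resource_collector.hpp` and `resource_monitor.hpp` are not shown in this message and may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical usage snapshot, standing in for the real message type.
struct Usage {
  double cpuSecs;
  uint64_t memBytes;
};

// Hypothetical collector interface that the monitor depends on.
class Collector {
 public:
  virtual ~Collector() {}
  virtual bool collect(Usage* usage) = 0;
};

// Test double that replays scripted readings, then reports failure.
class FakeCollector : public Collector {
 public:
  explicit FakeCollector(const std::vector<Usage>& readings)
    : readings_(readings), next_(0) {}

  virtual bool collect(Usage* usage) {
    assert(usage != nullptr);
    if (next_ >= readings_.size()) return false;
    *usage = readings_[next_++];
    return true;
  }

 private:
  std::vector<Usage> readings_;
  std::size_t next_;
};

// Minimal monitor: keeps the latest successful reading.
class Monitor {
 public:
  explicit Monitor(Collector* collector)
    : collector_(collector), latest_() {}

  // Returns false and keeps the previous snapshot if collection fails.
  bool poll() {
    Usage usage;
    if (!collector_->collect(&usage)) return false;
    latest_ = usage;
    return true;
  }

  const Usage& latest() const { return latest_; }

 private:
  Collector* collector_;
  Usage latest_;
};
```

Because the monitor only sees the `Collector` interface, a test can script exact readings and failures without touching `/proc` or a container.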
bq.  Some ad-hoc testing with log statements to ensure that the monitoring works end-to-end
from both the container-based and process-based isolation modules.
bq.  Thanks,
bq.  Sam

> Executor resource monitoring and local reporting of usage stats
> ---------------------------------------------------------------
>                 Key: MESOS-38
>                 URL:
>             Project: Mesos
>          Issue Type: New Feature
>          Components: isolation, slave
>         Environment: Initial executor monitoring for linux only. Dummy monitoring capability
(no-op) for OSX, with functionality to be filled in later.
>            Reporter: Sam Whitlock
>            Assignee: Sam Whitlock
>              Labels: monitoring
> Implement reporting of executor resource usage and log it to a local log file (for
now). Eventually these statistics will be reported to the Mesos master in
order to build a timeline for the webui, a top-like command-line interface, or both.
This improvement ticket covers only the local monitoring and log file reporting. A reporting
system (to the master node) will be a later improvement ticket.
> With the current version of Mesos, it is not possible to monitor individual tasks. Therefore
the best this sort of system can do is monitor the usage of an individual executor and aggregate
the resource usage over the executor's tasks and resource allocations. If frameworks have
a 1-to-1 relationship of a job to an executor, then the aggregate statistics will be more
> Reporting will be available for both lxc isolation and process-based isolation. For lxc
isolation the task is easier because of the isolation facilities of lxc. Process-based isolation
is more difficult as processes can become re-parented from the process tree of the executor
(e.g. double fork). The session ID and the process group ID will likely still be the same
as that of the executor except for the uncommon case of the process resetting both of those.
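The session and process-group check described above can be sketched as follows. On Linux, `/proc/[pid]/stat` begins with the fields pid, (comm), state, ppid, pgrp and session, so a re-parented process can still be matched to its executor by those IDs. The names `ProcIds`, `parseStatLine` and `belongsToExecutor` are hypothetical and are not the patch's `proc_utils` API.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// IDs taken from the leading fields of a /proc/[pid]/stat line.
struct ProcIds {
  int pid;
  int ppid;
  int pgrp;
  int session;
};

// Parses "pid (comm) state ppid pgrp session ...".
// The command name may contain spaces and parentheses, so the
// remaining fields are located after the last ')'.
bool parseStatLine(const std::string& line, ProcIds* ids)
{
  assert(ids != nullptr);
  const std::string::size_type open = line.find('(');
  const std::string::size_type close = line.rfind(')');
  if (open == std::string::npos || close == std::string::npos ||
      close < open) {
    return false;
  }

  std::istringstream head(line.substr(0, open));
  std::istringstream tail(line.substr(close + 1));
  char state;
  if (!(head >> ids->pid)) return false;
  if (!(tail >> state >> ids->ppid >> ids->pgrp >> ids->session)) {
    return false;
  }
  return true;
}

// A process is attributed to the executor if it still shares either
// the session ID or the process group ID. Only a process that reset
// both escapes this check.
bool belongsToExecutor(const ProcIds& proc, const ProcIds& executor)
{
  return proc.session == executor.session || proc.pgrp == executor.pgrp;
}
```

A collector would read one such line per entry under `/proc` and keep the processes for which `belongsToExecutor` returns true, even when their parent PID has become 1 after a double fork.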
> When usage statistics are eventually reported to the Mesos master, it may be possible
to use them to oversubscribe slave nodes.

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators:!default.jspa
For more information on JIRA, see:

