mesos-dev mailing list archives

From "David Robinson (JIRA)" <j...@apache.org>
Subject [jira] [Created] (MESOS-1028) expose internal metrics
Date Fri, 21 Feb 2014 19:27:22 GMT
David Robinson created MESOS-1028:
-------------------------------------

             Summary: expose internal metrics
                 Key: MESOS-1028
                 URL: https://issues.apache.org/jira/browse/MESOS-1028
             Project: Mesos
          Issue Type: Improvement
          Components: general
            Reporter: David Robinson


Mesos should export statistics that provide visibility into its internals. This would allow
users to detect numerous problems without resorting to trawling through log files.

E.g. export counters of (some of these already exist, most don't):
cgroup create
cgroup destroy
cgroup destroy attempts
resource offers made
resource offers accepted
tasks launched
tasks destroyed
tasks lost
writes to replicated log
queue length
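
To illustrate the kind of counter described above, here is a minimal sketch in Python (chosen for brevity; Mesos itself is C++). The class name CounterRegistry and the counter names are hypothetical and do not correspond to any existing Mesos API.

```python
import threading
from collections import defaultdict


class CounterRegistry:
    """Hypothetical registry of named, monotonically increasing counters."""

    def __init__(self):
        self._lock = threading.Lock()
        self._counts = defaultdict(int)

    def increment(self, name, amount=1):
        # Guard the update so concurrent threads do not lose increments.
        with self._lock:
            self._counts[name] += amount

    def snapshot(self):
        # Return a plain copy so callers cannot mutate internal state.
        with self._lock:
            return dict(self._counts)


# Hypothetical counter names, modelled on the list above.
registry = CounterRegistry()
registry.increment("cgroup_destroy_attempts")
registry.increment("cgroup_destroy_attempts")
registry.increment("tasks_launched")
print(registry.snapshot())
```

A single lock keeps the sketch simple; a real implementation would likely use per-counter atomics to avoid contention.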

Export the 50th, 90th, 95th and 99th percentiles of the time taken to:
start mesos (reach a certain state)
move tasks between two given states (starting -> started)
create a cgroup
destroy a cgroup
send a message from slave to master
start a task
stop a task
register in zookeeper
write to the replicated log
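
As a rough sketch of how such percentiles could be derived from recorded durations, the Python below uses the nearest-rank method. This is one of several common percentile definitions and an assumption on my part; the function names and sample values are made up for illustration.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample such that at least
    p percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("no samples recorded")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def summarize(samples):
    # The four percentiles requested above.
    return {"p%d" % p: percentile(samples, p) for p in (50, 90, 95, 99)}


# Made-up durations in milliseconds, e.g. for "destroy a cgroup".
durations_ms = [12, 7, 9, 250, 8, 11, 10, 9, 13, 8]
print(summarize(durations_ms))
```

Keeping every sample and sorting on demand is only practical for small sample sets; libraries such as the ones linked below typically use a bounded reservoir instead.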

Ideally all these metrics would be exposed via an HTTP+JSON endpoint. See [metrics|http://metrics.codahale.com/getting-started/]
for an example (albeit Java) library (or [medida|http://dln.github.io/medida/] for an unmaintained(?)
C++ port)
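
A minimal sketch of what an HTTP+JSON endpoint could look like, using only the Python standard library. The /metrics path, the port, the JSON layout and all values are assumptions for illustration, not an existing Mesos endpoint.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Placeholder snapshot; a real process would read live values instead.
METRICS = {
    "counters": {"tasks_launched": 12, "tasks_lost": 1},
    "timers_ms": {"cgroup_destroy": {"p50": 4, "p99": 120}},
}


def render_metrics(metrics):
    # Sort keys so the output is stable and easy to diff.
    return json.dumps(metrics, sort_keys=True).encode("utf-8")


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/metrics":  # hypothetical path
            self.send_error(404)
            return
        body = render_metrics(METRICS)
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8080):
    # Blocks until interrupted; the port is an arbitrary choice.
    HTTPServer(("127.0.0.1", port), MetricsHandler).serve_forever()
```

Calling serve() and then fetching http://127.0.0.1:8080/metrics should return the snapshot as JSON, which a monitoring system could poll periodically.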

We've previously seen problems where tasks were stuck in cgroup destroy with >30,000 attempts.
Exposing metrics would allow us to easily detect problems like this.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
