mesos-dev mailing list archives

From "Benjamin Mahler (JIRA)" <j...@apache.org>
Subject [jira] [Created] (MESOS-1036) Implement a library for exposing statistical metrics.
Date Tue, 25 Feb 2014 19:22:24 GMT
Benjamin Mahler created MESOS-1036:
--------------------------------------

             Summary: Implement a library for exposing statistical metrics.
                 Key: MESOS-1036
                 URL: https://issues.apache.org/jira/browse/MESOS-1036
             Project: Mesos
          Issue Type: Improvement
          Components: stats
            Reporter: Benjamin Mahler


Currently, statistical metrics are reported only through component-specific endpoints,
primarily the following two:

{noformat}
/master/stats.json
/slave/stats.json
{noformat}

Other statistics (for example, containerization statistics, allocator statistics, libprocess
statistics) have not been exposed due to the inherent difficulty involved: one must either
plumb this data up to these higher-level endpoints, or add a new endpoint for exposing the
component-specific statistics.

This is why the {{Statistics}} class in libprocess was created; however, it is not currently
used for any statistical reporting.

[~benjaminhindman] and I had white-boarded the kinds of abstractions we wanted to build to
make statistical reporting trivial from anywhere in the code:

Create the notion of a {{Statistic}} or {{Metric}} object that can be directly manipulated
to store statistics, for example:

{code}
// In the Registrar initialization:
Metric storage_latency = statistics.create("registrar", "storage_latency");

// Recording an individual storage latency.
storage_latency.set(latency);
{code}
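
As a rough illustration of the snippet above, here is a minimal C++ sketch of a metric handle and a process-wide registry. The names {{MetricsRegistry}} and {{snapshot}}, and the "component/name" key format, are assumptions made for this example; they are not an existing libprocess or Mesos API. The sketch is single-threaded and ignores locking.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

// Hypothetical sketch: none of these names exist in Mesos or libprocess.

// A handle to a single named value. Copies share the same underlying
// storage, so a component can keep one handle while the registry keeps
// another for reporting.
class Metric
{
public:
  Metric() : value_(std::make_shared<double>(0.0)) {}

  void set(double value) { *value_ = value; }
  double get() const { return *value_; }

private:
  std::shared_ptr<double> value_;
};

// Holds every metric in the process, keyed by "<component>/<name>".
class MetricsRegistry
{
public:
  // Returns the metric for the given component and name, creating it
  // on first use.
  Metric create(const std::string& component, const std::string& name)
  {
    assert(!component.empty() && !name.empty());
    return metrics_[component + "/" + name];
  }

  // Copies out all current values, e.g. to serve from one endpoint.
  std::map<std::string, double> snapshot() const
  {
    std::map<std::string, double> result;
    for (const auto& entry : metrics_) {
      result[entry.first] = entry.second.get();
    }
    return result;
  }

private:
  std::map<std::string, Metric> metrics_;
};
```

With something like this, any component could call {{create}} once and {{set}} as often as needed, and a single endpoint could serve {{snapshot()}} without per-component plumbing.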

In addition to this, we wanted the notion of a {{Meter}}, which automatically exposes a metered
version of a statistic, for example:

{code}
Metric storage_latency = statistics.create("registrar", "storage_latency");

// Adds "storage_latency_average" which computes average over the window.
statistics.meter(storage_latency, Average());

// Adds a "storage_latency_p99", percentile is a non-trivial implementation.
statistics.meter(storage_latency, Percentile(99));

// Adds a "storage_latency_maximum"
statistics.meter(storage_latency, Maximum());
{code}

Of course, I'm not advocating a particular API in the above examples; I'm just laying out
the types of things we wanted to see available.
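
To make the {{Meter}} idea concrete, one possible shape is a named reduction over the samples in the current window. The following C++ sketch is hypothetical throughout: the {{Meter}} struct, the {{Samples}} alias, the free function {{meter}}, and the suffix naming are assumptions for illustration. The percentile uses the simple nearest-rank method on a sorted copy, which is not what an efficient implementation would do.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <vector>

// Hypothetical sketch: a meter is a suffix plus a function that reduces
// the samples in the current window to a single value.
using Samples = std::vector<double>;

struct Meter
{
  std::string suffix;
  std::function<double(const Samples&)> compute;
};

Meter Average()
{
  return Meter{"average", [](const Samples& samples) -> double {
    if (samples.empty()) return 0.0;
    double sum = 0.0;
    for (double sample : samples) sum += sample;
    return sum / static_cast<double>(samples.size());
  }};
}

Meter Maximum()
{
  return Meter{"maximum", [](const Samples& samples) -> double {
    if (samples.empty()) return 0.0;
    return *std::max_element(samples.begin(), samples.end());
  }};
}

Meter Percentile(int p)
{
  assert(p > 0 && p <= 100);
  return Meter{"p" + std::to_string(p), [p](const Samples& samples) -> double {
    if (samples.empty()) return 0.0;
    Samples sorted = samples;
    std::sort(sorted.begin(), sorted.end());
    // Nearest-rank: the smallest sample with at least p% of all samples
    // at or below it.
    std::size_t rank = static_cast<std::size_t>(
        std::ceil(p * static_cast<double>(sorted.size()) / 100.0));
    return sorted[std::max<std::size_t>(rank, 1) - 1];
  }};
}

// Evaluates each meter and names the result "<name>_<suffix>".
std::map<std::string, double> meter(
    const std::string& name,
    const std::vector<Meter>& meters,
    const Samples& samples)
{
  std::map<std::string, double> result;
  for (const Meter& m : meters) {
    result[name + "_" + m.suffix] = m.compute(samples);
  }
  return result;
}
```

Sorting a copy of every sample costs O(n log n) time and O(n) memory per evaluation, so a real implementation would more likely use a streaming or approximate estimator.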

As we add these types of abstractions, we will want to avoid storing large time series data
in memory as is currently done in {{Statistics}}. There are a number of things to consider
with respect to the windowing technique, but I think the notion of a window should transition
from "amount of history to be kept" to "a statistical rolling window". For example, when computing
an average, you would most likely want a rolling 1-minute average, as opposed to the average
over a 2-week window.
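
A rolling window of this kind can be sketched in C++ as follows. This is only an illustration under stated assumptions: the class name {{RollingAverage}} is hypothetical, timestamps are passed in explicitly (measured from any fixed epoch) and are assumed to be non-decreasing, and an empty window reports 0. Memory is bounded by the number of samples inside the window rather than by the total history.

```cpp
#include <cassert>
#include <chrono>
#include <deque>
#include <utility>

// Hypothetical sketch of a rolling-window average.
class RollingAverage
{
public:
  using Seconds = std::chrono::duration<double>;

  explicit RollingAverage(Seconds window) : window_(window), sum_(0.0)
  {
    assert(window.count() > 0.0);
  }

  // Records a sample observed at time `now`.
  void add(Seconds now, double value)
  {
    samples_.emplace_back(now, value);
    sum_ += value;
    evict(now);
  }

  // Average of the samples observed within the last `window` before
  // `now`, or 0 if there are none.
  double average(Seconds now)
  {
    evict(now);
    return samples_.empty()
      ? 0.0
      : sum_ / static_cast<double>(samples_.size());
  }

private:
  // Drops samples that have aged out of the window, keeping the
  // running sum in step with the remaining samples.
  void evict(Seconds now)
  {
    while (!samples_.empty() && samples_.front().first <= now - window_) {
      sum_ -= samples_.front().second;
      samples_.pop_front();
    }
  }

  Seconds window_;
  double sum_;
  std::deque<std::pair<Seconds, double>> samples_;
};
```

Maintaining a running sum keeps each query O(1) amortized, at the cost of some floating-point drift over long runs; periodically recomputing the sum from the retained samples is one way to limit that.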

Efficiency of this library will be important to avoid high RSS overhead.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
