Benjamin Mahler created MESOS-1036:

Summary: Implement a library for exposing statistical metrics.
Key: MESOS-1036
URL: https://issues.apache.org/jira/browse/MESOS-1036
Project: Mesos
Issue Type: Improvement
Components: stats
Reporter: Benjamin Mahler
Currently, statistical metrics are reported through endpoints dedicated to each component,
primarily the following two:
{noformat}
/master/stats.json
/slave/stats.json
{noformat}
Additional endpoints have not been added (for example, containerization statistics, allocator
statistics, libprocess statistics) due to the inherent difficulty involved: one must either
expose this data up to these higher-level endpoints, or add a new endpoint for exposing the
component-specific statistics.
This is why the {{Statistics}} class in libprocess was created. However, it is not currently
used for any statistical reporting.
[~benjaminhindman] and I whiteboarded the kinds of abstractions we wanted to build to
make statistical reporting trivial from anywhere in the code.
First, we wanted the notion of a {{Statistic}} or {{Metric}} object that can be directly
manipulated to store statistics, for example:
{code}
// In the Registrar initialization:
Metric storage_latency = statistics.create("registrar", "storage_latency");
// Recording an individual storage latency.
storage_latency.set(latency);
{code}
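To make the idea concrete, here is a minimal, single-threaded sketch of what such a registry could look like. Every name in it ({{Metric}}, {{Statistics}}, {{create}}, {{set}}, {{get}}, {{snapshot}}) and the "context/name" key format are assumptions for illustration, not an existing libprocess API:

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical handle that code such as the Registrar would hold on to.
class Metric
{
public:
  void set(double value) { *value_ = value; }
  double get() const { return *value_; }

private:
  friend class Statistics;

  explicit Metric(std::shared_ptr<double> value) : value_(std::move(value)) {}

  // Shared with the registry so that a reporting endpoint can read it.
  std::shared_ptr<double> value_;
};

// Hypothetical registry; one instance would back a single stats endpoint.
class Statistics
{
public:
  // Returns a handle for "<context>/<name>", creating the entry on first use.
  Metric create(const std::string& context, const std::string& name)
  {
    assert(!context.empty() && !name.empty());

    std::shared_ptr<double>& slot = values_[context + "/" + name];
    if (!slot) {
      slot = std::make_shared<double>(0.0);
    }
    return Metric(slot);
  }

  // Copies out the latest value of every registered metric.
  std::map<std::string, double> snapshot() const
  {
    std::map<std::string, double> result;
    for (const auto& entry : values_) {
      result[entry.first] = *entry.second;
    }
    return result;
  }

private:
  std::map<std::string, std::shared_ptr<double>> values_;
};
```

Because the handle and the registry share the stored value, any component can record a statistic without knowing about the endpoint that reports it. A real implementation would also need to be safe to call from multiple threads or processes, which this sketch ignores.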
In addition to this, we wanted the notion of a {{Meter}}, which automatically exposes a metered
version of a statistic, for example:
{code}
Metric storage_latency = statistics.create("registrar", "storage_latency");
// Adds "storage_latency_average", which computes the average over the window.
statistics.meter(storage_latency, Average());
// Adds "storage_latency_p99"; percentile is a non-trivial implementation.
statistics.meter(storage_latency, Percentile(99));
// Adds "storage_latency_maximum".
statistics.meter(storage_latency, Maximum());
{code}
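As a rough illustration of what each meter would compute, the sketch below implements the three aggregations as free functions over the samples currently in the window. The function names are hypothetical, and the percentile uses the simple nearest-rank definition on a copy of the samples, which is only one of several possible definitions and not necessarily the one we would want in the library:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <vector>

// Arithmetic mean of the samples in the window.
double average(const std::vector<double>& samples)
{
  assert(!samples.empty());
  return std::accumulate(samples.begin(), samples.end(), 0.0) / samples.size();
}

// Largest sample in the window.
double maximum(const std::vector<double>& samples)
{
  assert(!samples.empty());
  return *std::max_element(samples.begin(), samples.end());
}

// Nearest-rank percentile: the smallest sample such that at least
// p percent of the samples are less than or equal to it.
// Takes the samples by value because nth_element reorders them.
double percentile(std::vector<double> samples, double p)
{
  assert(!samples.empty());
  assert(p > 0.0 && p <= 100.0);

  const std::size_t n = samples.size();

  // 1-based rank, clamped to [1, n] to guard against rounding.
  std::size_t rank = static_cast<std::size_t>(std::ceil(p * n / 100.0));
  rank = std::max<std::size_t>(1, std::min(rank, n));

  // Partially sorts so that the element at index (rank - 1) is the one
  // that would be there if the whole vector were sorted.
  std::nth_element(
      samples.begin(), samples.begin() + (rank - 1), samples.end());

  return samples[rank - 1];
}
```

The average and maximum can be maintained incrementally, whereas this percentile needs every sample in the window, which is part of why it is the harder meter to implement efficiently.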
Of course, I'm not advocating a particular API in the above examples; I'm just laying out
the types of things we wanted to see available.
As we add these types of abstractions, we will want to avoid storing large time series data
in memory as is currently done in {{Statistics}}. There are a number of things to consider
with respect to the windowing technique, but I think the notion of a window should transition
from "amount of history to be kept" to "a statistical rolling window". For example, when computing
an average, you would most likely want a rolling 1-minute average, as opposed to the average
over a 2-week window.
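One possible shape for such a rolling window is sketched below. The class name {{RollingAverage}}, its methods, and the choice of caller-supplied timestamps in seconds are all assumptions made for illustration. It keeps only the samples that fall inside the window, plus a running sum, so memory is bounded by the window length rather than by the total history:

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <utility>

// Hypothetical rolling average over the last `windowSecs` seconds.
// Timestamps are supplied by the caller and must be non-decreasing.
class RollingAverage
{
public:
  explicit RollingAverage(double windowSecs) : window_(windowSecs), sum_(0.0)
  {
    assert(windowSecs > 0.0);
  }

  // Records a sample taken at time `now` and drops expired samples.
  void record(double now, double value)
  {
    samples_.emplace_back(now, value);
    sum_ += value;
    evict(now);
  }

  // Average of the samples with a timestamp in (now - window, now],
  // or 0 if the window is empty.
  double average(double now)
  {
    evict(now);
    return samples_.empty() ? 0.0 : sum_ / samples_.size();
  }

  // Number of samples currently retained.
  std::size_t size() const { return samples_.size(); }

private:
  void evict(double now)
  {
    while (!samples_.empty() && samples_.front().first <= now - window_) {
      sum_ -= samples_.front().second;
      samples_.pop_front();
    }

    // Reset the running sum when the window empties so that
    // floating-point error does not accumulate indefinitely.
    if (samples_.empty()) {
      sum_ = 0.0;
    }
  }

  double window_;
  double sum_;
  std::deque<std::pair<double, double>> samples_; // (timestamp, value)
};
```

With a 60-second window, a component that records a few samples per second retains at most a few hundred entries, regardless of how long the process has been running.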
Efficiency of this library will be important to avoid high RSS overhead.

This message was sent by Atlassian JIRA
(v6.1.5#6160)
