couchdb-dev mailing list archives

From Jan Lehnardt <...@apache.org>
Subject Stats Patch API Discussion
Date Tue, 10 Feb 2009 15:19:13 GMT
Hi,

Alex and I are working on our stats package patch and the last
bigger issue is the API. It is just exposing a bunch of values by
keys, but as usual, the devil is in the details.

Let me explain.

There are two types of counters. "Hit Counters", that record
things like the number of requests. They increase monotonically
each time a request hits CouchDB. This is useful for counting
stuff. Cool.

Then there are "Absolute Value Counters" (for lack of a better
term) that collect absolute values like the number of milliseconds
a request took to complete. To create a meaningful metric out
of this type of counter, we need to create averages. There's little
value in recording each individual request for monitoring reports
(the access logs can still do that). So we keep some aggregate
values: min, max, mean, stddev and count (count being the number
of times this counter was called).
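
To illustrate how such aggregates can be kept without storing each
request, here is a minimal Python sketch. It is not the code from the
patch; the class name `Aggregate` and its methods are made up for this
example, and it assumes a population standard deviation computed with
Welford's online update.

```python
import math


class Aggregate:
    """Running min/max/mean/stddev/count without storing individual samples."""

    def __init__(self):
        self.count = 0
        self.min = None
        self.max = None
        self.mean = 0.0
        self._m2 = 0.0  # sum of squared deviations from the running mean

    def record(self, value):
        self.count += 1
        self.min = value if self.min is None else min(self.min, value)
        self.max = value if self.max is None else max(self.max, value)
        # Welford's online update for mean and variance.
        delta = value - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (value - self.mean)

    @property
    def stddev(self):
        # Population standard deviation; 0.0 while no sample was recorded.
        return math.sqrt(self._m2 / self.count) if self.count else 0.0

    def as_dict(self):
        return {"min": self.min, "max": self.max, "mean": self.mean,
                "stddev": self.stddev, "count": self.count}
```

Each call to `record()` costs constant time and memory, which is the
point of keeping aggregates instead of raw samples.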

Complexity++

Say you have a CouchDB running for a month. You change some
things in your app or in CouchDB and you'd like to know how this
affected your response time. To effectively see anything you'd have
to restart CouchDB (and lose all stats) or wait a month. If you
want to see problems coming up in your monitoring, you need
finer-grained time ranges to look at.

To make this a little more useful Alex and I introduced time ranges.
These are an additional set of aggregates that get reset every 1, 5
and 15 minutes. This should be familiar to you from server load.
You can get the aggregate values for four time ranges:

- Between now and the beginning of time (when CouchDB was
   started).
- Between now and 60 seconds ago.
- Between now and 300 seconds ago.
- Between now and 900 seconds ago.

These ranges are hardcoded now, but they can be made configurable
at a later time.
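
As a rough illustration of the 1, 5 and 15 minute ranges, here is a
Python sketch. It is an assumption-laden simplification, not the
patch: it keeps the raw samples of the last 900 seconds in a sliding
window instead of resetting aggregates periodically, it leaves out
the "since the beginning of time" range, and the names
`WindowedSamples`, `summarize` and `RANGE_SECONDS` are invented here.

```python
import math
import time
from collections import deque

RANGE_SECONDS = {1: 60, 5: 300, 15: 900}  # range in minutes -> window in seconds


def summarize(values):
    """Return the five aggregates for a list of samples."""
    count = len(values)
    if count == 0:
        return {"min": None, "max": None, "mean": None, "stddev": None, "count": 0}
    mean = sum(values) / count
    variance = sum((v - mean) ** 2 for v in values) / count
    return {"min": min(values), "max": max(values), "mean": mean,
            "stddev": math.sqrt(variance), "count": count}


class WindowedSamples:
    """Keeps the samples of the last 900 seconds to answer range queries."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock  # injectable so the window can be tested
        self._samples = deque()  # (timestamp, value) pairs, oldest first

    def record(self, value):
        now = self._clock()
        self._samples.append((now, value))
        # Drop anything older than the largest range.
        horizon = now - max(RANGE_SECONDS.values())
        while self._samples and self._samples[0][0] < horizon:
            self._samples.popleft()

    def aggregates(self, range_minutes):
        cutoff = self._clock() - RANGE_SECONDS[range_minutes]
        return summarize([v for t, v in self._samples if t >= cutoff])
```

Making the ranges configurable would then amount to changing the
`RANGE_SECONDS` table.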

The API would look like this:

GET /_stats/couchdb/request_time

{
  "couchdb": {
    "request_time": {
      "description": "Aggregated request time spent in CouchDB since the beginning of time",
      "min":20,
      "max":20,
      "mean":20,
      "stddev":20,
      "count":7,
      "range":0 // 0 means since day zero.
    }
  }
}

To get the aggregates stats for the last minute:

GET /_stats/couchdb/request_time?range=1

{
  "couchdb": {
    "request_time": {
      "description": "Aggregated request time spent in CouchDB since 1 minute ago",
      "min":20,
      "max":20,
      "mean":20,
      "stddev":20,
      "count":7,
      "range":1 // minute
    }
  }
}

Or more generic:

GET /_stats/couchdb/request_time?range=$range

{
  "couchdb": {
    "request_time": {
      "description": "Aggregated request time spent in CouchDB since $range minute ago",
      "min":20,
      "max":20,
      "mean":20,
      "stddev":20,
      "count":7,
      "range":$range // minute
    }
  }
}

This seems reasonable. The actual naming of "range" and other
keys can be changed, as can the description text.
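
From a client's point of view, consuming this could look roughly like
the following Python sketch. The helper names `stats_url`, `extract`
and `fetch` are hypothetical, and it assumes the real response is
plain JSON without the explanatory // comments used in the examples.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def stats_url(base, module, key, range_minutes=None):
    """Build a URL of the proposed form /_stats/<module>/<key>[?range=N]."""
    url = f"{base.rstrip('/')}/_stats/{module}/{key}"
    if range_minutes is not None:
        url += "?" + urlencode({"range": range_minutes})
    return url


def extract(body, module, key):
    """Pull the aggregate object out of a response body (a JSON string)."""
    return json.loads(body)[module][key]


def fetch(base, module, key, range_minutes=None):
    # Performs the actual HTTP GET; needs a server that implements the proposal.
    with urlopen(stats_url(base, module, key, range_minutes)) as response:
        return extract(response.read().decode("utf-8"), module, key)
```

Omitting `range_minutes` yields the URL without a query string, which
corresponds to the since-startup values.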


Complexity--

Remember Hit Counters? Yes, strictly speaking, CouchDB shouldn't
need to collect any averages there since our monitoring solution
would take care of that. But then, the 4 time-range counters are
available anyway and we could just as well populate them. Let's
say every second:

GET /_stats/httpd/requests[?$resolution=[1,5,15]]

{
  "httpd": {
    "requests": {
      "description": "Number of requests per second in the last $resolution minutes",
      "min":20,
      "max":20,
      "mean":20,
      "stddev":20,
      "count":7,
      "range":$range // minute
    }
  }
}

"count" would be the raw counter value and the rest would be
meaningful aggregates.

"per second" is an arbitrary choice again and can be made configurable,
if needed. To know at what frequency stats are collected, there's a new
member in the list of aggregates:

{
  "httpd": {
    "requests": {
      "description": "Number of requests per $frequency seconds in the last $resolution minutes",
      "min":20,
      "max":20,
      "mean":20,
      "stddev":20,
      "count":7,
      "range":$range, // minute
      "frequency": 1 // second
    }
  }
}
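
To make the idea concrete, here is a small Python sketch of how the
aggregates for a Hit Counter could be derived from periodic readings
of the raw counter. This is only an illustration under assumptions:
the function name `hit_counter_aggregates` is invented, the readings
are assumed to be taken exactly every `frequency` seconds, and
"count" is interpreted as the number of hits within the range.

```python
import math


def hit_counter_aggregates(readings, frequency=1):
    """Summarize a monotonic counter sampled every `frequency` seconds.

    `readings` are successive raw counter values, oldest first.
    """
    # Hits per sampling interval are the differences between neighbours.
    deltas = [b - a for a, b in zip(readings, readings[1:])]
    if not deltas:
        raise ValueError("need at least two readings")
    mean = sum(deltas) / len(deltas)
    variance = sum((d - mean) ** 2 for d in deltas) / len(deltas)
    return {
        "min": min(deltas),
        "max": max(deltas),
        "mean": mean,
        "stddev": math.sqrt(variance),
        "count": readings[-1] - readings[0],  # raw hits within the range
        "frequency": frequency,
    }
```

With readings of 100, 102, 106 and 112 taken one second apart, the
per-second values are 2, 4 and 6, so min is 2, max is 6, mean is 4
and count is 12.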

Alex and I tried a couple of different approaches to get here:
different URLs for the different types of counters and aggregates,
adding members in different places, with and without description
and a whole lot more, but we surely haven't seen all permutations.

This solution offers a unified URL format and a human readable as
well as a computer parseable way to determine what kind of counter
you're dealing with.

To just get all stats you can do a

GET /_stats/

and get a huge JSON object back that includes all of the above for all
resolutions that are currently collected.

Is there anything that does not make sense or is too complicated?

The goal was to create a simple, minimal API for a minimal set
of useful statistics and Alex and I hope to have found this by
now. But if you can see how this could be further simplified,
let us know :)

Alex and I are also open to completely different approaches to get
the data out of CouchDB.

We're looking for a few things in this thread:

- A sanity check to know we're not completely off.
- A summary for you of how we got to the current proposal.
- A consensus of dev@-readers for the final API we'd like to implement.

Note that a few of these things are already implemented and
others need to be adjusted depending on feedback here.

Please, feed back,

Cheers
Alex & Jan
--

