couchdb-dev mailing list archives

From Damien Katz <dam...@apache.org>
Subject Re: Stats Patch API Discussion
Date Tue, 10 Feb 2009 15:47:49 GMT

On Feb 10, 2009, at 10:19 AM, Jan Lehnardt wrote:

> Hi,
>
> Alex and I are working on our stats package patch and the last
> bigger issue is the API. It is just exposing a bunch of values by
> keys, but as usual, the devil is in the details.
>
> Let me explain.
>
> There are two types of counters. "Hit Counters" record
> things like the number of requests. They increase monotonically
> each time a request hits CouchDB. This is useful for counting
> stuff. Cool.
>
> Then there are "Absolute Value Counters" (for lack of a better
> term) that collect absolute values like the number of milliseconds
> a request took to complete. To create a meaningful metric out
> of this type of counter, we need to create averages. There's little
> value in recording each individual request for monitoring reports
> (that can still be done in the access logs). So we keep some
> aggregate values: min, max, mean, stddev and count (count being
> the number of times this counter was called).
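A minimal Python sketch of how the aggregates above (min, max, mean, stddev, count) can be kept without storing any individual request times. It uses Welford's online algorithm for the mean and standard deviation and reports the population standard deviation; the `Aggregate` class and its method names are hypothetical and not taken from the stats patch.

```python
import math


class Aggregate:
    """Running min/max/mean/stddev/count for an absolute value counter.

    Hypothetical sketch: Welford's online algorithm, so no samples are stored.
    """

    def __init__(self):
        self.count = 0
        self.min = None
        self.max = None
        self.mean = 0.0
        self._m2 = 0.0  # sum of squared deviations from the running mean

    def add(self, value):
        self.count += 1
        self.min = value if self.min is None else min(self.min, value)
        self.max = value if self.max is None else max(self.max, value)
        delta = value - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (value - self.mean)

    @property
    def stddev(self):
        # Population standard deviation; 0.0 until a sample has been added.
        return math.sqrt(self._m2 / self.count) if self.count else 0.0

    def to_json(self):
        return {"min": self.min, "max": self.max, "mean": self.mean,
                "stddev": self.stddev, "count": self.count}
```

Feeding it request times of 10, 20 and 30 ms yields count 3, min 10, max 30 and mean 20.0, using constant memory per counter.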
>
> Complexity++
>
> Say you have a CouchDB running for a month. You change some
> things in your app or in CouchDB and you'd like to know how this
> affected your response time. To effectively see anything you'd have
> to restart CouchDB (and lose all stats) or wait a month. If you
> want to see problems coming up in your monitoring, you need
> finer-grained time ranges to look at.
>
> To make this a little more useful, Alex and I introduced time ranges.
> These are an additional set of aggregates that get reset every 1, 5
> and 15 minutes. This should be familiar to you from server load
> averages. You can get the aggregate values for four time ranges:
>
> - Between now and the beginning of time (when CouchDB was
>   started).
> - Between now and 60 seconds ago.
> - Between now and 300 seconds ago.
> - Between now and 900 seconds ago.
>
> These ranges are hardcoded now, but they can be made configurable
> at a later time.
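One way the reset-based ranges could work, sketched in Python under the assumption that each range is a tumbling window: its aggregates start over once its period has elapsed (checked lazily on the next update or read), while range 0 is never reset. `Window`, `RangedStat` and the injectable `clock` are hypothetical names, and only count, min, max and mean are tracked to keep it short. Because of the reset, a range=1 read covers anywhere from zero to 60 seconds of data rather than a sliding "last minute".

```python
import time

RANGES = (0, 1, 5, 15)  # minutes; 0 means "since CouchDB started"


class Window:
    """Count/min/max/mean for one time range (hypothetical sketch)."""

    def __init__(self, minutes, now):
        self.minutes = minutes
        self.reset(now)

    def reset(self, now):
        self.started = now
        self.count = 0
        self.total = 0.0
        self.min = None
        self.max = None

    def expire(self, now):
        # Range 0 never resets; the others start over once their period elapses.
        if self.minutes and now - self.started >= self.minutes * 60:
            self.reset(now)

    def add(self, value, now):
        self.expire(now)
        self.count += 1
        self.total += value
        self.min = value if self.min is None else min(self.min, value)
        self.max = value if self.max is None else max(self.max, value)


class RangedStat:
    """One stat (e.g. request_time) tracked for every range at once."""

    def __init__(self, clock=time.monotonic):
        self.clock = clock  # injectable so tests can fake the passage of time
        now = clock()
        self.windows = {m: Window(m, now) for m in RANGES}

    def add(self, value):
        now = self.clock()
        for window in self.windows.values():
            window.add(value, now)

    def get(self, range_=0):
        window = self.windows[range_]
        window.expire(self.clock())
        mean = window.total / window.count if window.count else 0.0
        return {"min": window.min, "max": window.max, "mean": mean,
                "count": window.count, "range": range_}
```

Keeping the range list in one constant mirrors the "hardcoded now, configurable later" note: making it configurable would only mean passing a different tuple in.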
>
> The API would look like this:
>
> GET /_stats/couchdb/request_time
>
> {
> "couchdb": {
>   "request_time": {
>     "description": "Aggregated request time spent in CouchDB since the beginning of time",
>     "min":20,
>     "max":20,
>     "mean":20,
>     "stddev":20,
>     "count":7,
>     "range":0 // 0 means since day zero.
>   }
> }
> }
>
> To get the aggregates stats for the last minute:
>
> GET /_stats/couchdb/request_time?range=1
>
> {
> "couchdb": {
>   "request_time": {
>     "description": "Aggregated request time spent in CouchDB since 1 minute ago",
>     "min":20,
>     "max":20,
>     "mean":20,
>     "stddev":20,
>     "count":7,
>     "range":1 // minute
>   }
> }
> }
>
> Or more generic:
>
> GET /_stats/couchdb/request_time?range=$range
>
> {
> "couchdb": {
>   "request_time": {
>     "description": "Aggregated request time spent in CouchDB since $range minutes ago",
>     "min":20,
>     "max":20,
>     "mean":20,
>     "stddev":20,
>     "count":7,
>     "range":$range // minute
>   }
> }
> }
>
> This seems reasonable. The actual naming of "range" and other
> keys can be changed, as can the description text.
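A client-side Python sketch of the proposed URL scheme, assuming the `range` query parameter and the 0/1/5/15 values described above (both still open for renaming in this thread). `stats_path`, `parse_stat` and `fetch_stat` are hypothetical helpers, and 5984 is simply CouchDB's default port. Note that the example responses above carry `//` comments for explanation; `json.loads` only accepts plain JSON, so a real response body could not contain them.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

VALID_RANGES = (0, 1, 5, 15)  # minutes; 0 = since CouchDB started


def stats_path(group, key, range_=None):
    """Build /_stats/{group}/{key}[?range=N] as proposed in this thread."""
    path = f"/_stats/{group}/{key}"
    if range_ is not None:
        if range_ not in VALID_RANGES:
            raise ValueError(f"unsupported range: {range_!r}")
        path += "?" + urlencode({"range": range_})
    return path


def parse_stat(body, group, key):
    """Pull the aggregates for group/key out of a JSON response body."""
    return json.loads(body)[group][key]


def fetch_stat(base_url, group, key, range_=None):
    """Fetch one stat from a server, e.g. base_url="http://127.0.0.1:5984"."""
    with urlopen(base_url + stats_path(group, key, range_)) as resp:
        return parse_stat(resp.read().decode("utf-8"), group, key)
```

Validating the range on the client is a design choice: it fails fast on a typo instead of relying on whatever error the server would return for an unknown range.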
>
>
> Complexity--
>
> Remember Hit Counters? Yes, strictly speaking, CouchDB shouldn't
> need to collect any averages there, since the monitoring solution
> would take care of that. But the four time-range counters are
> already available, so we might as well populate them too. Let's
> say every second:
>
> GET /_stats/httpd/requests[?range=[1,5,15]]
>
> {
> "httpd": {
>   "requests": {
>     "description": "Number of requests per second in the last $range minutes",
>     "min":20,
>     "max":20,
>     "mean":20,
>     "stddev":20,
>     "count":7,
>     "range":$range // minute
>   }
> }
> }
>
> "count" would be the raw counter for the stats, and the rest would
> be meaningful aggregates.
>
> "Per second" is an arbitrary choice again and can be made
> configurable, if needed. So that clients know at what frequency
> stats are collected, there's a new member in the list of aggregates:
>
> {
> "httpd": {
>   "requests": {
>     "description": "Number of requests per $frequency seconds in the last $range minutes",
>     "min":20,
>     "max":20,
>     "mean":20,
>     "stddev":20,
>     "count":7,
>     "range":$range, // minute
>     "frequency": 1 // second
>   }
> }
> }
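A Python sketch of how a hit counter could feed the same aggregates: the raw monotonic counter is reported as `count`, and every `frequency` seconds the number of hits since the previous sample is turned into a per-second rate and folded into a running min/max/mean/stddev (Welford's algorithm, population stddev). `HitRate`, `hit()` and `tick()` are hypothetical names, `tick()` is assumed to be driven by an external timer, and only the since-startup range is shown.

```python
import math


class HitRate:
    """Per-second rate aggregates for a monotonic hit counter (hypothetical).

    tick() is assumed to be called once every `frequency` seconds by a timer.
    """

    def __init__(self, frequency=1):
        self.frequency = frequency  # seconds between samples
        self.hits = 0               # raw counter, reported as "count"
        self._last = 0              # value of self.hits at the previous tick
        self.n = 0                  # number of samples taken so far
        self.min = None
        self.max = None
        self.mean = 0.0
        self._m2 = 0.0              # Welford: sum of squared deviations

    def hit(self, k=1):
        self.hits += k

    def tick(self):
        # Hits since the last sample, normalised to a per-second rate.
        rate = (self.hits - self._last) / self.frequency
        self._last = self.hits
        self.n += 1
        self.min = rate if self.min is None else min(self.min, rate)
        self.max = rate if self.max is None else max(self.max, rate)
        delta = rate - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (rate - self.mean)

    def to_json(self):
        stddev = math.sqrt(self._m2 / self.n) if self.n else 0.0
        return {"min": self.min, "max": self.max, "mean": self.mean,
                "stddev": stddev, "count": self.hits,
                "frequency": self.frequency}
```

For example, 5 hits, a tick, 3 hits, a tick and then an idle tick give rates of 5, 3 and 0 requests per second: min 0.0, max 5.0, mean 8/3 and count 8.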
>
> Alex and I tried a couple of different approaches to get here:
> different URLs for the different types of counters and aggregates,
> adding members in different places, with and without description,
> and a whole lot more, but we surely haven't seen all permutations.
>
> This solution offers a unified URL format and a human-readable as
> well as machine-parseable way to determine what kind of counter
> you're dealing with.
>
> To get all stats at once, you can do a
>
> GET /_stats/
>
> and get a huge JSON object back that includes all of the above for all
> time ranges that are currently collected.
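For monitoring tools that want one flat list of metrics, the big object returned by GET /_stats/ could be flattened on the client. A Python sketch, assuming the two-level `{group: {key: aggregates}}` shape from the examples above; how the per-range aggregates would be nested inside the full response is not pinned down in this proposal, so the hypothetical `flatten_stats` helper passes each key's value through untouched.

```python
import json


def flatten_stats(body):
    """Map "group/key" -> aggregates for a full /_stats/ response body.

    Assumes the {group: {key: {...}}} shape shown in the examples; anything
    nested deeper than that is returned as-is.
    """
    stats = json.loads(body)
    return {f"{group}/{key}": aggregates
            for group, keys in stats.items()
            for key, aggregates in keys.items()}
```

Flat "group/key" names map naturally onto the single-series metrics that many monitoring tools expect.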
>
> Is there anything that does not make sense or is too complicated?
>
> The goal was to create a simple, minimal API for a minimal set
> of useful statistics, and Alex and I hope we have found it by
> now. But if you can see how this could be further simplified,
> let us know :)
>
> Alex and I are also open to completely different approaches to
> getting the data out of CouchDB.
>
> We're looking for a few things in this thread:
>
> - A sanity check to know we're not completely off.
> - A summary for you of how we arrived at the current proposal.
> - A consensus among dev@ readers on the final API we'd like to
>   implement.
>
> Note that a few of these things are already implemented and
> others need to be adjusted depending on feedback here.
>
> Please send feedback,
>
> Cheers
> Alex & Jan
> --
>

I'm digging it. Link to diffs?

-Damien

