couchdb-dev mailing list archives

From Jan Lehnardt <...@apache.org>
Subject Re: Stats Patch API Discussion
Date Thu, 19 Feb 2009 17:51:01 GMT
Hi,

here's an update on the stats patch.

Alex and I reworked the internals to make stats usable under high
load and simplified the code significantly in the process. The
API outlined in the quoted mail below is implemented.

We had to disable the JS test suite for now, since stats are
only available with a one-second delay, which is hard to test
from the outside. We'll add functional tests at the Erlang
level later to make sure we get everything right.

I did a little testing and it seems to work just fine.

Once the EUnit issue is resolved, I'd like to propose moving the
patch into an SVN branch for integration with trunk. We'd
appreciate it if you gave it a try already.

Cheers
Alex & Jan
--

On 10 Feb 2009, at 16:19, Jan Lehnardt wrote:

> Hi,
>
> Alex and I are working on our stats package patch and the last
> bigger issue is the API. It is just exposing a bunch of values by
> keys, but as usual, the devil is in the details.
>
> Let me explain.
>
> There are two types of counters. First, "Hit Counters", which
> record things like the number of requests. They increase
> monotonically each time a request hits CouchDB. This is useful
> for counting stuff. Cool.
>
> Then there are "Absolute Value Counters" (for lack of a better
> term) that collect absolute values, like the number of milliseconds
> a request took to complete. To create a meaningful metric out
> of this type of counter, we need to compute averages. For monitoring
> reports there's little value in recording each individual request
> (that could still be done in the access logs). So we keep some
> aggregate values: min, max, mean, stddev and count (count being
> the number of times this counter was called).
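The aggregates for an Absolute Value Counter can be maintained without storing individual samples. As a minimal sketch (in Python, while the patch itself is Erlang, so this is not its actual code), Welford's online algorithm updates count, min, max, mean and stddev in constant memory per recorded value. Reporting the population standard deviation is an assumption; the proposal does not say which variant is used.

```python
import math


class Aggregate:
    """Running min/max/mean/stddev/count for one Absolute Value Counter.

    Sketch only: Welford's online algorithm, so no individual samples
    are kept in memory.
    """

    def __init__(self):
        self.count = 0
        self.min = None
        self.max = None
        self.mean = 0.0
        self._m2 = 0.0  # sum of squared deviations from the running mean

    def add(self, value):
        self.count += 1
        self.min = value if self.min is None else min(self.min, value)
        self.max = value if self.max is None else max(self.max, value)
        delta = value - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (value - self.mean)

    @property
    def stddev(self):
        # Population standard deviation (assumption, see lead-in).
        return math.sqrt(self._m2 / self.count) if self.count else 0.0


agg = Aggregate()
for ms in [20, 35, 15, 30]:  # hypothetical request times in milliseconds
    agg.add(ms)
print(agg.count, agg.min, agg.max, agg.mean, agg.stddev)
```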
>
> Complexity++
>
> Say you have a CouchDB running for a month. You change some
> things in your app or in CouchDB and you'd like to know how this
> affected your response time. To see any effect in the all-time
> aggregates, you'd have to restart CouchDB (and lose all stats) or
> wait a month. If you want to spot problems as they come up in your
> monitoring, you need finer-grained time ranges to look at.
>
> To make this a little more useful, Alex and I introduced time ranges.
> These are an additional set of aggregates that get reset every 1, 5
> and 15 minutes. This should be familiar to you from the server load
> average. You can get the aggregate values for four time ranges:
>
> - Between now and the beginning of time (when CouchDB was
>   started).
> - Between now and 60 seconds ago.
> - Between now and 300 seconds ago.
> - Between now and 900 seconds ago.
>
> These ranges are hardcoded for now, but they can be made
> configurable at a later time.
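A rough Python sketch of the reset-based time ranges described above: every counter keeps one aggregate per range, and an aggregate is discarded once its window has elapsed. The `RANGES` table, the `RangedCounter` class and the explicit `now` argument (added so the behaviour is deterministic) are hypothetical names, not taken from the patch.

```python
import math
import time


class Aggregate:
    """Running min/max/mean/stddev/count (Welford's online algorithm)."""

    def __init__(self):
        self.count, self.min, self.max = 0, None, None
        self.mean, self._m2 = 0.0, 0.0

    def add(self, value):
        self.count += 1
        self.min = value if self.min is None else min(self.min, value)
        self.max = value if self.max is None else max(self.max, value)
        delta = value - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (value - self.mean)

    def stddev(self):
        return math.sqrt(self._m2 / self.count) if self.count else 0.0


# Range key (minutes, as in ?range=) -> window length in seconds.
# 0 means "since CouchDB was started" and is never reset.
RANGES = {0: None, 1: 60, 5: 300, 15: 900}


class RangedCounter:
    """One Absolute Value Counter with a separate aggregate per time range."""

    def __init__(self, now=None):
        start = time.monotonic() if now is None else now
        self.aggregates = {r: Aggregate() for r in RANGES}
        self.window_start = {r: start for r in RANGES}

    def _expire(self, now):
        # Throw away any ranged aggregate whose window has elapsed.
        for r, length in RANGES.items():
            if length is not None and now - self.window_start[r] >= length:
                self.aggregates[r] = Aggregate()
                self.window_start[r] = now

    def record(self, value, now=None):
        now = time.monotonic() if now is None else now
        self._expire(now)
        for agg in self.aggregates.values():
            agg.add(value)

    def get(self, range_=0, now=None):
        self._expire(time.monotonic() if now is None else now)
        return self.aggregates[range_]
```

Note that a reset-based window covers the time since the last reset, so the "last minute" aggregate spans anywhere from zero to 60 seconds; a true sliding window would need per-interval buckets at the cost of more memory.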
>
> The API would look like this:
>
> GET /_stats/couchdb/request_time
>
> {
>   "couchdb": {
>     "request_time": {
>       "description": "Aggregated request time spent in CouchDB since the beginning of time",
>       "min": 20,
>       "max": 20,
>       "mean": 20,
>       "stddev": 20,
>       "count": 7,
>       "range": 0 // 0 means since day zero.
>     }
>   }
> }
>
> To get the aggregate stats for the last minute:
>
> GET /_stats/couchdb/request_time?range=1
>
> {
>   "couchdb": {
>     "request_time": {
>       "description": "Aggregated request time spent in CouchDB since 1 minute ago",
>       "min": 20,
>       "max": 20,
>       "mean": 20,
>       "stddev": 20,
>       "count": 7,
>       "range": 1 // minute
>     }
>   }
> }
>
> Or, more generically:
>
> GET /_stats/couchdb/request_time?range=$range
>
> {
>   "couchdb": {
>     "request_time": {
>       "description": "Aggregated request time spent in CouchDB since $range minutes ago",
>       "min": 20,
>       "max": 20,
>       "mean": 20,
>       "stddev": 20,
>       "count": 7,
>       "range": $range // minutes
>     }
>   }
> }
>
> This seems reasonable. The actual naming of "range" and the other
> keys can be changed, as can the description text.
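For a monitoring script, the proposed URL scheme maps onto a couple of small helpers. This Python sketch assumes CouchDB on its default port 5984 on localhost; `stats_url` and `parse_stat` are hypothetical helper names. Since JSON does not allow comments, a real response would not contain the `//` annotations shown in the examples above.

```python
import json
from urllib.parse import urlencode

# Assumption: a local CouchDB on its default port.
BASE_URL = "http://127.0.0.1:5984"


def stats_url(group, key, range_=None, base=BASE_URL):
    """Build the proposed /_stats/{group}/{key}[?range=N] URL."""
    url = f"{base}/_stats/{group}/{key}"
    if range_ is not None:
        url += "?" + urlencode({"range": range_})
    return url


def parse_stat(body, group, key):
    """Return one counter's aggregates from a {group: {key: {...}}} body."""
    return json.loads(body)[group][key]
```

Against a running server, the body would come from `urllib.request.urlopen(stats_url("couchdb", "request_time", 1)).read()`.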
>
>
> Complexity--
>
> Remember Hit Counters? Strictly speaking, CouchDB doesn't need
> to collect any averages for them, since our monitoring solution
> would take care of that. But the four time-range aggregates are
> available anyway, so we could just as well populate them too,
> sampling the counter at a fixed interval, say every second:
>
> GET /_stats/httpd/requests[?range=[1,5,15]]
>
> {
>   "httpd": {
>     "requests": {
>       "description": "Number of requests per second in the last $range minutes",
>       "min": 20,
>       "max": 20,
>       "mean": 20,
>       "stddev": 20,
>       "count": 7,
>       "range": $range // minutes
>     }
>   }
> }
>
> "count" would be the raw counter value, and the rest would be
> meaningful aggregates.
>
> "Per second" is an arbitrary choice again and can be made
> configurable if needed. To tell at what frequency stats are
> collected, there's a new member in the list of aggregates:
>
> {
>   "httpd": {
>     "requests": {
>       "description": "Number of requests per $frequency seconds in the last $range minutes",
>       "min": 20,
>       "max": 20,
>       "mean": 20,
>       "stddev": 20,
>       "count": 7,
>       "range": $range, // minutes
>       "frequency": 1 // second
>     }
>   }
> }
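To make the Hit Counter case concrete, here is a Python sketch under stated assumptions: the monotonic counter is read once every `frequency` seconds, the differences between consecutive readings are the "requests per $frequency seconds" values that min/max/mean/stddev summarise, and "count" is the raw counter value, as proposed. The input format and the function name `hit_counter_aggregates` are hypothetical, and population stddev is an assumption.

```python
import statistics


def hit_counter_aggregates(samples, frequency=1, range_=1):
    """Aggregate a monotonic Hit Counter sampled every `frequency` seconds.

    `samples` holds the raw cumulative counter values read at each tick.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples to compute a rate")
    # Requests observed during each sampling interval.
    deltas = [b - a for a, b in zip(samples, samples[1:])]
    return {
        "min": min(deltas),
        "max": max(deltas),
        "mean": statistics.mean(deltas),
        "stddev": statistics.pstdev(deltas),  # population stddev (assumption)
        "count": samples[-1],  # the raw counter value
        "range": range_,
        "frequency": frequency,
    }


# Counter read once per second: 3, 2, 0 and 4 new requests per interval.
print(hit_counter_aggregates([0, 3, 5, 5, 9]))
```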
>
> Alex and I tried a couple of different approaches to get here:
> different URLs for the different types of counters and aggregates,
> adding members in different places, with and without descriptions,
> and a whole lot more, but we surely haven't seen all permutations.
>
> This solution offers a unified URL format and a human-readable as
> well as computer-parseable way to determine what kind of counter
> you're dealing with.
>
> To just get all stats you can do a
>
> GET /_stats/
>
> and get a huge JSON object back that includes all of the above for all
> resolutions that are currently collected.
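A client consuming the full GET /_stats/ object will probably want to flatten it. This Python sketch assumes the two-level `{group: {key: {...aggregates...}}}` shape from the examples above and keys each counter by `group/key`, mirroring the URL path; how several ranges per counter would be laid out in one response is not specified in the proposal, so that part is left out. `flatten_stats` is a hypothetical name.

```python
import json


def flatten_stats(body):
    """Turn a GET /_stats/ body into {"group/key": aggregates}."""
    flat = {}
    for group, counters in json.loads(body).items():
        for key, aggregates in counters.items():
            flat[f"{group}/{key}"] = aggregates
    return flat
```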
>
> Is there anything that does not make sense or is too complicated?
>
> The goal was to create a simple, minimal API for a minimal set
> of useful statistics, and Alex and I hope we have found it by
> now. But if you can see how this could be simplified further,
> let us know :)
>
> Alex and I are also open to completely different approaches to
> getting the data out of CouchDB.
>
> We're looking for a few things in this thread:
>
> - A sanity check to know we're not completely off.
> - A summary for you of how we arrived at the current proposal.
> - A consensus among dev@ readers on the final API we'd like to
>   implement.
>
> Note that a few of these things are already implemented and
> others need to be adjusted depending on feedback here.
>
> Please send us your feedback,
>
> Cheers
> Alex & Jan
> --
>
>

