couchdb-erlang mailing list archives

From Ben Anderson <>
Subject Re: starting on metrics
Date Fri, 16 Nov 2012 00:00:38 GMT
I agree with Paul on the collection/reporting split and that we
shouldn't choose a protocol - that's one place where Folsom got things
right, splitting folsom_webmachine out of the main codebase.
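The collection/reporting split Paul describes in the quoted message can be sketched roughly as follows. This is an illustrative Python sketch, not CouchDB or Folsom code. All class and method names (Reporter, MemoryReporter, Collector, flush) are hypothetical. Instrumented code only touches the collector, and each configured reporter decides how values leave the process, so no wire protocol is baked into the collection side.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class Reporter(ABC):
    """Reporting side: decides where collected values go (HTTP, Graphite, ...)."""

    @abstractmethod
    def report(self, snapshot: Dict[str, float]) -> None:
        ...


class MemoryReporter(Reporter):
    """Stand-in backend that just keeps every snapshot it is handed."""

    def __init__(self) -> None:
        self.snapshots: List[Dict[str, float]] = []

    def report(self, snapshot: Dict[str, float]) -> None:
        self.snapshots.append(dict(snapshot))


class Collector:
    """Collection side: instrumented code only ever talks to this object."""

    def __init__(self, reporters: List[Reporter]) -> None:
        self._values: Dict[str, float] = {}
        self._reporters = list(reporters)

    def increment(self, name: str, by: float = 1) -> None:
        self._values[name] = self._values.get(name, 0) + by

    def set_gauge(self, name: str, value: float) -> None:
        self._values[name] = value

    def flush(self) -> None:
        # Every configured reporter sees the same point-in-time snapshot;
        # the collector itself knows nothing about any wire protocol.
        snapshot = dict(self._values)
        for reporter in self._reporters:
            reporter.report(snapshot)


if __name__ == "__main__":
    mem = MemoryReporter()
    stats = Collector([mem])
    stats.increment("httpd.requests")
    stats.increment("httpd.requests")
    stats.set_gauge("couchjs.procs", 3)
    stats.flush()
    print(mem.snapshots)  # [{'httpd.requests': 2, 'couchjs.procs': 3}]
```

To add Graphite, Riemann, or an HTTP endpoint, you would add another Reporter subclass. The collection calls in the codebase would stay the same.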

I implemented the system we're using at Cloudant and made a lot of
those mistakes. We may open source it, but it's definitely not a good
candidate for a drop in to CouchDB. I'd love to work on something for
CouchDB, though, and get it right this time.

We could certainly get away with counters, gauges, and a distribution
metric (histogram?).
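A minimal sketch of those three metric types follows. Meters are omitted because, as Paul notes in the quoted message, a rate can be derived from a counter at graphing time. The names are hypothetical. The percentile uses the nearest-rank method. The histogram keeps every sample for clarity, whereas a real implementation would bound memory, for example with a sampling reservoir. The derivative helper loosely follows the idea of Graphite's nonNegativeDerivative: it returns deltas between consecutive readings and None where the counter went down.

```python
import math
import statistics
from typing import Dict, List, Optional, Sequence


class Counter:
    """Monotonic count of events, e.g. HTTP requests served."""

    def __init__(self) -> None:
        self.value = 0

    def inc(self, by: int = 1) -> None:
        self.value += by


class Gauge:
    """Absolute, point-in-time value, e.g. CPU temperature."""

    def __init__(self) -> None:
        self.value = 0.0

    def set(self, value: float) -> None:
        self.value = value


class Histogram:
    """Distribution metric: stores samples and summarizes them on demand.

    Keeps every sample for clarity; a real implementation would bound
    memory, e.g. with a sampling reservoir."""

    def __init__(self) -> None:
        self.samples: List[float] = []

    def update(self, value: float) -> None:
        self.samples.append(value)

    def summary(self) -> Dict[str, float]:
        if not self.samples:
            return {}
        s = sorted(self.samples)

        def percentile(p: float) -> float:
            # Nearest-rank: smallest sample with at least p% of samples <= it.
            k = max(0, math.ceil(p / 100 * len(s)) - 1)
            return s[k]

        return {
            "min": s[0],
            "max": s[-1],
            "mean": statistics.mean(s),
            "stddev": statistics.pstdev(s),
            "p50": percentile(50),
            "p99": percentile(99),
        }


def non_negative_derivative(values: Sequence[float]) -> List[Optional[float]]:
    """Deltas between consecutive counter readings; None where the counter
    went down (e.g. after a node restart reset it to zero)."""
    out: List[Optional[float]] = []
    for prev, cur in zip(values, values[1:]):
        delta = cur - prev
        out.append(delta if delta >= 0 else None)
    return out
```

If counter readings are taken at a fixed interval, non_negative_derivative([0, 5, 12, 3, 8]) gives [5, 7, None, 5], which is the per-interval request rate. The reset shows up as a gap rather than a negative rate.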

With regard to the rolling windows ("stats over past N seconds") idea
- it's definitely more complex on both the implementation and API
fronts, but I think it's worthwhile to keep around in some form. If
you toss it for fixed windows - i.e., collect data for N seconds,
calculate your stats, then throw it away and start anew - you lose the
ability to take meaningful measurements at any point in time. This can
be misleading for pull-based requesters, such as humans. The API could
certainly be simplified. Perhaps the window size could be specified at
metric creation/specification time and returned along with the
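To make the rolling-vs-fixed trade-off concrete, here is a rough Python sketch of both. The class names are hypothetical, and the clock is injectable so the behaviour is deterministic. In both classes the window size is fixed at creation and echoed back with the stats. That is only one possible reading of the suggestion above.

```python
import collections
import time
from typing import Callable, Deque, Dict, List, Optional, Tuple

Clock = Callable[[], float]


class SlidingWindow:
    """Rolling window: at any instant, stats cover the last `window` seconds."""

    def __init__(self, window: float, clock: Clock = time.monotonic) -> None:
        self.window = window
        self._clock = clock
        self._samples: Deque[Tuple[float, float]] = collections.deque()

    def _evict(self, now: float) -> None:
        # Drop samples that have aged out of the window.
        cutoff = now - self.window
        while self._samples and self._samples[0][0] <= cutoff:
            self._samples.popleft()

    def add(self, value: float) -> None:
        now = self._clock()
        self._samples.append((now, value))
        self._evict(now)

    def stats(self) -> Dict[str, Optional[float]]:
        self._evict(self._clock())
        values = [v for _, v in self._samples]
        return {
            "window": self.window,  # echo the window size with the result
            "count": len(values),
            "mean": sum(values) / len(values) if values else None,
        }


class FixedWindow:
    """Fixed window: collect for `window` seconds, then discard and start anew."""

    def __init__(self, window: float, clock: Clock = time.monotonic) -> None:
        self.window = window
        self._clock = clock
        self._start = clock()
        self._values: List[float] = []

    def _maybe_reset(self, now: float) -> None:
        if now - self._start >= self.window:
            self._start = now
            self._values = []

    def add(self, value: float) -> None:
        self._maybe_reset(self._clock())
        self._values.append(value)

    def stats(self) -> Dict[str, Optional[float]]:
        self._maybe_reset(self._clock())
        values = self._values
        return {
            "window": self.window,
            "count": len(values),
            "mean": sum(values) / len(values) if values else None,
        }
```

Consider one sample per second for ten seconds, followed by a query at t=10.5. The sliding window still reports 9 samples. The fixed window has just reset and reports none. That empty result is the misleading reading a human polling the endpoint would see. The cost of the rolling version is keeping (or bucketing) timestamped samples.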

Again, I'd love to work on this with you, Dave. I'll give this some
thought tonight and see if I can come up with a good API proposal to
bounce off the list.


On Thu, Nov 15, 2012 at 2:45 PM, Paul Davis <> wrote:
> Definitely good to make something work to play with. On a related note, I
> think we need to seriously reevaluate some of the ways we use the config
> for these bits (granted, that's a future, only tangentially related thing).
> As to your list of metrics, I think it depends on what you mean. The
> general types of stats that I'm aware of usually fit into a small number of
> categories:
> counters - generally speaking an atomically incrementing value (e.g., open
> couchjs processes)
> gauges - record an absolute value (e.g., CPU temperature)
> meters - record a rate of events (e.g., HTTP requests)
> statsystuff - Slightly more complicated bits for recording stats on
> recorded values (e.g., request latency with avg/stddev/min/max/percentiles)
> And I'd note that you can get away without some of these. Meters can be
> implemented with a counter and then using a derivative when graphing
> (Graphite does this with the nonNegativeDerivative function).
> (Didn't know where to put this, but the middle seems good.) Also, one thing
> we should look into is removing the time-series-based stats, i.e., the "stats
> over the last 1, 60, 300 seconds" stuff, as it makes things quite difficult and
> AFAIK isn't really useful (especially if you forward to a metrics analysis
> system). This would save us significantly in CPU and complexity.
> If I were going to write this code I would start by taking a look at a few
> other libraries and then figuring out what we might need as an API within
> the code base. Right now I could see us getting away with just counters,
> gauges, and maybe a basic statsy kind.
> Once you have the API, then it's just a matter of figuring out how to specify
> an implementation. I'm not sure what you mean by a custom behavior in this
> particular instance. We could write a behavior for a stats processor that
> implements the metric types we decide on, I guess. It's really not super
> duper important other than that it provides some compile-time checks (but it
> also requires figuring out code paths when you compile the module that
> implements the behavior, and given that this thing would see high traffic I
> would go without, because you'll see quite quickly if you forgot to
> implement a function). The newer couch_index code does stuff kinda like
> this, though it's a lot more involved than you'd want to be. Also, more wild
> ideas in response to your efficiency questions.
> So I can actually think of a couple of ways to do this efficiently that will
> limit the overhead for implementation. They're a bit complex in terms of the
> hack, but would be relatively constrained in where the complexity lives.
> For the time being I would start with something like mochiglobal to
> efficiently decide if you need to record a metric, although that's a bit
> restrictive in that it requires atoms as key names. I have a similar module
> I can open source that allows arbitrary keys at the expense of adding a
> function clause pattern match. Although if you want to get *really*
> awesomely crazy, a fun way to try doing this particular "implementation
> swap" would be to dynamically replace the implementation module at runtime
> (not as crazy as it sounds, but still slightly crazy). CouchDB could
> ship with two versions of this module: one would be the current "expose
> values over HTTP" method and one could be a "no-op" that people who just
> wanted performance could use (nfc what the performance penalties are of the
> current style, though it has tipped nodes over before).
> Things to look at for thoughts:
> On Thu, Nov 15, 2012 at 4:35 PM, Dave Cottlehuber <> wrote:
>> On 15 November 2012 14:13, Paul Davis <> wrote:
>> > The idea here is good but I'm not at all a fan of the implementation. First
>> > off, no way should we be choosing a specific stats collection protocol.
>> > They're just too specific to a particular operations/infra configuration
>> > that anything we pick is going to be inadequate for a non trivial number of
>> > users.
>> Absolutely - but as a first go I am learning a lot :-)). First make it
>> work, then make it pretty?
>> Yesterday I hacked in starting up estatsd and enabling/disabling this
>> via config file:
>> It's hacky but it works, I think.
>> > OTOH, I think it would be a very good idea to sit down and design the stats
>> > API to be pluggable. We already have two rough sides to the API (collection
>> > vs reporting). If we sat down and designed a collection API that would then
>> > talk to a configurable reporting API it'd allow for users to do a number of
>> > cool things with stats.
>> Nice split.
>> Re measuring "properly" we could get by with 3 "things":
>> - counters (http reqs, # of active couchjs procs maybe)
>> - duration
>> - events (replication started, etc)
>> And then plug into Graphite, Riemann, whatever takes your fancy. Would
>> the best way to provide an API for these counters be to write
>> a custom behaviour? Any existing code you can point to that does this
>> sort of thing?
>> Last question, any tip on how to implement this in a way that you can
>> turn off metrics and avoid the performance hit completely, without
>> needing a recompile (e.g. to remove macros)?
>> A+
>> Dave
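Dave's closing question asks how to turn metrics off without a recompile. Paul suggests shipping a real module and a no-op module and swapping between them at runtime. That pairing can be illustrated with the sketch below. In Erlang, Paul's suggestion would mean loading one of two versions of the same module. Python has no direct equivalent, so this sketch only approximates the idea by rebinding a module-level reference. Every name here (NoopMetrics, InMemoryMetrics, set_backend, increment, observe) is hypothetical. Call sites never change. In no-op mode the remaining cost is a function call and an attribute lookup, which is small but not zero.

```python
from typing import Dict, List


class NoopMetrics:
    """'Performance' build: every call is a cheap no-op."""

    def increment(self, name: str, by: int = 1) -> None:
        pass

    def observe(self, name: str, value: float) -> None:
        pass


class InMemoryMetrics:
    """Recording backend (stand-in for 'expose values over HTTP')."""

    def __init__(self) -> None:
        self.counters: Dict[str, int] = {}
        self.observations: Dict[str, List[float]] = {}

    def increment(self, name: str, by: int = 1) -> None:
        self.counters[name] = self.counters.get(name, 0) + by

    def observe(self, name: str, value: float) -> None:
        self.observations.setdefault(name, []).append(value)


# The single piece of mutable state: which backend is live right now.
_backend = NoopMetrics()


def set_backend(backend) -> None:
    """Swap implementations at runtime, e.g. from a config-change handler."""
    global _backend
    _backend = backend


# Call sites use only these two functions and never change.
def increment(name: str, by: int = 1) -> None:
    _backend.increment(name, by)


def observe(name: str, value: float) -> None:
    _backend.observe(name, value)
```

A config-change handler would call set_backend(InMemoryMetrics()) to enable recording and set_backend(NoopMetrics()) to disable it again. No restart or recompile is involved.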
