couchdb-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Paul Davis <paul.joseph.da...@gmail.com>
Subject Re: [jira] Created: (COUCHDB-396) Fixing weirdness in couch_stats_aggregator.erl
Date Sun, 28 Jun 2009 17:18:39 GMT
On Sun, Jun 28, 2009 at 12:50 PM, Paul Davis<paul.joseph.davis@gmail.com> wrote:
> On Sun, Jun 28, 2009 at 10:28 AM, Robert
> Dionne<dionne@dionne-associates.com> wrote:
>> This seems like good improvement. I've only superficially reviewed the new
>> version, but have a couple of trivial and one non-trivial comments:
>>
>> 1. The ini file is growing. This might be a good candidate for it's own ini
>> file. It's somewhat orthogonal and could be nicely boxed out and ignored by
>> folks with no interest in such things. The downside is the complexity of
>> multiple files
>>
>
> Definitely agree. Another thing I noticed is that the startup message
> with debug logging is also a bit longish. Noah's already got it setup
> to allow for multiple files using the default.d/ directories, but I
> wasn't quite certain on how to setup the build system
>
>> 2. Would it be helpful to be able to enable/disable stats completely. These
>> calculations must add some overhead.
>>
>
> That definitely seems reasonable though I'm not entirely certain how
> best to implement this.
>

Before I forget, I did have the idea of making the resolution
configurable to make stats updating happen some multiple of a second.
The issue with that though is that I can't figure out how to make
request timings work with ets. I don't see a way in the ets docs to
make an update atomic when the update requires knowledge of the
current value.

>> 3. The use of moving averages is great, but as you comment there be quite a
>> lot of variability within a given time interval. Moving averages are
>> generally useful only over time, for example in making short term trading
>> decisions a moving average can help guess the direction of the next
>> reversion to a mean. In this scenario I would think peak usages would also
>> be of value. One could maintain min/max stats with respect to these moving
>> averages along with a time interval in order to identify hot spots.
>>
>
> Sounds reasonable. I'm not sure if min/max is more or less proper than
> quartiles. Or maybe just different? My stats-fu is less than stellar.
>
>> I'll have a closer look and write some tests
>>
>>
>>
>>
>> On Jun 27, 2009, at 9:32 PM, Paul Joseph Davis (JIRA) wrote:
>>
>>> Fixing weirdness in couch_stats_aggregator.erl
>>> ----------------------------------------------
>>>
>>>                Key: COUCHDB-396
>>>                URL: https://issues.apache.org/jira/browse/COUCHDB-396
>>>            Project: CouchDB
>>>         Issue Type: Improvement
>>>         Components: Database Core, HTTP Interface
>>>   Affects Versions: 0.10
>>>        Environment: trunk
>>>           Reporter: Paul Joseph Davis
>>>           Assignee: Paul Joseph Davis
>>>            Fix For: 0.10
>>>        Attachments: couchdb_stats_aggregator.patch
>>>
>>> Looking at adding unit tests to the couchdb_stats_aggregator module the
>>> other day I realized it was doing some odd calculations. This is a fairly
>>> non-trivial patch so I figured that I'd put in JIRA and get feed back before
>>> applying. This patch does everything the old version does afaict, but I'll
>>> be adding tests before I consider it complete.
>>>
>>> List of major changes:
>>>
>>> * The old behavior for stats was to integrate incoming values for a time
>>> period and then reset the values and start integrating again. That seemed a
>>> bit odd so I rewrote things to keep the average and standard deviation for
>>> the last N seconds with approximately 1 sample per second.
>>> * Changed request timing calculations [note below]
>>> * Sample periods are configurable in the .ini file. Sample periods of 0
>>> are a special case and integrate all values from couchdb boot up.
>>> * Sample descriptions are in the configuration files now.
>>> * You can request different time periods for the root stats end point.
>>> * Added a sum to the list of statistics
>>> * Simplified some of the external API
>>>
>>> The biggest change is in how time for requests are calculated. AFAICT, the
>>> old way was accumulating request timings in the stats collector and just
>>> adding new values as clock ticks went by as everything else does which makes
>>> sense in the case of resetting counters every time period. In the new way
>>> I'm keeping a list of the samples in the last time period and when I get a
>>> clock tick part of the update is to remove the samples that have passed out
>>> of the time period. For a variable like request_time this would lead to
>>> unbounded storage.
>>>
>>> The new method is calculating the average time of all requests in a single
>>> clock tick (1s). One thing this loses is when you start having lots of
>>> variability in a single clock tick. Ie, your average request time is 100ms,
>>> but 10% of your requests are taking 500ms. I've read of people doing the
>>> averaging trick but also storing quantile information as well [1]. There are
>>> also algorithms for doing single pass quantile estimation and the like so
>>> its possible to do those things in O(N) time. The issue with quantiles is
>>> that it'd start breaking the logic of how the collector and aggregators are
>>> setup. As it is now, there's basically a one event -> one stat constraint.
>>> For the time being I went without quartiles to minimize the impact of the
>>> patch.
>>>
>>> This code will also be on github [3] as I add patches.
>>>
>>>
>>> [1] http://code.flickr.com/blog/2008/10/27/counting-timing/
>>> [2]
>>> http://www.slamb.org/svn/repos/trunk/projects/loadtest/benchtools/stats.py (See
>>> the QuantileEstimator class)
>>> [3] http://github.com/davisp/couchdb/tree/stats-patch
>>>
>>>
>>>
>>> --
>>> This message is automatically generated by JIRA.
>>> -
>>> You can reply to this email to add a comment to the issue online.
>>>
>>
>>
>

Mime
View raw message