On 28 Jun 2009, at 18:50, Paul Davis wrote:
>> 2. Would it be helpful to be able to enable/disable stats
>> completely. These
>> calculations must add some overhead.
>>
>
> That definitely seems reasonable though I'm not entirely certain how
> best to implement this.
I'd opt for a ./configure option --disable-stats and -ifdef() based
conditional
code.
Cheers
Jan
--
>
>> 3. The use of moving averages is great, but as you comment there be
>> quite a
>> lot of variability within a given time interval. Moving averages are
>> generally useful only over time, for example in making short term
>> trading
>> decisions a moving average can help guess the direction of the next
>> reversion to a mean. In this scenario I would think peak usages
>> would also
>> be of value. One could maintain min/max stats with respect to these
>> moving
>> averages along with a time interval in order to identify hot spots.
>>
>
> Sounds reasonable. I'm not sure if min/max is more or less proper than
> quartiles. Or maybe just different? My stats-fu is less than stellar.
>
>> I'll have a closer look and write some tests
>>
>>
>>
>>
>> On Jun 27, 2009, at 9:32 PM, Paul Joseph Davis (JIRA) wrote:
>>
>>> Fixing weirdness in couch_stats_aggregator.erl
>>> ----------------------------------------------
>>>
>>> Key: COUCHDB-396
>>> URL: https://issues.apache.org/jira/browse/
>>> COUCHDB-396
>>> Project: CouchDB
>>> Issue Type: Improvement
>>> Components: Database Core, HTTP Interface
>>> Affects Versions: 0.10
>>> Environment: trunk
>>> Reporter: Paul Joseph Davis
>>> Assignee: Paul Joseph Davis
>>> Fix For: 0.10
>>> Attachments: couchdb_stats_aggregator.patch
>>>
>>> Looking at adding unit tests to the couchdb_stats_aggregator
>>> module the
>>> other day I realized it was doing some odd calculations. This is a
>>> fairly
>>> non-trivial patch so I figured that I'd put in JIRA and get feed
>>> back before
>>> applying. This patch does everything the old version does afaict,
>>> but I'll
>>> be adding tests before I consider it complete.
>>>
>>> List of major changes:
>>>
>>> * The old behavior for stats was to integrate incoming values for
>>> a time
>>> period and then reset the values and start integrating again. That
>>> seemed a
>>> bit odd so I rewrote things to keep the average and standard
>>> deviation for
>>> the last N seconds with approximately 1 sample per second.
>>> * Changed request timing calculations [note below]
>>> * Sample periods are configurable in the .ini file. Sample periods
>>> of 0
>>> are a special case and integrate all values from couchdb boot up.
>>> * Sample descriptions are in the configuration files now.
>>> * You can request different time periods for the root stats end
>>> point.
>>> * Added a sum to the list of statistics
>>> * Simplified some of the external API
>>>
>>> The biggest change is in how time for requests are calculated.
>>> AFAICT, the
>>> old way was accumulating request timings in the stats collector
>>> and just
>>> adding new values as clock ticks went by as everything else does
>>> which makes
>>> sense in the case of resetting counters every time period. In the
>>> new way
>>> I'm keeping a list of the samples in the last time period and when
>>> I get a
>>> clock tick part of the update is to remove the samples that have
>>> passed out
>>> of the time period. For a variable like request_time this would
>>> lead to
>>> unbounded storage.
>>>
>>> The new method is calculating the average time of all requests in
>>> a single
>>> clock tick (1s). One thing this loses is when you start having
>>> lots of
>>> variability in a single clock tick. Ie, your average request time
>>> is 100ms,
>>> but 10% of your requests are taking 500ms. I've read of people
>>> doing the
>>> averaging trick but also storing quantile information as well [1].
>>> There are
>>> also algorithms for doing single pass quantile estimation and the
>>> like so
>>> its possible to do those things in O(N) time. The issue with
>>> quantiles is
>>> that it'd start breaking the logic of how the collector and
>>> aggregators are
>>> setup. As it is now, there's basically a one event -> one stat
>>> constraint.
>>> For the time being I went without quartiles to minimize the impact
>>> of the
>>> patch.
>>>
>>> This code will also be on github [3] as I add patches.
>>>
>>>
>>> [1] http://code.flickr.com/blog/2008/10/27/counting-timing/
>>> [2]
>>> http://www.slamb.org/svn/repos/trunk/projects/loadtest/benchtools/stats.py
>>> (See
>>> the QuantileEstimator class)
>>> [3] http://github.com/davisp/couchdb/tree/stats-patch
>>>
>>>
>>>
>>> --
>>> This message is automatically generated by JIRA.
>>> -
>>> You can reply to this email to add a comment to the issue online.
>>>
>>
>>
>
|