couchdb-dev mailing list archives

From Jan Lehnardt <...@apache.org>
Subject Re: [jira] Created: (COUCHDB-396) Fixing weirdness in couch_stats_aggregator.erl
Date Tue, 21 Jul 2009 11:21:51 GMT

On 28 Jun 2009, at 18:50, Paul Davis wrote:
>> 2. Would it be helpful to be able to enable/disable stats completely?
>> These calculations must add some overhead.
>>
>
> That definitely seems reasonable though I'm not entirely certain how
> best to implement this.


I'd opt for a ./configure option --disable-stats and -ifdef()-based
conditional code.
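
A minimal sketch of that, assuming --disable-stats ends up defining a
disable_stats macro for erlc; the module and function names below are
illustrative, not the actual collector API:

    -module(stats_example).
    -export([increment/1]).

    -ifdef(disable_stats).
    %% Stats disabled at build time: every call compiles down to a no-op.
    increment(_Key) ->
        ok.
    -else.
    %% Stats enabled: forward the event to whatever process keeps the counters.
    increment(Key) ->
        gen_server:cast(stats_example, {increment, Key}).
    -endif.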

Cheers
Jan
--

>
>> 3. The use of moving averages is great, but as you comment there can be
>> quite a lot of variability within a given time interval. Moving averages
>> are generally useful only over time; for example, in making short-term
>> trading decisions a moving average can help guess the direction of the
>> next reversion to a mean. In this scenario I would think peak usages
>> would also be of value. One could maintain min/max stats with respect to
>> these moving averages, along with a time interval, in order to identify
>> hot spots.
>>
>
> Sounds reasonable. I'm not sure if min/max is more or less proper than
> quartiles. Or maybe just different? My stats-fu is less than stellar.
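
For what it's worth, tracking peaks alongside the window wouldn't take
much; a rough sketch (module and function names are mine, not from the
patch):

    -module(stats_peaks_sketch).
    -export([window_extremes/1]).

    %% Min/max over the same window of per-second samples the moving
    %% average uses, so short spikes are not hidden by the average.
    window_extremes(Samples) when Samples =/= [] ->
        {lists:min(Samples), lists:max(Samples)}.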
>
>> I'll have a closer look and write some tests.
>>
>>
>>
>>
>> On Jun 27, 2009, at 9:32 PM, Paul Joseph Davis (JIRA) wrote:
>>
>>> Fixing weirdness in couch_stats_aggregator.erl
>>> ----------------------------------------------
>>>
>>>                Key: COUCHDB-396
>>>                URL: https://issues.apache.org/jira/browse/COUCHDB-396
>>>            Project: CouchDB
>>>         Issue Type: Improvement
>>>         Components: Database Core, HTTP Interface
>>>   Affects Versions: 0.10
>>>        Environment: trunk
>>>           Reporter: Paul Joseph Davis
>>>           Assignee: Paul Joseph Davis
>>>            Fix For: 0.10
>>>        Attachments: couchdb_stats_aggregator.patch
>>>
>>> Looking at adding unit tests to the couchdb_stats_aggregator module the
>>> other day, I realized it was doing some odd calculations. This is a
>>> fairly non-trivial patch, so I figured I'd put it in JIRA and get
>>> feedback before applying. This patch does everything the old version
>>> does, AFAICT, but I'll be adding tests before I consider it complete.
>>>
>>> List of major changes:
>>>
>>> * The old behavior for stats was to integrate incoming values for a time
>>> period and then reset the values and start integrating again. That
>>> seemed a bit odd, so I rewrote things to keep the average and standard
>>> deviation for the last N seconds with approximately 1 sample per second
>>> (see the sketch after this list).
>>> * Changed request timing calculations [note below]
>>> * Sample periods are configurable in the .ini file. Sample periods of 0
>>> are a special case and integrate all values from CouchDB boot-up.
>>> * Sample descriptions are in the configuration files now.
>>> * You can request different time periods for the root stats endpoint.
>>> * Added a sum to the list of statistics
>>> * Simplified some of the external API
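
A rough sketch of the window arithmetic the first bullet describes (module
and function names are mine, not taken from the patch):

    -module(stats_window_sketch).
    -export([window_stats/1]).

    %% Average and standard deviation over the last N one-second samples.
    window_stats([]) ->
        {0, 0};
    window_stats(Samples) ->
        N = length(Samples),
        Mean = lists:sum(Samples) / N,
        Variance = lists:sum([(S - Mean) * (S - Mean) || S <- Samples]) / N,
        {Mean, math:sqrt(Variance)}.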
>>>
>>> The biggest change is in how request times are calculated. AFAICT, the
>>> old way was to accumulate request timings in the stats collector and
>>> just add new values as clock ticks went by, like everything else does,
>>> which makes sense when counters are reset every time period. In the new
>>> way I'm keeping a list of the samples from the last time period, and
>>> when I get a clock tick, part of the update is to remove the samples
>>> that have passed out of the time period. For a variable like
>>> request_time, keeping every individual sample that way would lead to
>>> unbounded storage.
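
A rough sketch of that expiry step (again, names are mine rather than the
patch's):

    -module(stats_expiry_sketch).
    -export([expire/3]).

    %% On each clock tick, drop {Timestamp, Value} samples that have fallen
    %% out of the sample period, so the window stays bounded by time.
    expire(Samples, Now, Period) ->
        [{T, V} || {T, V} <- Samples, Now - T =< Period].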
>>>
>>> The new method calculates the average time of all requests in a single
>>> clock tick (1s). One thing this loses is visibility when there is a lot
>>> of variability within a single clock tick, i.e., your average request
>>> time is 100ms, but 10% of your requests are taking 500ms. I've read of
>>> people doing the averaging trick but storing quantile information as
>>> well [1]. There are also algorithms for doing single-pass quantile
>>> estimation and the like, so it's possible to do those things in O(N)
>>> time. The issue with quantiles is that it'd start breaking the logic of
>>> how the collector and aggregators are set up. As it is now, there's
>>> basically a one event -> one stat constraint. For the time being I went
>>> without quantiles to minimize the impact of the patch.
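
To make the trade-off concrete, a rough sketch of the per-tick averaging,
plus a naive per-tick quantile that is *not* in the patch (all names here
are mine):

    -module(stats_tick_sketch).
    -export([tick_average/1, tick_quantile/2]).

    %% Collapse all request times seen during one clock tick (1s) into a
    %% single averaged sample.
    tick_average([]) ->
        0;
    tick_average(Times) ->
        lists:sum(Times) / length(Times).

    %% Not in the patch: a naive quantile over the same bounded set of
    %% per-tick request times, e.g. Q = 0.9 for the 90th percentile.
    tick_quantile(Q, Times) when Times =/= [] ->
        Sorted = lists:sort(Times),
        Index = max(1, round(Q * length(Sorted))),
        lists:nth(Index, Sorted).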
>>>
>>> This code will also be on GitHub [3] as I add patches.
>>>
>>>
>>> [1] http://code.flickr.com/blog/2008/10/27/counting-timing/
>>> [2] http://www.slamb.org/svn/repos/trunk/projects/loadtest/benchtools/stats.py
>>>     (see the QuantileEstimator class)
>>> [3] http://github.com/davisp/couchdb/tree/stats-patch
>>>
>>>
>>>
>>> --
>>> This message is automatically generated by JIRA.
>>> -
>>> You can reply to this email to add a comment to the issue online.
>>>
>>
>>
>

