couchdb-dev mailing list archives

From: Antony Blakey <antony.bla...@gmail.com>
Subject: Re: Statistics Module
Date: Fri, 30 Jan 2009 07:10:53 GMT

On 30/01/2009, at 5:32 PM, Paul Davis wrote:

> On Fri, Jan 30, 2009 at 1:58 AM, Antony Blakey <antony.blakey@gmail.com> wrote:
>>
>> On 30/01/2009, at 4:27 PM, Paul Davis wrote:
>>
>>> On Fri, Jan 30, 2009 at 12:32 AM, Antony Blakey <antony.blakey@gmail.com> wrote:
>>>>
>>>> On 30/01/2009, at 9:56 AM, Paul Davis wrote:
>>>>
>>>>> The way that stats are calculated currently, with the dependent
>>>>> variable being time, could cause some issues in implementing more
>>>>> statistics. With my extremely limited knowledge of stats I think
>>>>> moving that to be dependent on the number of requests might be
>>>>> better. This is something that hopefully someone out there knows
>>>>> more about. (This is in terms of "avg for last 5 minutes" vs "avg
>>>>> for last 100 requests", the latter of the two making stddev-type
>>>>> stats calculable on the fly in constant memory.)
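
For reference, the constant-memory claim is sound for cumulative stats:
mean and stddev can be maintained on the fly with Welford's online
algorithm, without storing any samples. A minimal sketch, in Python
purely for illustration (the actual stats module is Erlang, and this is
not its code):

    class RunningStats:
        """Cumulative mean/stddev via Welford's algorithm, O(1) memory."""

        def __init__(self):
            self.n = 0
            self.mean = 0.0
            self.m2 = 0.0  # sum of squared deviations from the running mean

        def add(self, x):
            # One pass, no stored samples: update count, mean, and M2.
            self.n += 1
            delta = x - self.mean
            self.mean += delta / self.n
            self.m2 += delta * (x - self.mean)

        def stddev(self):
            # Sample stddev; needs at least two samples.
            return (self.m2 / (self.n - 1)) ** 0.5 if self.n > 1 else 0.0

Each request time (or any other float) goes in via add(); every update
is O(1) in both time and memory.
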
>>>>
>>>> The problem with using # of requests is that depending on your data,
>>>> each request may take a long time. I have this problem at the moment:
>>>> 1008 documents in a 3.5G media database. During a compact, the status
>>>> in _active_tasks updates every 1000 documents, so you can imagine how
>>>> useful that is :/ I thought it had hung (and neither the beam.smp CPU
>>>> time nor the IO requests were a good indicator). I spent some time
>>>> chasing this down as a bug before realising the problem was in the
>>>> status granularity!
>>>>
>>>
>>> Actually I don't think that affects my question at all. It may change
>>> how we report things though. As in, it may be important to be able to
>>> report things that are not single increment/decrement conditions but
>>> instead allow for adding arbitrary floating point numbers to the
>>> number of recorded data points.
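
And for the "avg for last 100 requests" framing, a windowed variant
handles the arbitrary floating point values you mention, with memory
bounded by the window size rather than the stream length. Again just an
illustrative sketch, not the actual stats module:

    from collections import deque

    class WindowStats:
        """Mean/stddev over the last n samples; memory bounded by n."""

        def __init__(self, n=100):
            self.window = deque(maxlen=n)
            self.total = 0.0
            self.total_sq = 0.0

        def add(self, x):
            # Retire the oldest sample once the window is full.
            if len(self.window) == self.window.maxlen:
                old = self.window[0]
                self.total -= old
                self.total_sq -= old * old
            self.window.append(x)  # deque evicts the oldest automatically
            self.total += x
            self.total_sq += x * x

        def mean(self):
            return self.total / len(self.window) if self.window else 0.0

        def stddev(self):
            n = len(self.window)
            if n < 2:
                return 0.0
            # Naive sum-of-squares form; watch for cancellation error
            # with large values in real code.
            var = (self.total_sq - self.total * self.total / n) / (n - 1)
            return max(var, 0.0) ** 0.5
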
>>
>> I think I have the wrong end of the stick here - my problem was with
>> the granularity of updates, not with the basis of calculation.
>>
>
> Heh. Well, we can only measure what we know. And in the interest of
> simplicity I think the granularity is gonna have to stick to pretty
> much per request. Also you're flying with 300 MiB docs? Perhaps it's
> time to chop or store in FTP?

No, lots of attachments per doc. I need them to replicate. 3.5G / 1000
docs = roughly 3.5 MB of attachments per doc. Not unreasonable.

Antony Blakey
--------------------------
CTO, Linkuistics Pty Ltd
Ph: 0438 840 787

Plurality is not to be assumed without necessity
   -- William of Ockham (ca. 1285-1349)


