couchdb-user mailing list archives

From Jedediah Smith <>
Subject Re: sums by date view
Date Mon, 24 Nov 2008 02:27:52 GMT

I ran some experiments and figured out a few things:

If a view's key has multiple components, CouchDB will indeed maintain
intermediate reduce results at each group level. It will use these
intermediate results to efficiently calculate arbitrary ranges. For
example, if I ask for sum(2008-03-11 to 2008-07-25), CouchDB will call
reduce twice. The first call will sum all the included days in March and
July. The second reduce will have combine=true and sum the previous
result with April, May and June, whose sums are already in the index.
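
To make the two reduce calls described above concrete, here is a minimal
sketch that simulates them outside CouchDB. The document shape (date,
amount), the helper valuesForMonths and the sample data are assumptions
for illustration only. Inside CouchDB, emit is a global rather than a
parameter, and the keys argument is not null on the first pass.

```javascript
// Map: split an ISO date into [year, month, day] so that results can be
// rolled up per key component. `emit` is passed in only so this runs standalone.
function mapFn(doc, emit) {
  var parts = doc.date.split("-").map(Number); // "2008-03-11" -> [2008, 3, 11]
  emit(parts, doc.amount);
}

// Reduce: summing works the same on both passes. When `combine` is true,
// `values` holds partial sums from earlier reduce calls instead of raw rows.
function reduceFn(keys, values, combine) {
  return values.reduce(function (acc, v) { return acc + v; }, 0);
}

// Hypothetical sample documents.
var docs = [
  { date: "2008-03-11", amount: 10 },
  { date: "2008-03-30", amount: 5 },
  { date: "2008-04-02", amount: 7 },
  { date: "2008-05-15", amount: 3 },
  { date: "2008-06-20", amount: 4 },
  { date: "2008-07-25", amount: 1 }
];

// Build the view rows by running the map function over every document.
var rows = [];
docs.forEach(function (doc) {
  mapFn(doc, function (key, value) { rows.push({ key: key, value: value }); });
});

// Hypothetical helper: values of all rows whose month is in `months`.
function valuesForMonths(months) {
  return rows
    .filter(function (r) { return months.indexOf(r.key[1]) !== -1; })
    .map(function (r) { return r.value; });
}

// First call: raw rows from the partially covered months (March and July).
var edgeSum = reduceFn(null, valuesForMonths([3, 7]), false);

// Stand-ins for the per-month sums that would already be in the index.
var storedMonthSums = [4, 5, 6].map(function (m) {
  return reduceFn(null, valuesForMonths([m]), false);
});

// Second call: combine the edge sum with the stored sums.
var total = reduceFn(null, [edgeSum].concat(storedMonthSums), true);
console.log(total);
```

Because addition is associative, the same function body serves both
passes; a reduce that is not associative would have to branch on the
third argument.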

CouchDB also seems to intrinsically partition keyspace into groups of 
approximately 43-45. I don't know the significance of this number but it 
is probably some tweaked threshold value for the b-tree algorithm.

The bottom line is that reduced views with arbitrary key ranges run in 
logarithmic time, without doing anything special.

Chris Anderson wrote:
> On Sat, Nov 22, 2008 at 9:09 PM, Jedediah Smith
> <> wrote:
>> A possible compromise would be to use group_level to find the balance per
>> component and then add those together on the client. Example:
>> balance(2008-11-22) =
>>  sum(-inf to 2007-) +
>>  sum(2008-01- to 2008-10-) +
>>  sum(2008-11-01 to 2008-11-22)
> This looks like the right way to combine multiple time ranges to me.
> Adding on the client is a fine thing in a case like this. However, I
> think you can do it in a single query.
>> If a view like the
>> above existed and I updated an old transaction, there would only be one
>> rereduce for each group level, right?
> Querying with group=false will be faster, I think. (I should benchmark this...)
> In the normal case, with a modest amount of data, that's about right.
> Each grouped view query (I think... I really should bust out the log()
> in the views to know for sure...) will fire at least one JavaScript
> rereduce. In the case of very very much data and a first time reduce
> query over that range, the rereduce could run a few times, but the #
> of rereduces run should increase only logarithmically with the # of
> rows, if I'm not mistaken. It's only when you run multiple queries (or
> multiple reduces for groups within a range) that you're likely to run
> into a linear increase in the number of rereduces. Again, this should
> be explored in the log, but I think you'll get a minimum of 1 rereduce
> per group query.
> The simplest query to get someone's running balance would be something like:
> _view/viewname?startkey=["bob", BEGINNING_OF_TIME]&endkey=["bob", CURRENT_DATE]
> which has an implicit reduce=true&group=false.
> BTW Jan I really like your array date format.
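
The single-query approach quoted above can be sketched as a small URL
builder. This is only an illustration under assumptions: the key layout
["bob", [year, month, day]] is a guess at the array date format, the
function name balanceQuery is hypothetical, and the start key ["bob"]
stands in for BEGINNING_OF_TIME on the basis that a shorter array
collates before any longer array sharing its prefix.

```javascript
// Build a reduced range query for one user's running balance.
// viewPath is whatever path the view lives at, e.g. "_view/viewname".
function balanceQuery(viewPath, user, endDate) {
  // Keys are JSON values, so they must be serialized and URL-encoded.
  var startkey = JSON.stringify([user]);          // assumed "beginning of time"
  var endkey = JSON.stringify([user, endDate]);   // assumed nested date array
  return viewPath +
    "?startkey=" + encodeURIComponent(startkey) +
    "&endkey=" + encodeURIComponent(endkey);
}

var url = balanceQuery("_view/viewname", "bob", [2008, 11, 22]);
console.log(url);
```

No reduce or group parameters are added, which matches the implicit
reduce=true&group=false mentioned in the quoted reply.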
