couchdb-user mailing list archives

From James Marca <jma...@translab.its.uci.edu>
Subject Re: Reduce just N rows?
Date Mon, 16 Apr 2012 20:41:08 GMT
On Mon, Apr 16, 2012 at 11:12:04PM +0300, Alon Keren wrote:
> On 16 April 2012 22:25, James Marca <jmarca@translab.its.uci.edu> wrote:
> 
> > On Sun, Apr 15, 2012 at 12:00:38PM +0300, Alon Keren wrote:
> > > On 15 April 2012 09:13, James Marca <jmarca@translab.its.uci.edu> wrote:
> > >
...
> >
> > Also, I really don't think there is any load at all on the CPU with
> > this approach.  Or to be more accurate, no more than any active
> > database processing a view.  Again apologies for stating the obvious,
> > but CouchDB does incremental updates of views, so if you keep adding
> > data, it only processes the new data.  Once you have processed the
> > data into a view, querying it (without reduce) takes almost no CPU.
> > Reducing it can be expensive if you do something in JavaScript, but
> > isn't as expensive if you stick with the built in native Erlang reduce
> > functions (sum, count, etc).
> >
> 
> Reduces in couchdb should be incremental, unlike when doing them outside of
> couch.
> 

Just to clarify what I mean: yes, the reduces are incremental, but what
I mean by expensive is the cost of serializing data to JSON,
processing it in JavaScript, and returning the result to Erlang.  A
reduce does that at least once, as far as I understand, and sometimes
more if the query falls across B-tree node boundaries.  For you, ten
documents/rows is probably cheap.  For me, with 120 large docs per
hour per detector, it gets expensive.  But if you stick with the
built-in functions such as _sum, you don't pay for the serialization
to JSON/JavaScript and back again.

http://wiki.apache.org/couchdb/Performance#Erlang_implementations_of_common_JavaScript_functions
http://wiki.apache.org/couchdb/Built-In_Reduce_Functions
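
To make the difference concrete, here is a rough sketch of a design
document whose view uses the built-in reduce instead of a JavaScript
one.  The design doc name, view name, and document fields (detector,
ts, counts) are all made up for illustration; only the "_sum" string
is the actual built-in name.

```javascript
// Hypothetical design document; names and fields are assumptions.
const designDoc = {
  _id: "_design/detectors",
  views: {
    hourly: {
      // The map function is stored as a string and still runs in the
      // JavaScript query server, once per new or changed document.
      map: function (doc) {
        if (doc.detector && doc.ts && Array.isArray(doc.counts)) {
          emit([doc.detector, doc.ts], doc.counts);
        }
      }.toString(),
      // A reduce value starting with "_" names a built-in Erlang
      // reduce, so rows are not shipped to JavaScript for reducing.
      reduce: "_sum"
    }
  }
};

// JSON body that could be PUT to /<db>/_design/detectors
const body = JSON.stringify(designDoc);
```

Writing a JavaScript function in the reduce slot instead would bring
back the JSON round trip described above.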

For your case, you might try the _stats function described on that
second page.  Although that wiki page is old, _stats is still in the
code: src/couchdb/couch_query_servers.erl.

Also, I prefer _sum and _count over _stats because you can pass in
lists of numbers, not just single numbers.  My views dump arrays of
numbers, and _sum does the right thing and produces an array of sums.
_stats does not produce an array of per-element stats, sadly.
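
To show what I mean by "the right thing", here is a small JavaScript
emulation of the element-wise behaviour.  This is only an
illustration: the real _sum is Erlang code in
couch_query_servers.erl, and the function name sumLike and the
handling of mixed or uneven inputs are my own assumptions.

```javascript
// Illustrative emulation only, not CouchDB's implementation.
// Sums plain numbers, or sums arrays of numbers position by position.
function sumLike(values) {
  return values.reduce(function (acc, v) {
    if (typeof acc === "number" && typeof v === "number") {
      return acc + v;
    }
    // Assumption: treat a bare number as a one-element array and
    // pad the shorter array with zeros.
    const a = Array.isArray(acc) ? acc : [acc];
    const b = Array.isArray(v) ? v : [v];
    const out = [];
    for (let i = 0; i < Math.max(a.length, b.length); i++) {
      out.push((a[i] || 0) + (b[i] || 0));
    }
    return out;
  }, 0);
}

// Two rows, each emitting three numbers:
console.log(sumLike([[1, 2, 3], [10, 20, 30]])); // [ 11, 22, 33 ]
```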


James
