couchdb-user mailing list archives

From Alon Keren <>
Subject Re: Reduce just N rows?
Date Mon, 16 Apr 2012 20:12:04 GMT
On 16 April 2012 22:25, James Marca <> wrote:

> On Sun, Apr 15, 2012 at 12:00:38PM +0300, Alon Keren wrote:
> > On 15 April 2012 09:13, James Marca <> wrote:
> >
> > > CouchDB will compute reduced values for what you select.  If you just
> > > ask for values from A to B, it *only* will compute the reduced values
> > > over that range.  So you can get "clever" with the key value, using
> > > something like
> > >
> > > map: emit( [user,game,trynumber], score);
> > >
> > > where trynumber is some value that is guaranteed to increase with each
> > > completed game score stored.
> > >
> > > Your reduce could use the built-in Erlang  _sum
> > >
> > > Then you can just request something like...hmm
> > >
> > > startkey=[user,game,BIGNUMBER]&descending=true&limit=10&reduce=false
> > > (where BIGNUMBER is something bigger than the highest try number of
> > > the game).
> > >
> > > This will give 10 values, and you can do the average lickety-split
> > > client side, OR you can do one query to get highest try number, then
> > > another to get between that game and ten back to let couch compute the
> > > sum for you.
> > >
> >
> > Thanks!
> >
> > I think a simpler alternative to 'trynumber' is the game's timestamp, and
> > BIGNUMBER could be replaced by '{}' (see:
> > That's what I'm doing at
> > the moment :)
> > Unfortunately, as numbers of games and game-types grow, this would become
> > pretty demanding in CPU time and number of calls to couch.
> >
> I thought about timestamp first, but you said you wanted the last 10,
> and I wanted to be able to pipe the request through reduce.
> With timestamps you have to do two requests to get current and 10
> prior, or a single request without reducing.
> At the risk of stating the obvious, if you ask for "limit=10" in a
> request, *and* the request goes through reduce, you will get 10
> reduced values, not 10 values that get reduced to one.  By using an
> integer value, you can do the simple request I settled on above (give
> me ten values, no reduce), OR in a real application you probably know
> the current last game number, so you can pass start and end keys (end
> key is ten less than current game number) and force just 10 results to
> get piped through reduce.
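
The two query shapes discussed above could be sketched roughly as follows. This is only an illustration under stated assumptions: the database name "scores", the design document "games", and the view name "by_try" are hypothetical, and the view is assumed to use the map function emit([user,game,trynumber], score) with a _sum reduce. The endkey in the first query is an addition not mentioned in the thread; it keeps the result from spilling into another game when fewer than n rows exist. The second query assumes try numbers are consecutive.

```javascript
// Hypothetical names for illustration: database "scores", design doc "games", view "by_try".
const VIEW_PATH = "/scores/_design/games/_view/by_try";

// Build a view URL. CouchDB expects key parameters to be JSON-encoded.
function viewUrl(params) {
  const qs = new URLSearchParams();
  for (const [name, value] of Object.entries(params)) {
    qs.set(name, JSON.stringify(value));
  }
  return VIEW_PATH + "?" + qs.toString();
}

// Last n scores, unreduced. {} collates after any number, so it can
// stand in for BIGNUMBER; descending order walks back from the newest try.
function lastNUnreduced(user, game, n) {
  return viewUrl({
    startkey: [user, game, {}],
    endkey: [user, game], // assumption: stop at the start of this game's rows
    descending: true,
    limit: n,
    reduce: false,
  });
}

// Exactly n rows piped through the reduce, given the known last try number.
// Assumes try numbers are consecutive integers with no gaps.
function rangeReduced(user, game, lastTry, n) {
  return viewUrl({
    startkey: [user, game, lastTry - n + 1],
    endkey: [user, game, lastTry],
    reduce: true,
  });
}

// Client-side average over unreduced rows of the form { key, value }.
function average(rows) {
  if (rows.length === 0) return null;
  const total = rows.reduce((sum, row) => sum + row.value, 0);
  return total / rows.length;
}
```

With the first form, the client would fetch the URL and pass the "rows" array of the response to average(). With the second form, the single reduced value is the sum, so the client divides it by n.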

Ah, I think I see now what you're getting at - thanks for clarifying.
It seems to me that even with this approach, if I want to use the db's
reduce, I would have to make a separate query for each game type. Or am I
missing something?

> Also, I really don't think there is any load at all on the CPU with
> this approach.  Or to be more accurate, no more than any active
> database processing a view.  Again apologies for stating the obvious,
> but CouchDB does incremental updates of views, so if you keep adding
> data, it only processes the new data.  Once you have processed the
> data into a view, querying it (without reduce) takes almost no CPU.
> Reducing it can be expensive if you do something in JavaScript, but
> isn't as expensive if you stick with the built-in native Erlang reduce
> functions (_sum, _count, etc).

Reduces in couchdb should be incremental, unlike when doing them outside of

> But one thing to keep in mind is that you can probably use multiple
> databases. Is there any reason you *have* to put all the games and all
> the users in a single database?  Can you have a database per game?  Or
> a database per user?  Then the views are only updated when a
> particular user is adding results and querying results.
> I do data collection from sensors with CouchDB.  I use one database
> per sensor per year of data, roughly a thousand or so DBs per year.  I
> do this so I can eventually spread the pain on multiple machines (I
> haven't really had to yet), and because Erlang does a really good job
> maxing out a multicore machine if it has a lot of jobs to run.  With
> just one database, I was only getting two cores busy, but with
> thousands (when processing historical data) all 8 cores on my two
> servers were very busy.
> I also keep one database to do aggregations across all the detectors
> at the hourly level (I have 30 second data).  Each db has a view that
> generates hourly summaries I need, and I have a node process that
> polls all the databases at the end of each day to collect hourly
> documents and writes them to the collation database, which has other
> views.  Kind of a manual version of your chained map reduce project
> (incarnate, right?), but it suits the data better than automating it.
> For your app, suppose there are a million users all playing any of a
> thousand games.  If every user posts a new score every second, ideally
> I would only want to make each player wait for their data to get processed,
> not the data from the other 999,999 players.  So that calls for a
> database per user.  If users have to wait for Erlang to finish other
> jobs before it can schedule the user's job on a CPU, then you need
> more CPUs.   With just one database, you don't get that choice: you
> have to wait for all of the data to get processed (unless you allow
> stale views).

Actually, several users can participate in each game, but their scores are
However, there should be enough user specific data that's derived from
these games, so it may be a good optimization down the line to put at least
this kind of data in user-specific databases.

> As with my app, across users you can have a separate database that queries
> each db for that user's last 10 once every minute or so (changes feed would
> probably work really well here...a change adds a callback to get data
> from that database when the periodic process is run) and updates a
> collating db with username_game_average type of documents, to get the
> user's standings compared to other players.
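
A minimal sketch of the collation step described above, assuming the periodic process has already fetched a user's last-10 rows (objects of the form { key, value }) from that user's database. The function name and the document fields are hypothetical; only the username_game_average naming comes from the description above.

```javascript
// Build one collating document from a user's most recent unreduced view rows.
// Hypothetical shape: the thread only specifies "username_game_average type of documents".
function buildAverageDoc(user, game, rows) {
  const scores = rows.map((row) => row.value);
  const total = scores.reduce((sum, score) => sum + score, 0);
  return {
    _id: `${user}_${game}_average`,
    user: user,
    game: game,
    count: scores.length,
    // null when the user has no scores yet for this game
    average: scores.length > 0 ? total / scores.length : null,
  };
}
```

When overwriting an existing document in the collating database, the current _rev would also have to be included in the document before writing it.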

> Regards,
> james
> PS, sorry for the long reply. I've had too much coffee today.
Nothing to be sorry about - thanks a lot for giving it so much attention,

