couchdb-user mailing list archives

From Jochen Kempf <jochenke...@gmail.com>
Subject Re: Problems with reduce in view appear when record size > 6
Date Thu, 30 Jul 2009 22:45:00 GMT
OK, so what is the recommended way to reduce and group records by fields to
get both the count and an array of further non-grouped fields?

Here is an example:

I want to fetch all records that have the same values in the fields "year",
"month", "day" and "category".
Additionally, I would like to be able to access a hash of ids, revs and
further record-specific fields for each of these fetched records.


Jochen



2009/7/30 Brian Candler <B.Candler@pobox.com>

> On Wed, Jul 29, 2009 at 11:48:59PM -0400, Jochen Kempf wrote:
> >    guessing that you refer to this page: incremental map
>
> No, I meant this one.
> http://wiki.apache.org/couchdb/Introduction_to_CouchDB_views
>
> "Reduce functions must accept, as input, results emitted by its
> corresponding map function *as well as results returned by the reduce
> function itself*. The latter case is referred to as a rereduce"
>
> It then goes on to describe the two cases.
>
> >    map =>
> >    "
> >      function(doc) {
> >      emit(doc["_id"], [doc["_id"], doc["_rev"], doc["var1"], doc["var2"],
> >    doc["var3"], doc["var4"], doc["var5"]]);
> >      }
> >    "
> >    reduce =>
> >    "
> >      function(key, values, combine) {
> >            var result = {ids:[], revs:[], variables:[]}
> >              if (combine) {
> >                for (i in values) {
> >                  result.ids.push(values[i].ids);
> >                  result.revs.push(values[i].revs);
> >                  result.variables.push(values[i].variables);
> >                }
> >              } else {
> >                for (i in values) {
> >                  result.ids.push(values[i][0]);
> >                  result.revs.push(values[i][1]);
> >                  result.variables.push([values[i][2], values[i][3],
> >    values[i][4], values[i][5], values[i][6]]);
> >                }
> >              }
> >            return result;
> >          }
> >    "
>
> I think you want concat() rather than push() in the combine section.
>
> Otherwise, that looks like a working but extremely bad reduce function.
> Once your database goes above a certain size it will trigger a limit error
> in CouchDB; you can disable that error, but then you will suffer very poor
> performance as your database gets bigger.
>
> The problem is that your reduce value doesn't "reduce" the size of your
> output at all; the size of the reduce value will increase linearly with the
> size of the database. Within each Btree node, CouchDB stores the reduce
> value computed across the documents in that node and its children. This
> means the root Btree node stores the reduce value across the entire
> database.
>
> This is very good for calculating reduce values quickly, but very bad if
> your reduce value becomes huge, as yours will, because it will become
> slower and slower to insert documents.
>
> See "Reduced Value Sizes" in the Wiki page linked to above.
>
> Basically this means you're doing it wrong. This sort of computation should
> be done in the client, not the database. If you really want to do it in the
> database, do it in a _list view. (This will still end up fetching and
> serializing all the documents in the database or the key range in question,
> but at least it won't send them over the wire.)
>
> Regards,
>
> Brian.
>
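
Editorial sketch of the approach suggested in the quoted reply, not code from
the thread: key the map on the grouping fields, keep the reduce to a plain
count, and build the per-group list of ids, revs and other fields on the
client. The field names (year, month, day, category, var1) come from the
messages above. The helper groupRows and the emit parameter passed to map are
assumptions made so that the sketch runs outside CouchDB; in a real view,
emit() is supplied by the view server and map takes only the document.

```javascript
// Map: the key holds the grouping fields, the value holds the per-record details.
// Assumption: emit is passed in as an argument so this runs standalone.
function map(doc, emit) {
  emit([doc.year, doc.month, doc.day, doc.category],
       { id: doc._id, rev: doc._rev, var1: doc.var1 });
}

// Reduce: count only, so the result stays one number no matter how many
// rows are combined.
function reduce(keys, values, rereduce) {
  if (rereduce) {
    // values are partial counts returned by earlier reduce calls
    return values.reduce(function (a, b) { return a + b; }, 0);
  }
  // values are the raw objects emitted by map
  return values.length;
}

// Client side (hypothetical helper): group rows shaped like the "rows"
// array of a view response, i.e. objects with "key" and "value".
function groupRows(rows) {
  var groups = {};
  rows.forEach(function (row) {
    var k = JSON.stringify(row.key);
    if (!groups[k]) {
      groups[k] = { key: row.key, count: 0, records: [] };
    }
    groups[k].count += 1;
    groups[k].records.push(row.value);
  });
  return Object.keys(groups).map(function (k) { return groups[k]; });
}
```

With a view like this, querying with group=true should return only the
counts per key, while querying with reduce=false returns the individual rows
that groupRows expects. The reduce value then stays small however large the
database grows.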
