incubator-couchdb-user mailing list archives

From Brian Candler
Subject Re: 'Grouping' documents so that a set of documents is passed to the view function
Date Wed, 24 Jun 2009 15:43:01 GMT
On Wed, Jun 24, 2009 at 06:35:56PM +0800, hhsuper wrote:
>    map function emit structure(key cols refer to uid/dialogid/sessionid):
>    emit( ["86", "10380", "4172"], {wordCount: 20, weightedScore: 1380,
>    grade: 69})
>    reduce function return: {wordCount: 20, weightedScore: 1380, grade: 69}
>    the reduce function's logic: first calculate the sum value for every
>    unique  uid_dialogid_sessionid key, then get the max value for every
>    unique uid_dialogid key, at last sum the values for the key uid, these
>    are calculated on wordCount/weightedScore/grade

Code would probably speak more clearly than words here. Since I don't
understand your algorithm from that description, I can only talk in
generalities.

Assuming that you have some uid and some calculated values against that uid
(and the same uid appears in multiple documents), then one option would be a
reduce function which returns an object keyed by uid:

    uid1: {wordCount: 20, weightedScore: 1380, grade: 69},
    uid2: {...etc}

Then the rereduce function performs the same logic for all the uids seen in
its input. However, the output of such a reduce function will grow without
bound as more uids are added, and the root node of the view's B-tree will
hold the information for *all* the uids. This is not good.
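
To make the problem concrete, here is a rough sketch of that first option.
It assumes the key layout from your map function ([uid, dialogid,
sessionid]) and, purely as a placeholder, that all three fields can simply
be summed; that is my assumption, not your algorithm. The name reduceByUid
is made up so the snippet can be run on its own; in a design document it
would be an anonymous function.

```javascript
// Sketch only: one entry per uid in the output, so the result keeps growing.
function reduceByUid(keys, values, rereduce) {
  var out = {};

  // ASSUMPTION: every field is combined by summation (placeholder logic).
  function addInto(uid, v) {
    var t = out[uid] || (out[uid] = {wordCount: 0, weightedScore: 0, grade: 0});
    t.wordCount += v.wordCount;
    t.weightedScore += v.weightedScore;
    t.grade += v.grade;
  }

  for (var i = 0; i < values.length; i++) {
    if (rereduce) {
      // values[i] is an earlier output: an object keyed by uid
      for (var uid in values[i]) addInto(uid, values[i][uid]);
    } else {
      // keys[i] is [emittedKey, docId]; emittedKey is [uid, dialogid, sessionid]
      addInto(keys[i][0][0], values[i]);
    }
  }
  return out;
}
```

Every distinct uid in the input adds another entry to the returned object,
which is exactly why the reduced value cannot stay small.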

A better reduce function would output null if it has multiple uids in its
input. If it sees only a single uid across all its inputs, it can output

  {uid: 1234, wordCount: 20, weightedScore: 1380, grade: 69}

Then the rereduce function would do the same: if all its inputs have the
same uid then it calculates the relevant values, otherwise it outputs null.
The overall result is obviously null, except when you query a key range
that covers documents with only one uid (or you group by uid, for example
with group_level=1, since uid is the first element of the key), in which
case you'll get the info you're looking for.
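
A rough sketch of such a "single uid or null" reduce function follows. The
combine() helper is hypothetical and just sums every field, which is an
assumption standing in for whatever your real combination logic is; the
name reduceSingleUid is also made up so the snippet can be run on its own
(in a design document it would be an anonymous function).

```javascript
// ASSUMPTION: fields are combined by summation (placeholder, not the
// original poster's algorithm).
function combine(a, b) {
  return {
    uid: a.uid,
    wordCount: a.wordCount + b.wordCount,
    weightedScore: a.weightedScore + b.weightedScore,
    grade: a.grade + b.grade
  };
}

function reduceSingleUid(keys, values, rereduce) {
  var acc = null;
  for (var i = 0; i < values.length; i++) {
    var item;
    if (rereduce) {
      item = values[i];
      // an earlier reduction already saw more than one uid
      if (item === null) return null;
    } else {
      // keys[i] is [emittedKey, docId]; emittedKey is [uid, dialogid, sessionid]
      var v = values[i];
      item = {
        uid: keys[i][0][0],
        wordCount: v.wordCount,
        weightedScore: v.weightedScore,
        grade: v.grade
      };
    }
    if (acc === null) {
      acc = item;
    } else if (acc.uid !== item.uid) {
      return null; // mixed uids: give up, keeping the output small
    } else {
      acc = combine(acc, item);
    }
  }
  return acc;
}
```

The output is either null or one small fixed-size object, so it stays
bounded no matter how many uids the view contains.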

All this depends on the logic by which wordCount, weightedScore and grade
from multiple documents may be combined, and whether the intermediate
results can also be combined. I mean, I imagine the wordCount values can
simply be summed, but can the other values be combined similarly?

But in any case: reduce functions are not suitable for all purposes. If you
can't get the answer you need from a reduce function, then you need to
perform the calculation client-side. Sorry, that's how it is.
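
For what it's worth, here is a sketch of what the client-side version might
look like, under one possible reading of your description: sum the values
per uid/dialogid/sessionid, take the maximum per uid/dialogid, then sum
those per uid. Treating "max" as a separate maximum for each field is my
guess. The function name aggregateByUid is made up, and rows is assumed to
be an array of {key: [uid, dialogid, sessionid], value: {...}} objects as
returned by the unreduced view.

```javascript
// Sketch under ASSUMPTIONS: field-wise max per dialog, then sum per uid.
function aggregateByUid(rows) {
  var fields = ["wordCount", "weightedScore", "grade"];
  var tree = {}; // uid -> dialogid -> sessionid -> summed fields
  var i, f;

  // Step 1: sum the values for every uid/dialogid/sessionid
  for (i = 0; i < rows.length; i++) {
    var uid = rows[i].key[0], dialog = rows[i].key[1], session = rows[i].key[2];
    var byDialog = tree[uid] || (tree[uid] = {});
    var bySession = byDialog[dialog] || (byDialog[dialog] = {});
    var sums = bySession[session] ||
      (bySession[session] = {wordCount: 0, weightedScore: 0, grade: 0});
    for (f = 0; f < fields.length; f++) {
      sums[fields[f]] += rows[i].value[fields[f]];
    }
  }

  var result = {};
  for (var u in tree) {
    var total = {wordCount: 0, weightedScore: 0, grade: 0};
    for (var d in tree[u]) {
      // Step 2: field-wise maximum over the sessions of this dialog
      var best = null;
      for (var s in tree[u][d]) {
        var cur = tree[u][d][s];
        if (best === null) {
          best = {wordCount: cur.wordCount, weightedScore: cur.weightedScore, grade: cur.grade};
        } else {
          for (f = 0; f < fields.length; f++) {
            best[fields[f]] = Math.max(best[fields[f]], cur[fields[f]]);
          }
        }
      }
      // Step 3: sum the per-dialog maxima for this uid
      for (f = 0; f < fields.length; f++) {
        total[fields[f]] += best[fields[f]];
      }
    }
    result[u] = total;
  }
  return result;
}
```

The "max" step in the middle is what makes this awkward to express as a
reduce function, since a partial sum for a session cannot safely be compared
against other sessions until all of its rows have been seen.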


