incubator-couchdb-user mailing list archives

From hhsuper <hhsu...@gmail.com>
Subject Re: 'Grouping' documents so that a set of documents is passed to the view function
Date Thu, 25 Jun 2009 01:24:31 GMT
Thanks very much, Brian, for the quick reply. I should say that my
description wasn't very clear: there is some complex business logic that
needs to be implemented within the reduce.

Let me describe the application scenario more carefully. When a user learns
from a dialog, they start a session (sessionid). Studying each line of the
dialog generates a CouchDB document (containing uid/dialogid/sessionid and
the wordCount/weightedScore/grade for that line). The user can re-study the
same dialog some days later, which starts a new session for the same
dialog. We want every user's average grade from their study results (with
the dialog as the unit, so we need to sum over each session), but for any
one dialog we only want to use the session with the highest grade, not all
of its sessions.

This seems too difficult to implement with one view. In an RDBMS we would
need to build the SQL query on a subquery (or on a DB view). Is this a
proper thing to implement with a CouchDB view?

You are right, Brian: wordCount and weightedScore can simply be summed, and
average grade = weightedScore / wordCount. By the way, when you said "root
node with *all* the uids", I realized that I don't clearly understand the
view's internal storage structure, and I can't find it described in the
wiki. I have pasted my previous reduce function below (the code may already
be too complex):

function(keys, values, rereduce) {
  var wordCount = 0;
  var weightedScore = 0;
  if (!rereduce) {
    // This is the reduce phase: we are reducing over the values emitted by
    // the map function. Each keys[k] is [emittedKey, docId], where
    // emittedKey is [uid, dialogid, sessionid].
    var sessions = {};
    for (var k = 0; k < keys.length; k++) {
      // Calculate the totals for every session (one session contains
      // multiple line documents).
      var key = keys[k][0];
      key = key ? key.join('_') : key;
      if (!sessions[key]) {
        // Copy the value so that the input is not modified below.
        sessions[key] = {
          wordCount: values[k].wordCount,
          weightedScore: values[k].weightedScore,
          grade: values[k].grade
        };
      } else {
        sessions[key].wordCount += values[k].wordCount;
        sessions[key].weightedScore += values[k].weightedScore;
        sessions[key].grade =
          sessions[key].weightedScore / sessions[key].wordCount;
      }
    }
    // Find the top session for each uid_dialogid pair. Grouping on
    // dialogid alone would mix sessions of different users.
    var dialogsessions = {};
    for (var sk in sessions) {
      var parts = sk.split('_');
      var dialogKey = parts[0] + '_' + parts[1];
      if (!dialogsessions[dialogKey] ||
          dialogsessions[dialogKey].grade < sessions[sk].grade) {
        dialogsessions[dialogKey] = sessions[sk];
      }
    }
    // Calculate the result.
    for (var ds in dialogsessions) {
      wordCount += dialogsessions[ds].wordCount;
      weightedScore += dialogsessions[ds].weightedScore;
    }
  } else {
    // This is the rereduce phase: we are re-reducing previously reduced
    // values. Note: summing here is only correct if all sessions of a
    // dialog were handled by the same reduce call.
    for (var i = 0; i < values.length; i++) {
      wordCount += values[i].wordCount;
      weightedScore += values[i].weightedScore;
    }
  }

  return {
    "wordCount": wordCount,
    "weightedScore": weightedScore,
    "grade": weightedScore / wordCount
  };
}
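
A possible alternative, following Brian's advice to do the calculation
client-side, is to let the view only sum wordCount and weightedScore and to
pick the best session per dialog after querying. The sketch below is not
part of the original code. It assumes the view is queried with
group_level=3, so that each row has the key [uid, dialogid, sessionid] and
a value {wordCount, weightedScore}. The helper name bestSessionAverages and
the row layout are assumptions made for illustration.

```javascript
// Hypothetical client-side helper (not part of the original view code).
// `rows` is assumed to be the "rows" array of a view queried with
// group_level=3: one row per [uid, dialogid, sessionid], whose value holds
// the summed wordCount and weightedScore of that session.
function bestSessionAverages(rows) {
  var best = {}; // uid -> dialogid -> totals of the best session so far

  rows.forEach(function (row) {
    var uid = row.key[0];
    var dialogId = row.key[1];
    var wc = row.value.wordCount;
    var ws = row.value.weightedScore;
    var grade = wc > 0 ? ws / wc : 0;

    var perUser = best[uid] || (best[uid] = {});
    var current = perUser[dialogId];
    // Keep only the session with the highest grade for this dialog.
    if (!current || grade > current.grade) {
      perUser[dialogId] = { wordCount: wc, weightedScore: ws, grade: grade };
    }
  });

  // Sum the best sessions of each user and derive the average grade.
  var result = {};
  Object.keys(best).forEach(function (uid) {
    var wordCount = 0;
    var weightedScore = 0;
    Object.keys(best[uid]).forEach(function (dialogId) {
      wordCount += best[uid][dialogId].wordCount;
      weightedScore += best[uid][dialogId].weightedScore;
    });
    result[uid] = {
      wordCount: wordCount,
      weightedScore: weightedScore,
      grade: wordCount > 0 ? weightedScore / wordCount : 0
    };
  });
  return result;
}
```

With this split the reduce function only has to add numbers, so the result
no longer depends on which documents end up in the same reduce call.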

On Wed, Jun 24, 2009 at 11:43 PM, Brian Candler <B.Candler@pobox.com> wrote:

> On Wed, Jun 24, 2009 at 06:35:56PM +0800, hhsuper wrote:
> >    map function emit structure(key cols refer to uid/dialogid/sessionid):
> >    emit( ["86", "10380", "4172"], {wordCount: 20, weightedScore: 1380,
> >    grade: 69})
> >    reduce function return: {wordCount: 20, weightedScore: 1380, grade: 69}
> >    the reduce function's logic: first calculate the sum value for every
> >    unique uid_dialogid_sessionid key, then get the max value for every
> >    unique uid_dialogid key, at last sum the values for the key uid, these
> >    calculations are on wordCount/weightedScore/grade
>
> Code would probably speak clearer than words here. Since I don't understand
> your algorithm from that description, I can only talk in generalities.
>
> Assuming that you have some uid and some calculated values against that uid
> (and the same uid appears in multiple documents), then one option would be a
> reduce function which emits
>
>   {
>    uid1: {wordCount: 20, weightedScore: 1380, grade: 69},
>    uid2: {...etc}
>   }
>
> Then the rereduce function performs the same logic for all the uids seen in
> the input. However the output of such a reduce function will grow without
> bounds, and the root node will include the information for *all* the uids.
> This is not good.
>
> A better reduce function would output null if it has multiple uids in its
> input. If it sees only a single uid across all its inputs, it can output
>
>  {uid: 1234, wordCount: 20, weightedScore: 1380, grade: 69}
>
> Then the re-reduce function would do the same: if all its inputs have the
> same uid then it calculates the relevant values, otherwise outputs null.
> This obviously reduces to null, except when you do a query where the key
> range covers documents with only one uid (or you group by uid), in which
> case you'll get the info you're looking for.
>
> All this depends on the logic by which wordCount, weightedScore and grade
> from multiple documents may be combined, and whether the intermediate
> results can also be combined. I mean, I imagine the wordCount's can simply
> be summed, but can the other values be combined similarly?
>
> But in any case: reduce functions are not suitable for all purposes. If you
> can't get the answer you need from a reduce function, then you need to
> perform the calculation client-side. Sorry, that's how it is.
>
> Regards,
>
> Brian.
>
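
The "null unless there is a single uid" reduce that Brian describes above
could be sketched roughly as follows. This is only an illustration under
assumptions: the emitted key is [uid, dialogid, sessionid] as in the map
function quoted above, the per-uid calculation is a plain sum standing in
for the real logic, and the name singleUidReduce is hypothetical (in a
design document the function would be anonymous).

```javascript
// Sketch of a reduce that yields a result only when every input belongs to
// the same uid, and null otherwise. The summing is a placeholder for the
// real per-uid calculation.
function singleUidReduce(keys, values, rereduce) {
  if (values.length === 0) {
    return null;
  }
  var uid = null;
  var wordCount = 0;
  var weightedScore = 0;

  for (var i = 0; i < values.length; i++) {
    var v = values[i];
    // During rereduce, a null input means a lower level saw several uids.
    if (rereduce && v === null) {
      return null;
    }
    // In the reduce phase keys[i] is [emittedKey, docId] and the uid is the
    // first element of emittedKey; in rereduce it was stored in the value.
    var currentUid = rereduce ? v.uid : keys[i][0][0];
    if (i === 0) {
      uid = currentUid;
    } else if (currentUid !== uid) {
      return null; // more than one uid in this input
    }
    wordCount += v.wordCount;
    weightedScore += v.weightedScore;
  }

  return {
    uid: uid,
    wordCount: wordCount,
    weightedScore: weightedScore,
    grade: wordCount > 0 ? weightedScore / wordCount : 0
  };
}
```

The output stays a small, fixed-size object however many documents are
reduced, which avoids the unbounded growth Brian warns about.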



-- 
Yours sincerely

Jack Su
