incubator-couchdb-user mailing list archives

From hhsuper <hhsu...@gmail.com>
Subject Re: 'Grouping' documents so that a set of documents is passed to the view function
Date Mon, 22 Jun 2009 13:49:15 GMT
Brian's description of the reduce function is clear, but I think you can
achieve your goal as follows:

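// map: emit each document under its group key
function(doc){
  emit(doc.group_key, doc);
}

// reduce: with group=true, values holds the docs emitted under one
// group_key (or earlier reduce results when rereduce is true)
function(keys,values,rereduce){
  //...
}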

With the group=true option you can process the documents by group_key in the
reduce function. In your example, reduce would be invoked twice: once with
values [firstdoc, seconddoc] for group_key=1, and once with
[thirddoc, fourthdoc] for group_key=2.
Is that enough?
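
To show what the rereduce argument in the signature above implies, here is a minimal sketch in plain JavaScript that runs outside CouchDB. The name collectDocs and the sample documents are made up for illustration. It only imitates the way a reduce function may be called in more than one step for the same key, and it is not how CouchDB itself schedules the calls.

```javascript
// Hypothetical reduce function that gathers the documents sharing a key.
// On the first pass `values` holds the emitted docs; on rereduce it holds
// the arrays returned by earlier calls, so those are flattened.
function collectDocs(keys, values, rereduce) {
  if (rereduce) {
    return values.reduce(function (all, part) {
      return all.concat(part);
    }, []);
  }
  return values.slice();
}

// Illustrative documents, all with group_key 1.
var docs = [
  { _id: "d1", group_key: 1 },
  { _id: "d2", group_key: 1 },
  { _id: "d3", group_key: 1 },
  { _id: "d4", group_key: 1 }
];

// Case 1: every doc arrives in a single reduce call.
var single = collectDocs(null, docs, false);

// Case 2: the same docs are split across two reduce calls, then rereduced.
var partA = collectDocs(null, docs.slice(0, 2), false);
var partB = collectDocs(null, docs.slice(2), false);
var combined = collectDocs(null, [partA, partB], true);

console.log(single.length, combined.length);
```

Both paths end with the same four documents only because the rereduce branch merges the partial results. A reduce value that grows with the number of documents is generally discouraged in CouchDB, so this is at best workable for small groups.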

On Mon, Jun 22, 2009 at 5:07 PM, Brian Candler <B.Candler@pobox.com> wrote:

> On Fri, Jun 19, 2009 at 09:43:31AM +0200, Daniel Trümper wrote:
> > Hi,
> >
> > I am somewhat new to CouchDB but have been doing some stuff with it and
> > this is my first post to the list so pardon if I am wrong :)
> >
> >
> >> It would be really cool if there were some way to pass all the docs
> >> with a value of 1 for group_key to a single map function call, so I
> >> could do computation across those related documents and emit the
> >> results ...  I'm just using the magic group_key attribute as an
> >> example, if such a feature were to actually be made I'd think you'd
> >> define a javascript function which returned a single groupping k to
> >> exist I
> > I think this is what the reduce function is for.
>
> No, I'm afraid it's not.
>
> The OP wants to calculate information across a group of related documents.
> CouchDB does not guarantee that all the related documents will be passed to
> the reduce function at the same time. It may pass documents (d1,d2,d3) to
> the reduce function to generate Rx, then pass (d4,d5,d6) to the reduce
> function to generate Ry, then (d7,d8,d9) to generate Rz, then pass
> (Rx,Ry,Rz) to the re-reduce function to generate the final R value.
>
> If the values sharing the key were e.g. d3,d4 then you won't be able to
> process them together, as they would not be presented to the reduce
> function at the same time.
>
> Using a grouped reduce query is better (i.e. group=true), but a large set
> of documents sharing the same group key are still likely to be split into
> several reductions with a re-reduce. The OP was talking about ~100
> documents sharing this key, and so they may well be split this way.
>
> Furthermore, CouchDB optimises its reductions by storing the reduced value
> for all the documents within the same Btree node. For example, suppose you
> have
>
>   +-------------+  +-------------+  +-------------+
>   | d1 d2 d3 Rx |  | d4 d5 d6 Ry |  | d7 d8 d9 Rz |
>   +-------------+  +-------------+  +-------------+
>
> Then you make a reduce query for the key range which includes documents d2
> to d8 inclusive (or a grouped query where d2 to d8 share the same group
> key). CouchDB will calculate:
>
>  R1 = Reduce(d2,d3)
>  R2 = Reduce(d7,d8)
>  R  = Rereduce(R1,Ry,R2)
>
> That is: the already-reduced value of Ry=Reduce(d4,d5,d6) is reused without
> recomputation. So the reduce function doesn't see documents d4 to d6 again.
>
> So in summary: you cannot rely on the reduce function to be able to process
> adjacent documents. You *must* do this sort of processing client-side.
>
> HTH,
>
> Brian.
>
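
The client-side processing Brian recommends could look roughly like the sketch below. It assumes a map-only view that emits doc.group_key as the key and the document as the value, and that `rows` is the "rows" array taken from the view's JSON response. The function name groupRows and the sample rows are hypothetical.

```javascript
// Group view rows by key on the client, so that every document sharing a
// group key is available in one place for further computation.
function groupRows(rows) {
  var groups = {};
  rows.forEach(function (row) {
    // View keys may be any JSON value, so serialise them for the lookup.
    var k = JSON.stringify(row.key);
    if (!groups[k]) {
      groups[k] = { key: row.key, docs: [] };
    }
    groups[k].docs.push(row.value);
  });
  return Object.keys(groups).map(function (k) {
    return groups[k];
  });
}

// Illustrative rows in the shape a view query returns.
var rows = [
  { id: "d1", key: 1, value: { _id: "d1", group_key: 1 } },
  { id: "d2", key: 1, value: { _id: "d2", group_key: 1 } },
  { id: "d3", key: 2, value: { _id: "d3", group_key: 2 } },
  { id: "d4", key: 2, value: { _id: "d4", group_key: 2 } }
];

var grouped = groupRows(rows);
grouped.forEach(function (g) {
  console.log("group", g.key, "has", g.docs.length, "docs");
});
```

Unlike a reduce function, this code sees every row for a key at once, whatever way the view index happens to be split internally.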



-- 
Yours sincerely

Jack Su
