couchdb-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Nathan Stott <nrst...@gmail.com>
Subject Re: Sorting items by number of votes
Date Thu, 05 Nov 2009 18:18:33 GMT
Why would a list be any less efficient than sorting on the client side?  You
could even use the view API to constrain the results sent to the list where
appropriate.

On Thu, Nov 5, 2009 at 11:43 AM, Daniel Truemper <truemped@googlemail.com>wrote:

> Hi,
>
>
>  Using a list is an interesting idea.  Although, I suspect that method
>> would become inefficient for things like "give me the 10 resources with the
>> most votes" when there are 10,000 resources in the database.
>>
> hm, I don't think that a list is the appropriate way to go here since a
> list is based on a view again (AFAIK).
>
>
>  I think my solution will be to create a map reduce which just counts the
>> votes by resource_id.  And then use that information to do a bulk request
>> for the top 10 documents by ID.
>>
>
>  Regarding the example in the wiki:
>>
>> As a new user, I just followed an example from the wiki:
>> http://wiki.apache.org/couchdb/View_Snippets#aggregate_sum
>>
>> Is that example an incorrect way of using a CouchDB view?  Should it be
>> removed?
>>
> No, it is not. If you step through it you will notice what happens. Here is
> the code again:
>
> // Map function
> function(doc) {
>  if (doc.Type == "customer") {
>    emit([doc._id, 0], doc);
>  } else if (doc.Type == "order") {
>    emit([doc.customer_id, 1], doc);
>  }
> }
>
> // Reduce function
> // Only produces meaningful output.customer_details if group_level >= 1
> function(keys, values, rereduce) {
>  var output = {};
>  if (rereduce) {
>    for (idx in values) {
>      if (values[idx].total !== undefined) {
>        output.total += values[idx].total;
>      } else if (values[idx].customer_details !== undefined) {
>        output.customer_details = values[idx].customer_details;
>      }
>    }
>  } else {
>    for (idx in values) {
>      if (values[idx].Type == "customer") output.customer_details = doc;
>      else if (values[idx].Type == "order") output.total += 1;
>    }
>  }
>  return output;
> }
>
>
>
> 1. The map function emits a compound key that is [doc._id, 0] for customers
> and [doc.customer_id, 1] for orders. So in the case of 2 customers with 3
> orders each the resulting tree looks like the following:
>
> key, value
>
> [ 1, 0 ], customer1
> [ 1, 1 ], customer1order
> [ 1, 1 ], customer1order
> [ 1, 1 ], customer1order
> [ 2, 0 ], customer2
> [ 2, 1 ], customer2order
> [ 2, 1 ], customer2order
> [ 2, 1 ], customer2order
>
> with customer1 having id 1 and customer2 having id 2.
>
>
>
>
>
> 2. In the reduce function are 2 different kinds of reduction. A small note:
> you should call the view with group_level=1 otherwise you won't get
> aggregated results!
>
> The basic operation of the reduce function is the second part:
>
>    for (idx in values) {
>      if (values[idx].Type == "customer") output.customer_details = doc;
>      else if (values[idx].Type == "order") output.total += 1;
>    }
>
> What happens is that it iterates through all key/value pairs where the key
> begins with [1]. So:
>
> [ 1, 0 ], customer1
> [ 1, 1 ], customer1order
> [ 1, 1 ], customer1order
> [ 1, 1 ], customer1order
>
> So for the first entry (a customer) the output object is filled with the
> customer doc. For all orders a counter inside the output object is
> increased. So in the end the following would be returned:
>
> {
>  customer_details = customer1,
>  total = 3
> }
>
> So: input is a list of 4 values, the output only contains 1.
>
>
>
>
>
> 3. The second case of the reduce function deals with the phase where there
> are so many orders that CouchDB internally stores mid-values of not all
> orders with customers in the same tree bucket.
>
> Example:
>
>           top
>        /         \
>      / \          / \
>     /   \        /   \
>  a     b     c     d
>
> a = [ 1, 0 ], customer1
> b = [ 1, 1 ], customer1order
> c = [ 1, 1 ], customer1order
> d = [ 1, 1 ], customer1order
>
> So if CouchDB internally stores mid-values of a+b and c+d you will have the
> two output objects:
>
> {
>  customer_details = customer1,
>  total = 1
> }
>
> {
>  total = 2
> }
>
> These two values are now used to rereduce:
>
>    for (idx in values) {
>      if (values[idx].total !== undefined) {
>        output.total += values[idx].total;
>      } else if (values[idx].customer_details !== undefined) {
>        output.customer_details = values[idx].customer_details;
>      }
>    }
>
> So in the end you again have the above value:
>
> {
>  customer_details = customer1,
>  total = 3
>
> }
>
>
>
>  Here is my reduce function:
>>>>
>>>> function(keys, values, rereduce) {
>>>>     var score = 0;
>>>>     var output_doc = {};
>>>>
>>>>     for (var i=0; i < values.length; i++) {
>>>>             if (values[i].type == 'vote') {
>>>>                     ++score;
>>>>             } else if (values[i].type == 'resource') {
>>>>                     output_doc = values[i];
>>>>             }
>>>>     }
>>>>
>>>>     return {doc:output_doc, score:score};
>>>> }
>>>>
>>>
> First I think you need to implement the rereduce phase, otherwise you will
> get wrong numbers with large amounts of data.
>
> From looking at your reduce function I seem to remember that the error
> message is based on some byte length difference between the incoming and
> outgoing value of the reduce function. So if the incoming values only
> contain one very large resource document and several smaller votes, the fact
> that you are returning the resource document might get in your way here. So
> I think it would be better if you would only store the document id and get
> that document in a separate call to the db.
>
>
>
> And a little side note: at the moment you cannot order the view based on
> the value! Ordering is only done by keys!
>
> You could however write another type of document (VoteCount) into your
> database containing the resource and the number of votes. Then emitting as
> key something like [ #votes, resource ] will give you an ordered view based
> on the number of votes. You could trigger the view update from the client
> each time a vote is made (i.e. add a vote document, call the view, update
> the VoteCount document and call the new view to get the ordered votes). You
> could also do this automatically on the CouchDB using update notifiers and
> simple Bash/Python/Perl/whatever scripts...
>
> HTH
> Daniel
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message