Subject: Re: Sorting items by number of votes
From: Daniel Truemper
Date: Thu, 5 Nov 2009 18:43:17 +0100
To: user@couchdb.apache.org

Hi,

> Using a list is an interesting idea. Although, I suspect that
> method would become inefficient for things like "give me the 10
> resources with the most votes" when there are 10,000 resources in
> the database.

Hm, I don't think a list is the appropriate way to go here, since a list is itself based on a view (AFAIK).

> I think my solution will be to create a map reduce which just counts
> the votes by resource_id. And then use that information to do a
> bulk request for the top 10 documents by ID.
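
A rough sketch of that counting approach, in case it helps. It assumes that vote documents look like { type: 'vote', resource_id: ... } (the field names are guesses based on your reduce function). The emit() stand-in and the topResources helper are hypothetical and only there so the snippet runs outside CouchDB; inside CouchDB, emit() is provided by the view server and the sorting would happen in your client on the rows of a group=true query.

```javascript
// Stand-in for CouchDB's emit(), so the functions can be tried locally.
var rows = [];
function emit(key, value) {
  rows.push({ key: key, value: value });
}

// Map function: one row per vote, keyed by the resource it belongs to.
// The field names `type` and `resource_id` are assumptions.
var mapVotes = function (doc) {
  if (doc.type == 'vote') {
    emit(doc.resource_id, 1);
  }
};

// Reduce function: sums the counts. The same loop serves reduce and
// rereduce, because in both cases `values` is a list of numbers.
var reduceVotes = function (keys, values, rereduce) {
  var total = 0;
  for (var i = 0; i < values.length; i++) {
    total += values[i];
  }
  return total;
};

// Hypothetical client-side helper: given the rows of a group=true query
// ([{ key: resourceId, value: voteCount }, ...]), return the ids of the
// n resources with the most votes.
function topResources(groupedRows, n) {
  return groupedRows
    .slice() // copy, so the caller's array is not reordered
    .sort(function (a, b) { return b.value - a.value; })
    .slice(0, n)
    .map(function (row) { return row.key; });
}
```

The ids returned by topResources could then be sent as the "keys" of a POST to _all_docs with include_docs=true, which fetches all ten documents in one request.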

> Regarding the example in the wiki:
>
> As a new user, I just followed an example from the wiki:
> http://wiki.apache.org/couchdb/View_Snippets#aggregate_sum
>
> Is that example an incorrect way of using a CouchDB view? Should it
> be removed?

No, it is not. If you step through it you will notice what happens. Here is the code again:

    // Map function
    function(doc) {
      if (doc.Type == "customer") {
        emit([doc._id, 0], doc);
      } else if (doc.Type == "order") {
        emit([doc.customer_id, 1], doc);
      }
    }

    // Reduce function
    // Only produces meaningful output.customer_details if group_level >= 1
    function(keys, values, rereduce) {
      var output = { total: 0 };  // start at 0 so that += yields a number
      var idx;
      if (rereduce) {
        for (idx = 0; idx < values.length; idx++) {
          // a partial result may carry both fields, so check each one
          if (values[idx].total !== undefined) {
            output.total += values[idx].total;
          }
          if (values[idx].customer_details !== undefined) {
            output.customer_details = values[idx].customer_details;
          }
        }
      } else {
        for (idx = 0; idx < values.length; idx++) {
          if (values[idx].Type == "customer")
            output.customer_details = values[idx];  // the emitted customer doc
          else if (values[idx].Type == "order")
            output.total += 1;
        }
      }
      return output;
    }

1. The map function emits a compound key that is [doc._id, 0] for customers and [doc.customer_id, 1] for orders. So in the case of 2 customers with 3 orders each, the resulting tree looks like the following:

    key        value
    [ 1, 0 ]   customer1
    [ 1, 1 ]   customer1order
    [ 1, 1 ]   customer1order
    [ 1, 1 ]   customer1order
    [ 2, 0 ]   customer2
    [ 2, 1 ]   customer2order
    [ 2, 1 ]   customer2order
    [ 2, 1 ]   customer2order

with customer1 having id 1 and customer2 having id 2.

2. The reduce function contains two different kinds of reduction. A small note: you should call the view with group_level=1, otherwise you won't get results aggregated per customer!

The basic operation of the reduce function is the second part:

    for (idx = 0; idx < values.length; idx++) {
      if (values[idx].Type == "customer")
        output.customer_details = values[idx];
      else if (values[idx].Type == "order")
        output.total += 1;
    }

It iterates over all key/value pairs whose key begins with 1.
So:

    [ 1, 0 ]   customer1
    [ 1, 1 ]   customer1order
    [ 1, 1 ]   customer1order
    [ 1, 1 ]   customer1order

For the first entry (a customer) the output object is filled with the customer doc. For each order a counter inside the output object is increased. So in the end the following would be returned:

    { customer_details: customer1, total: 3 }

So the input is a list of 4 values, and the output contains only 1.

3. The second case of the reduce function deals with the situation where there are so many orders that CouchDB internally stores intermediate values, and not all orders end up in the same tree bucket as their customer. Example:

          top
         /   \
        /     \
       /       \
     / \       / \
    a   b     c   d

    a = [ 1, 0 ], customer1
    b = [ 1, 1 ], customer1order
    c = [ 1, 1 ], customer1order
    d = [ 1, 1 ], customer1order

So if CouchDB internally stores intermediate values for a+b and c+d, you will have the two output objects:

    { customer_details: customer1, total: 1 }
    { total: 2 }

These two values are now used to rereduce:

    for (idx = 0; idx < values.length; idx++) {
      // a partial result may carry both fields, so check each one
      if (values[idx].total !== undefined) {
        output.total += values[idx].total;
      }
      if (values[idx].customer_details !== undefined) {
        output.customer_details = values[idx].customer_details;
      }
    }

So in the end you again have the above value:

    { customer_details: customer1, total: 3 }

>>> Here is my reduce function:
>>>
>>> function(keys, values, rereduce) {
>>>   var score = 0;
>>>   var output_doc = {};
>>>
>>>   for (var i=0; i < values.length; i++) {
>>>     if (values[i].type == 'vote') {
>>>       ++score;
>>>     } else if (values[i].type == 'resource') {
>>>       output_doc = values[i];
>>>     }
>>>   }
>>>
>>>   return {doc:output_doc, score:score};
>>> }

First, I think you need to implement the rereduce phase, otherwise you will get wrong numbers with large amounts of data. Looking at your reduce function, I seem to remember that the error message is based on some byte length difference between the incoming and outgoing values of the reduce function.
So if the incoming values contain only one very large resource document and several smaller votes, the fact that you are returning the resource document might get in your way here. I think it would be better to store only the document id and fetch that document in a separate call to the db.

And a little side note: at the moment you cannot order a view by its values! Ordering is only done by keys! You could, however, write another type of document (VoteCount) into your database containing the resource and the number of votes. Then emitting as key something like [ #votes, resource ] will give you a view ordered by the number of votes. You could trigger the view update from the client each time a vote is made (i.e. add a vote document, call the view, update the VoteCount document and call the new view to get the ordered votes). You could also do this automatically on the CouchDB server using update notifiers and simple Bash/Python/Perl/whatever scripts...

HTH
Daniel
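
P.S. A minimal sketch of what I mean by implementing the rereduce phase and returning only the id. It assumes that the values emitted by your map function are the full documents with a `type` field (as in your reduce function) and that a resource carries its id in `_id`. The name reduceScore is only there so the snippet can be run on its own; in a design document the function would be anonymous. Treat it as a starting point rather than a finished implementation:

```javascript
// Reduce function that returns only the resource id and the vote count.
// Assumption: in the first pass, `values` are documents with a `type` field;
// in the rereduce pass, `values` are results of earlier calls to this function.
function reduceScore(keys, values, rereduce) {
  var output = { id: null, score: 0 };
  for (var i = 0; i < values.length; i++) {
    var v = values[i];
    if (rereduce) {
      // Merge a partial result: add up the scores, keep the id if present.
      output.score += v.score;
      if (v.id !== null) {
        output.id = v.id;
      }
    } else if (v.type == 'vote') {
      output.score += 1;
    } else if (v.type == 'resource') {
      // Keep only the id, not the whole (possibly large) document.
      output.id = v._id;
    }
  }
  return output;
}
```

With the a+b / c+d split from above, the first pass would give { id: 'r1', score: 1 } and { id: null, score: 2 } for a resource with the hypothetical id 'r1', and the rereduce pass would merge them into { id: 'r1', score: 3 }.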