incubator-couchdb-user mailing list archives

From Damien Katz <>
Subject Re: Ordering by values calculated by a map/reduce
Date Mon, 01 Dec 2008 19:38:26 GMT

On Dec 1, 2008, at 3:02 AM, Antony Blakey wrote:

> On 01/12/2008, at 6:09 PM, Ben Bangert wrote:
>> I want to solve what I thought was a fairly simple problem, though  
>> unfortunately it seems to be rather tricky. I asked on the IRC  
>> channel, and got some good input, but neither really seemed like a  
>> very good solution.
>> The problem:
>> I want to allow users to rate things. They do this very very  
>> frequently, so I can't store it in the actual document being rated,  
>> so I have Rating documents. It's easy enough to write a map/reduce  
>> that gives me the computed average rating for a given document,  
>> however, it seems to be impossible to get a listing of the highest  
>> rated documents, as I can only get the computed rating for a  
>> document one at a time.
>> The possible solutions:
>> - Buffer rating additions, then at a later time, run through them  
>> and calculate the new average rating, store it in the document as  
>> computed_rating, so I can order on that in the key
>> - Cron a job that goes in and looks for new ratings every 5 mins or  
>> whatever, and then does the same as the previous solution by  
>> storing it in a computed_rating field
>> I'm not a fan of either of these, because #1 means if my webapp  
>> hiccups, I lose ratings, and #2 is just a pain to keep sweeping the  
>> db for new Rating documents then going through updating all the  
>> documents.
>> Is there really no other solution that doesn't require me to store  
>> the computed rating in the doc itself? Is there no way I can order  
>> on the value from the map/reduce, rather than only being able to  
>> order on the key?
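For reference, the computed-average view Ben describes might look like the sketch below. This is only an illustration, not code from the thread; the field names (`type`, `doc_id`, `rating`) on the Rating documents are assumptions, and `emit()` is normally supplied by CouchDB's view server but is stubbed here so the sketch runs standalone.

```javascript
// emit() is provided by CouchDB's view server; stubbed so this runs alone.
var rows = [];
function emit(key, value) { rows.push([key, value]); }

// Map: one row per Rating document, keyed by the rated document's id.
function map(doc) {
  if (doc.type === "rating") {
    emit(doc.doc_id, doc.rating);
  }
}

// Reduce: carry (sum, count) pairs rather than a bare average, so the
// rereduce step stays correct; divide on read to get the average.
function reduce(keys, values, rereduce) {
  var sum = 0, count = 0, i;
  if (rereduce) {
    for (i = 0; i < values.length; i++) {
      sum += values[i].sum;
      count += values[i].count;
    }
  } else {
    for (i = 0; i < values.length; i++) {
      sum += values[i];
    }
    count = values.length;
  }
  return { sum: sum, count: count };
}
```

Querying this view with group=true gives each document's (sum, count) pair, and sum/count is the average. The catch, which is exactly Ben's problem, is that the view is keyed by document id, so CouchDB cannot be asked to sort the results by the reduced value.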
> I use an external query handler to solve problems that don't fit  
> map/reduce.
> I've modified Paul Davis's _external handler to pass the current  
> update_seq whenever an external query is made. In the external  
> process (Ruby, in my case) I maintain a SQLite database that I can  
> use for queries that Couch isn't suited for. Whenever a query comes  
> in, I compare the supplied update_seq with one stored in my SQLite  
> db (of course I cache that in memory as long as the process lives).  
> If the SQLite db is out of date, I do an _all_docs_by_seq request  
> and update SQLite, including the update_seq record, before running  
> the (SQL) query and responding with a JSON document in the same  
> format as Couch would use.

> I can delete the sqlite db at any time (well, while the process is  
> stopped) because it will get recreated/updated when a query comes in  
> as necessary. This works with replication.
> This system has the same lazy-update characteristics as Couch views,  
> and has the additional advantage that you can do in-memory caching  
> in the external process, which, depending on your update frequency,  
> means you rarely hit the db.
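Antony's lazy-sync scheme can be sketched roughly as below. This is not his actual code (which is in Ruby against SQLite); an in-memory object stands in for the SQLite database, the Couch client is stubbed, and all names are illustrative. The real fetch would be a GET against _all_docs_by_seq with a since parameter.

```javascript
// Lazy external index: before answering any query, catch the side
// store up to CouchDB's current update_seq, then query the side store.
function makeExternalIndex(couch) {
  var store = { updateSeq: 0, docs: {} };  // stand-in for the SQLite db

  function syncIfStale(currentSeq) {
    if (store.updateSeq >= currentSeq) return;          // already fresh
    var changed = couch.allDocsBySeq(store.updateSeq);  // rows since last sync
    changed.forEach(function (row) {
      store.docs[row.id] = row.doc;                     // upsert into side store
    });
    store.updateSeq = currentSeq;                       // remember how far we got
  }

  return {
    // The _external handler passes update_seq with every request, so
    // the index is brought up to date exactly when a query needs it.
    query: function (currentSeq, predicate) {
      syncIfStale(currentSeq);
      return Object.keys(store.docs)
        .map(function (id) { return store.docs[id]; })
        .filter(predicate);
    }
  };
}
```

Because the store records how far it has synced, it can be deleted at any time (while the process is stopped) and will simply rebuild from sequence 0 on the next query, which is what makes the scheme replication-safe.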

Excellent! What you have created, I think, is a custom view engine. I  
hope to see more of this (Lucene FT indexing support is another  
example). Maybe you could write up how you did it? I can review the  
code or design first, if it helps.

