couchdb-user mailing list archives

From Antony Blakey <antony.bla...@gmail.com>
Subject Re: Ordering by values calculated by a map/reduce
Date Mon, 01 Dec 2008 08:02:19 GMT

On 01/12/2008, at 6:09 PM, Ben Bangert wrote:

> I want to solve what I thought was a fairly simple problem, though  
> unfortunately it seems to be rather tricky. I asked on the IRC  
> channel, and got some good input, but neither really seemed like a  
> very good solution.
>
> The problem:
>
> I want to allow users to rate things. They do this very very  
> frequently, so I can't store it in the actual document being rated,  
> so I have Rating documents. It's easy enough to write a map/reduce  
> that gives me the computed average rating for a given document,  
> however, it seems to be impossible to get a listing of the highest  
> rated documents, as I can only get the computed rating for a  
> document one at a time.
>
> The possible solutions:
> - Buffer rating additions, then at a later time, run through them  
> and calculate the new average rating, store it in the document as  
> computed_rating, so I can order on that in the key
> - Cron a job that goes in and looks for new rating every 5 mins or  
> whatever, and then does the same as the previous solution by storing  
> it in a computed_rating field
>
> I'm not a fan of either of these, because #1 means if my webapp  
> hiccups, I lose ratings, and #2 is just a pain to keep sweeping the  
> db for new Rating documents then going through updating all the  
> documents.
>
> Are there really no other solutions that don't require me to store  
> the computed rating in the doc itself? Is there no way I can  
> order on the value from the map/reduce, rather than only being able  
> to order on the key?

I use an external query handler to solve problems that don't fit
map/reduce.

I've modified Paul Davis's _external handler to pass the current  
update_seq whenever an external query is made. In the external process  
(ruby in my case) I maintain a SQLite database that I can use for  
queries that Couch isn't suited for. Whenever a query comes in, I  
compare the supplied update_seq with one stored in my sqlite db (of  
course I cache that in memory as long as the process lives). If the  
sqlite db is out of date, then I do a _all_docs_by_seq and update  
sqlite, including the update_seq record, before doing the (SQL) query  
and responding with a JSON document in the same format as Couch would.
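
The lazy-update flow described above could be sketched roughly as follows. This is in Python rather than Ruby so that it needs only the standard library. It is not the actual handler: `fetch_changes` is a hypothetical callable standing in for the _all_docs_by_seq request, the `type`/`target`/`score` document fields are assumed, and the response shape is only loosely modelled on a Couch view result.

```python
import sqlite3

def open_index(path=":memory:"):
    """Open (or create) the side index; safe to delete and rebuild."""
    db = sqlite3.connect(path)
    db.execute("CREATE TABLE IF NOT EXISTS ratings "
               "(id TEXT PRIMARY KEY, target TEXT, score REAL)")
    db.execute("CREATE TABLE IF NOT EXISTS meta "
               "(key TEXT PRIMARY KEY, value INTEGER)")
    db.execute("INSERT OR IGNORE INTO meta VALUES ('update_seq', 0)")
    db.commit()
    return db

def stored_seq(db):
    """Return the update_seq the index was last synced to."""
    row = db.execute("SELECT value FROM meta WHERE key = 'update_seq'")
    return row.fetchone()[0]

def catch_up(db, current_seq, fetch_changes):
    """Apply changes newer than the stored update_seq, if there are any."""
    since = stored_seq(db)
    if since >= current_seq:
        return  # index is already current; nothing to fetch
    with db:  # documents and the new update_seq commit together
        for seq, doc_id, doc in fetch_changes(since):
            if doc is None or doc.get("type") != "rating":
                # deleted, or not a rating: make sure it is not indexed
                db.execute("DELETE FROM ratings WHERE id = ?", (doc_id,))
            else:
                db.execute("INSERT OR REPLACE INTO ratings VALUES (?, ?, ?)",
                           (doc_id, doc["target"], doc["score"]))
        db.execute("UPDATE meta SET value = ? WHERE key = 'update_seq'",
                   (current_seq,))

def top_rated(db, current_seq, fetch_changes, limit=10):
    """Sync if needed, then order targets by their average rating."""
    catch_up(db, current_seq, fetch_changes)
    rows = db.execute(
        "SELECT target, AVG(score) AS avg_score FROM ratings "
        "GROUP BY target ORDER BY avg_score DESC, target LIMIT ?",
        (limit,)).fetchall()
    return {"total_rows": len(rows),
            "rows": [{"key": t, "value": a} for t, a in rows]}
```

Here `current_seq` is the update_seq passed in with each external query, and the SQL query does the ordering by computed value that a view cannot.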

I can delete the sqlite db at any time (well, while the process is  
stopped) because it will get recreated/updated when a query comes in  
as necessary. This works with replication.

This system has the same lazy-update characteristics as Couch views,  
and has the additional advantage that you can do in-memory caching in  
the external process, which, depending on your update frequency, means  
you rarely hit the db.
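
One way the in-memory caching might look, as a rough Python sketch: results are kept per query and thrown away whenever the supplied update_seq changes. The `SeqCache` class and its `run_query` callable are hypothetical names for illustration, not part of Couch or of the handler described here.

```python
class SeqCache:
    """Cache query results for as long as update_seq stays the same."""

    def __init__(self, run_query):
        self._run_query = run_query  # hypothetical: runs the real (SQL) query
        self._seq = None             # update_seq the cached results belong to
        self._results = {}

    def get(self, current_seq, query_key):
        if current_seq != self._seq:
            # the database has changed, so every cached result is stale
            self._results.clear()
            self._seq = current_seq
        if query_key not in self._results:
            self._results[query_key] = self._run_query(query_key)
        return self._results[query_key]
```

With infrequent updates, most requests are then served from the dictionary without touching the db at all.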

Antony Blakey
--------------------------
CTO, Linkuistics Pty Ltd
Ph: 0438 840 787

What can be done with fewer [assumptions] is done in vain with more
   -- William of Ockham (ca. 1285-1349)
