couchdb-dev mailing list archives

From Wout Mertens <wout.mert...@gmail.com>
Subject Re: Proposal: Review DBs
Date Sun, 26 Apr 2009 21:26:33 GMT
Hi Adam,

On Apr 22, 2009, at 4:48 PM, Adam Kocoloski wrote:

> Hi Wout, thanks for writing this up.
>
> One comment about the map-only views:  I think you'll find that  
> Couch has already done a good bit of the work needed to support  
> them, too.  Couch maintains a btree for each design doc keyed on  
> docid that stores all the view keys emitted by the maps over each  
> document.  When a document is updated and then analyzed, Couch has  
> to consult that btree, purge all the KVs associated with the old  
> version of the doc from each view, and then insert the new KVs.  So  
> the tracking information correlating docids and view keys is already  
> available.
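To check my understanding of the back-index Adam describes, here is a
minimal sketch in Python. All names (ViewIndex, update_doc, query) are
hypothetical, and plain dicts stand in for Couch's btrees: one maps
docid to the (view, key) pairs that doc last emitted, and each view
maps (key, docid) to a value.

```python
class ViewIndex:
    """Toy model of one design doc's view indexes (hypothetical names).

    `views` maps view name -> {(key, docid): value}; `back_index` maps
    docid -> list of (view name, key) pairs the doc last emitted.
    Simplification: one row per (key, docid) per view, unlike real Couch.
    """

    def __init__(self, map_funs):
        self.map_funs = map_funs            # view name -> fn(doc) -> [(key, value)]
        self.views = {name: {} for name in map_funs}
        self.back_index = {}                # docid -> [(view name, key)]

    def update_doc(self, docid, doc):
        # 1. Purge every KV the previous revision of this doc emitted.
        for name, key in self.back_index.pop(docid, []):
            self.views[name].pop((key, docid), None)
        if doc is None:                     # deleted doc: nothing to re-insert
            return
        # 2. Run the maps over the new revision and remember what was emitted.
        emitted = []
        for name, fun in self.map_funs.items():
            for key, value in fun(doc):
                self.views[name][(key, docid)] = value
                emitted.append((name, key))
        self.back_index[docid] = emitted

    def query(self, name):
        # Rows ordered by (key, docid), loosely like a view query.
        return [(k, d, v) for (k, d), v in sorted(self.views[name].items())]
```

For example, re-saving a doc with different tags drops its old rows
before the new ones are inserted, which is exactly the docid-to-key
tracking a review would need.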

See, I did not know that :-) although I should have guessed.

However, in the mail before this one I argued that it doesn't make
sense to combine or chain map-only views, since you can always write a
single map function that does it in one step. Do you agree?
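To make the argument concrete, a sketch in Python under the assumption
that a chained map-only view would simply feed each (key, value) row of
the first view into a second map step. The helper name `chain` is
hypothetical. Because each row depends only on the single document it
came from, calling the second step inside one map function yields the
same rows with no intermediate index.

```python
def chain(first, second):
    """Fuse two map-only steps into one map function (hypothetical helper).

    `first(doc)` returns (key, value) rows; `second(key, value)` returns
    rows derived from a single row. No aggregation across documents is
    involved, which is why the composition is always possible.
    """
    def fused(doc):
        rows = []
        for key, value in first(doc):
            rows.extend(second(key, value))
        return rows
    return fused


# First step: emit (tag, title) per tag.
by_tag = lambda doc: [(t, doc["title"]) for t in doc["tags"]]
# Second step: keep only tags starting with "c", re-keyed by title.
c_tags = lambda k, v: [(v.lower(), k)] if k.startswith("c") else []
fused = chain(by_tag, c_tags)
```

This only holds for map-only steps; once a reduce sits between the two
steps, a row depends on many documents and the shortcut no longer works.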

You might also know the answer to this: is it possible to make the
Review DB a sort of view index on the current database? All it needs
are JSON keys and values, no other fields.

> You'd still be left with the problem of generating unique docids for  
> the documents in the Review DB, but I think that's a problem that  
> needs to be solved.  The restriction to only MR views with no  
> duplicate keys across views seems too strong to me.

Well, since the Review DB is a local(*) hidden database that's handled
a bit specially, I think the easiest approach is to give each
document's _id a sequence number and create a default view that
indexes the documents by doc.key (used when updating the value for
that key). There will never be contention, and we're only interested
in the key index.
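A minimal sketch of that id scheme in Python, with hypothetical names
throughout (ReviewDB, put, delete). It assumes a single local updater
is the only writer, so a plain counter yields unique ids, and a dict
stands in for the default view on doc.key.

```python
import itertools


class ReviewDB:
    """Toy model of the proposed Review DB (all names hypothetical)."""

    def __init__(self):
        self._seq = itertools.count(1)   # sequence numbers used as _id
        self.docs = {}                   # _id -> {"_id", "key", "value"}
        self.by_key = {}                 # key -> _id (the default view)

    def put(self, key, value):
        # Reuse the existing doc for this key, else allocate the next id.
        doc_id = self.by_key.get(key)
        if doc_id is None:
            # Assumed single writer: a counter is enough for uniqueness.
            doc_id = next(self._seq)
            self.by_key[key] = doc_id
        self.docs[doc_id] = {"_id": doc_id, "key": key, "value": value}
        return doc_id

    def delete(self, key):
        doc_id = self.by_key.pop(key, None)
        if doc_id is not None:
            del self.docs[doc_id]
```

Updating an existing key overwrites the same document, so the key
index is the only lookup the updater ever needs.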

(*) local: I'm assuming that views are not replicated and need to be
recalculated on each CouchDB node. If they are replicated somehow, I
think it would still work, but we'd have to look at it a little more.

> With that said, I'd prefer to spend my time extending the view  
> engine to handle chainable MR workflows in a single shot.   
> Especially in the simple sort_by_value case it just seems like a  
> cleaner way to go about things.

Yes, that seems to be the gist of all repliers and I agree :-)

In a nutshell, I'm hoping that:
* A review is a new sort of view that has an "inputs" array in its
definition.
* Only MR views are allowed as inputs, and no key may appear in more
than one input view.
* It maintains a persistent index of its input views' rows, updated
whenever those views are updated.
* That index is then used to build the review's own view index when
the review is updated.
* I think I covered the most important algorithms needed to implement
this in my original proposal.
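To illustrate the points above, a sketch in Python of a review fed by
MR views, using the sort_by_value case as the review function. Every
name here (Review, on_input_update, build, review_fun) is hypothetical,
a dict stands in for the persistent input index, and build() simply
rebuilds the whole output rather than updating it incrementally.

```python
class Review:
    """Sketch of a review over MR view outputs (all names hypothetical)."""

    def __init__(self, inputs, review_fun):
        self.inputs = list(inputs)       # the "inputs" array of MR view names
        self.review_fun = review_fun     # fn(rows) -> [(key, value)]
        self.rows = {}                   # key -> (input view, reduced value)

    def on_input_update(self, view, changed_rows):
        # Called when an input view's reduced rows change; None = row removed.
        if view not in self.inputs:
            raise ValueError(f"{view} is not an input of this review")
        for key, value in changed_rows:
            owner = self.rows.get(key)
            # Enforce the rule that a key may come from only one input view.
            if owner is not None and owner[0] != view:
                raise ValueError(f"duplicate key {key!r} in {owner[0]} and {view}")
            if value is None:
                self.rows.pop(key, None)
            else:
                self.rows[key] = (view, value)

    def build(self):
        # Rebuild the review's view from the stored input rows, sorted by key.
        rows = [(key, value) for key, (_, value) in self.rows.items()]
        return sorted(self.review_fun(rows))


# sort_by_value: swap key and value so rows come out ordered by value.
sort_by_value = lambda rows: [(v, k) for k, v in rows]
r = Review(["word_counts"], sort_by_value)
r.on_input_update("word_counts", [("couch", 3), ("relax", 7), ("db", 1)])
```

Here build() returns [(1, "db"), (3, "couch"), (7, "relax")]. A real
implementation would update only the affected rows of the review index.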

Does this sound feasible? If so, I'll update my proposal accordingly.

Wout.