couchdb-dev mailing list archives

From Wout Mertens <wout.mert...@gmail.com>
Subject Re: Proposal: Review DBs
Date Mon, 27 Apr 2009 22:43:57 GMT
On Apr 27, 2009, at 5:20 AM, Chris Anderson wrote:

> On Apr 26, 2009, at 2:26 PM, Wout Mertens <wout.mertens@gmail.com>  
> wrote:
>>
>>> You'd still be left with the problem of generating unique docids  
>>> for the documents in the Review DB, but I think that's a problem  
>>> that needs to be solved.  The restriction to only MR views with no  
>>> duplicate keys across views seems too strong to me.
>>
>> Well, since the Review DB is a local(*) hidden database that's  
>> handled a bit specially, I think the easiest is to assign _id a  
>> sequence number and create a default view that indexes the  
>> documents by doc.key (for updating the value for that key). There  
>> will never be contention and we're only interested in the key index.
>
> We discussed this a little at CouchHack and I argued that the  
> simplest solution is actually good for a few reasons.
>
> The simple solution: provide a mechanism to copy the rows of a  
> grouped reduce function to a new database.

Ok, the problems I see with that though are:
- How to assign _ids to the rows
- Separate design doc needed for each DB
   - Spreads application logic
   - All data not available from one parent URL a la CouchApps
- Namespace pollution from all these utility DBs
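A minimal sketch of the _id scheme suggested earlier in the thread: the Review DB hands out sequence numbers as _ids and keeps a default index by key, so an updated reduce value overwrites the existing row for that key. `ReviewDB`, `upsert` and `get` are hypothetical names, and a Python dict stands in for the hidden database.

```python
import itertools
import json
from typing import Any, Dict, Optional


class ReviewDB:
    """Hypothetical in-memory stand-in for a hidden Review DB.

    Each new row gets the next sequence number as its _id; a default
    "by key" index maps the row's key to that _id so later reduce
    results for the same key overwrite the existing row.
    """

    def __init__(self) -> None:
        self._next_id = itertools.count(1)
        self.docs: Dict[int, Dict[str, Any]] = {}
        self._by_key: Dict[str, int] = {}

    @staticmethod
    def _index_key(key: Any) -> str:
        # JSON-encode so list keys (like CouchDB array keys) are hashable.
        return json.dumps(key, sort_keys=True)

    def upsert(self, key: Any, value: Any) -> int:
        """Store value under key, reusing the existing _id if the key is known."""
        ik = self._index_key(key)
        doc_id = self._by_key.get(ik)
        if doc_id is None:
            doc_id = next(self._next_id)
            self._by_key[ik] = doc_id
            self.docs[doc_id] = {"_id": doc_id, "key": key, "value": value}
        else:
            self.docs[doc_id]["value"] = value
        return doc_id

    def get(self, key: Any) -> Optional[Any]:
        """Look up the current value for key via the key index."""
        doc_id = self._by_key.get(self._index_key(key))
        return None if doc_id is None else self.docs[doc_id]["value"]
```

Because only the view updater writes to this database, handing out sequence numbers never contends with other writers, which is the point made in the quoted paragraph.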

Or do you mean this as a single-shot data dump? Couldn't that get
quite expensive, storage-wise?
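If it is a single-shot dump, the copy step could look roughly like this sketch. `copy_rows_to_db` is a hypothetical helper, the rows stand in for the output of a grouped (group=true) reduce query, and deriving each _id from the JSON-encoded key is an assumption, one possible answer to the _id question above.

```python
import json
from typing import Any, Dict, Iterable, Tuple


def copy_rows_to_db(rows: Iterable[Tuple[Any, Any]], target: Dict[str, dict]) -> None:
    """Replace target's contents with one document per grouped reduce row.

    Assumption: the _id is the JSON encoding of the row key, which is
    unique within a single group=true result.
    """
    target.clear()  # single-shot dump: the previous snapshot is discarded
    for key, value in rows:
        doc_id = json.dumps(key, sort_keys=True)
        target[doc_id] = {"_id": doc_id, "key": key, "value": value}


# Usage: rows as a grouped reduce query might return them.
snapshot: Dict[str, dict] = {}
copy_rows_to_db([(["2009", 4], 17), (["2009", 5], 3)], snapshot)
```

Every refresh rewrites every row, so with CouchDB's append-only storage each dump also leaves the previous revisions on disk until compaction, which is the storage cost in question.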

> Good because it is most like Hadoop/Google style map reduce. In that  
> paradigm, the output of a map/reduce job is not incremental, and it  
> is persisted in a way that allows for multiple later reduce stages  
> to be run on it. It's common in Hadoop to chain many m/r stages, and  
> to try a few iterations of each stage while developing code.

Hmmm. If a hidden DB/view index is used, then the same function  
hashing techniques will work to decide which index to use for  
intermediate queries. I see no functional difference here.
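As a sketch of the hashing idea, assuming a hypothetical `IndexRegistry`: an index is looked up by a hash of the view's map source, reduce source and inputs, so any query whose definition hashes the same reuses the existing index, whether it is a review view or an intermediate query. This is similar in spirit to how CouchDB names its view index files after a signature of the design document's views; the exact fields hashed here are an assumption.

```python
import hashlib
import json
from typing import Dict, Sequence


def index_signature(map_src: str, reduce_src: str = "", inputs: Sequence[str] = ()) -> str:
    """Hash a view definition so identical definitions map to the same index."""
    payload = json.dumps(
        {"map": map_src, "reduce": reduce_src, "inputs": list(inputs)}, sort_keys=True
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class IndexRegistry:
    """Hypothetical registry that reuses an index whenever the signature matches."""

    def __init__(self) -> None:
        self._indexes: Dict[str, dict] = {}

    def get_or_create(self, map_src: str, reduce_src: str = "", inputs: Sequence[str] = ()) -> dict:
        sig = index_signature(map_src, reduce_src, inputs)
        if sig not in self._indexes:
            self._indexes[sig] = {"signature": sig, "rows": []}
        return self._indexes[sig]

    def __len__(self) -> int:
        return len(self._indexes)
```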

> I like this also because it provides the needed functionality  
> without adding any new primitives to CouchDB.

But how would that mechanism be used if there are no new primitives?
In the current thinking for review DBs, the only user-visible change
is that CouchDB would allow an extra field, "inputs", on view
definitions.
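To make the "inputs" idea concrete, here is a non-incremental sketch in which a view either maps documents or, if it has an `inputs` list, maps the (key, value) rows of the named views. It also mirrors the Hadoop-style chaining from the quote above. Python callables stand in for JavaScript view functions, and `evaluate` and the `views` layout are hypothetical.

```python
from collections import defaultdict
from typing import Any, Dict, List, Tuple

Row = Tuple[Any, Any]


def evaluate(name: str, docs: List[dict], views: Dict[str, dict]) -> List[Row]:
    """Batch-evaluate a view, first evaluating any views it names as inputs."""
    view = views[name]
    if "inputs" in view:
        # A review view consumes the output rows of its input views.
        records: List[Any] = []
        for src in view["inputs"]:
            records.extend(evaluate(src, docs, views))
    else:
        records = docs
    groups: Dict[Any, List[Any]] = defaultdict(list)
    for record in records:
        for key, value in view["map"](record):
            groups[key].append(value)
    # Group by key and reduce, like a group=true query.
    return sorted(((k, view["reduce"](vs)) for k, vs in groups.items()), key=lambda r: r[0])


views = {
    "by_author": {"map": lambda doc: [(doc["author"], 1)], "reduce": sum},
    # Hypothetical review view: which authors share each post count?
    "authors_by_count": {"inputs": ["by_author"], "map": lambda row: [(row[1], row[0])], "reduce": sorted},
}
docs = [{"author": "wout"}, {"author": "chris"}, {"author": "wout"}, {"author": "jan"}]
```

Here `evaluate("authors_by_count", docs, views)` recomputes everything from scratch; the incremental version would keep each stage's rows in a hidden index and only reprocess changed keys.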

> The only downside of this approach is that it is not incremental.  
> I'm not sure that incremental chainability has much promise, as the  
> index management could be a pain, especially if you have branching  
> chains.

Hmmm, I think that I showed that it needn't be. Any update to a view  
would trigger review index updates for all the views that have that  
view as input. Subsequent updates of those views then get propagated  
onwards in the same fashion. Nothing painful...
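The push-style propagation can be sketched as a graph walk over the `inputs` declarations: find every view downstream of the one that changed, then order them so each view is refreshed only after all of its affected inputs (a topological sort, here Kahn's algorithm). The function name and the `inputs` mapping are hypothetical; a cycle is reported as an error.

```python
from collections import defaultdict, deque
from typing import Dict, List, Set


def propagation_order(inputs: Dict[str, List[str]], changed: str) -> List[str]:
    """Views to refresh after `changed` updates, each after its affected inputs."""
    dependents: Dict[str, List[str]] = defaultdict(list)
    for view, srcs in inputs.items():
        for src in srcs:
            dependents[src].append(view)

    # Collect every view reachable downstream of the changed one.
    affected: Set[str] = set()
    stack = [changed]
    while stack:
        for dep in dependents[stack.pop()]:
            if dep not in affected:
                affected.add(dep)
                stack.append(dep)

    # Kahn's algorithm restricted to the affected views.
    indegree = {v: sum(1 for s in inputs.get(v, []) if s in affected) for v in affected}
    queue = deque(sorted(v for v, n in indegree.items() if n == 0))
    order: List[str] = []
    while queue:
        view = queue.popleft()
        order.append(view)
        for dep in sorted(dependents[view]):
            if dep in affected:
                indegree[dep] -= 1
                if indegree[dep] == 0:
                    queue.append(dep)
    if len(order) != len(affected):
        raise ValueError("cycle in view inputs")
    return order


inputs = {"counts": ["by_author"], "top": ["counts"], "summary": ["by_author", "counts"]}
```

With these definitions, a change to `by_author` refreshes `counts` before `summary`, even though `summary` also reads `by_author` directly.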

If you want the latest info, first update the input views and then the  
review view.
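The pull-style alternative, updating the inputs before answering a query on a review view, is a depth-first walk that refreshes each input once before the view that depends on it. `refresh_for_query` and the `update` callback are hypothetical placeholders for whatever actually rebuilds an index.

```python
from typing import Callable, Dict, List, Set


def refresh_for_query(view: str, inputs: Dict[str, List[str]], update: Callable[[str], None]) -> None:
    """Bring `view` up to date by refreshing its inputs first, depth first."""
    done: Set[str] = set()
    in_progress: Set[str] = set()

    def visit(v: str) -> None:
        if v in done:
            return  # already refreshed during this query
        if v in in_progress:
            raise ValueError(f"cycle in view inputs at {v}")
        in_progress.add(v)
        for src in inputs.get(v, []):
            visit(src)
        in_progress.discard(v)
        update(v)  # all inputs of v are current at this point
        done.add(v)

    visit(view)


calls: List[str] = []
refresh_for_query("top", {"counts": ["by_author"], "top": ["counts"]}, calls.append)
```

After the call, `calls` lists the refreshes in order: the plain view first, then each review view that depends on it.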

> Another upside is that by reducing to a db, you give the user power  
> to do things like use replication to merge multiple data sets before  
> applying more views.

That's true, and it would be very useful in that case. Perhaps
there's room for both approaches?
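A rough sketch of the replicate-then-view workflow, using dicts as databases. The merge keeps the copy with the higher hypothetical `gen` counter when both sides have the same _id; that is a simplification, since real CouchDB replication tracks revision trees and keeps the losing revision as a conflict. `merge_into` and `count_by` are hypothetical helpers.

```python
from collections import Counter
from typing import Dict

Database = Dict[str, dict]


def merge_into(target: Database, source: Database) -> None:
    """Copy docs from source into target; on an _id clash keep the higher 'gen'.

    Simplified stand-in for replication, not CouchDB's conflict handling.
    """
    for doc_id, doc in source.items():
        current = target.get(doc_id)
        if current is None or doc["gen"] > current["gen"]:
            target[doc_id] = dict(doc)


def count_by(db: Database, field: str) -> Dict[str, int]:
    """A trivial 'view' applied after merging: count docs per field value."""
    return dict(Counter(doc[field] for doc in db.values()))


db_a = {"d1": {"gen": 1, "tag": "couch"}, "d2": {"gen": 1, "tag": "erlang"}}
db_b = {"d1": {"gen": 2, "tag": "erlang"}, "d3": {"gen": 1, "tag": "couch"}}
merged: Database = {}
merge_into(merged, db_a)
merge_into(merged, db_b)
```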

Wout.