couchdb-dev mailing list archives

From Wout Mertens <wout.mert...@gmail.com>
Subject Re: Proposal: Review DBs
Date Sun, 26 Apr 2009 21:13:06 GMT
On Apr 23, 2009, at 2:52 AM, Paul Davis wrote:

> I'm still kicking around ideas on how I might approach such an
> implementation. There are some obvious downfalls that Wout has come up
> on. Specifically, some obvious implementations would place some
> arduous constraints on emitted keys (namely: uniqueness, type must be
> string, string must not start with an underscore).

I really would prefer the review DB to be, in fact, a view index, but  
since I don't know the code, I was trying to provide the ability to go  
either way.

As for uniqueness, I don't think that is such a heavy constraint.
After all, only map-only (M) views can emit multiple non-unique KV
pairs. There is no value in running a second M(R) after an M, since
you can write any map_b(map_a()) as a new map_c(). Likewise, the
results of map_a()+map_b() can be obtained by a single map_c() function.
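
To make the composition argument concrete, here is a minimal sketch in plain Python rather than in the CouchDB view server. The function names and the row shape ({"key": ..., "value": ...}) are assumptions for illustration only, not CouchDB's API.

```python
# Hypothetical sketch: map functions are modelled as generators of (key, value).

def map_a(doc):
    # First-stage map: emit one (tag, title) pair per tag.
    for tag in doc.get("tags", []):
        yield tag, doc["title"]

def map_b(row):
    # Second-stage map over map_a's rows: keep short titles, upper-case the key.
    if len(row["value"]) <= 10:
        yield row["key"].upper(), row["value"]

def compose(first, second):
    # Build map_c such that map_c(doc) yields the same rows as
    # running `second` over every row emitted by `first`.
    def map_c(doc):
        for key, value in first(doc):
            yield from second({"key": key, "value": value})
    return map_c

def union(first, second):
    # Build map_c for the map_a()+map_b() case: both maps see the same doc.
    def map_c(doc):
        yield from first(doc)
        yield from second(doc)
    return map_c

doc = {"title": "Review DBs", "tags": ["couchdb", "views"]}
chained_rows = list(compose(map_a, map_b)(doc))
union_rows = list(union(map_a, map_a)(doc))
```

In both cases a single map pass over the original documents yields the rows, so a persisted intermediate M result adds nothing.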

Therefore, review DBs should only contain the results of MR views,
which by definition deliver unique KV pairs. When combining two MR
views you can have key collisions, but in all cases the MRs could be
rewritten to avoid them (simply append a suffix to the keys that the M
emits).
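
A rough sketch of the suffix idea, again in plain Python with a toy in-memory map/reduce. The helper run_mr, the view names, and the ("count" / "size") suffixes are all made up for illustration; keys are modelled as tuples where CouchDB would use JSON arrays.

```python
from collections import defaultdict

def run_mr(docs, map_fn, reduce_fn):
    # Toy map/reduce: group emitted values by key, then reduce each group.
    groups = defaultdict(list)
    for doc in docs:
        for key, value in map_fn(doc):
            groups[key].append(value)
    return {key: reduce_fn(values) for key, values in groups.items()}

def count_by_tag(doc):
    # The map appends a "count" suffix to every key it emits.
    for tag in doc["tags"]:
        yield (tag, "count"), 1

def size_by_tag(doc):
    # Same base key, different suffix, so the two views cannot collide.
    for tag in doc["tags"]:
        yield (tag, "size"), doc["size"]

docs = [
    {"tags": ["a", "b"], "size": 10},
    {"tags": ["a"], "size": 5},
]

# Both MR results land in one "review db" without overwriting each other.
review = {}
review.update(run_mr(docs, count_by_tag, sum))
review.update(run_mr(docs, size_by_tag, sum))
```

Without the suffix, both views would emit the bare tag as key and the second update would overwrite the first.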

As for the _id of a result row: if the review db were simply a view
index, no _id would be needed. If it has to be a db, the _id can simply
be the JSON representation of the key as a string (it will never start
with a _, but it's quite wasteful) or some sequence number for the
review db.
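
For illustration, deriving such an _id could look like the sketch below. key_to_id is a hypothetical helper, not an existing CouchDB function. The underscore claim holds because serialized JSON can only begin with a quote, a digit, a minus sign, a bracket, a brace, or one of the literals null, true, or false.

```python
import json

def key_to_id(key):
    # Serialize the emitted key compactly; sort_keys makes object keys
    # produce the same string regardless of member order.
    return json.dumps(key, separators=(",", ":"), sort_keys=True)

sample_keys = ["_design/foo", ["tag", 2009], 42, None, {"b": 1, "a": 2}]
sample_ids = [key_to_id(k) for k in sample_keys]
```

Even a string key that itself starts with an underscore is safe, because the serialized form starts with the opening quote. The waste is that the key is effectively stored twice.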

> So there are some things to contemplate, but I think the general idea
> is pretty solid. Also as we start putting some serious effort into
> clustering CouchDB there's the eyebrow raising aspect that if we
> persist to DB's we might be able to leverage a lot of that for some
> added awesomeness.

I feel the same way.

I've also been contemplating using this persistence for _temp_views. I
think it can be done, given garbage collection of the temporary review
dbs. Then you could use the CouchDB view server farm to calculate
multi-dimensional views (e.g. all documents with tag A AND tag B AND
younger than 2 days).
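
As a very rough sketch of what such a multi-dimensional query amounts to, the example below intersects three per-dimension result sets in plain Python. The document fields (tags, created) and the helper names are assumptions; a real implementation would get each set from a persisted view rather than by scanning documents.

```python
from datetime import datetime, timedelta

def ids_with_tag(docs, tag):
    # One "dimension": ids of all documents carrying the given tag.
    return {d["_id"] for d in docs if tag in d["tags"]}

def ids_younger_than(docs, now, max_age):
    # Another "dimension": ids of documents created less than max_age ago.
    return {d["_id"] for d in docs if now - d["created"] < max_age}

now = datetime(2009, 4, 26, 21, 0, 0)  # fixed so the example is repeatable
docs = [
    {"_id": "1", "tags": ["A", "B"], "created": now - timedelta(hours=5)},
    {"_id": "2", "tags": ["A"], "created": now - timedelta(hours=5)},
    {"_id": "3", "tags": ["A", "B"], "created": now - timedelta(days=3)},
]

# tag A AND tag B AND younger than 2 days
matches = (
    ids_with_tag(docs, "A")
    & ids_with_tag(docs, "B")
    & ids_younger_than(docs, now, timedelta(days=2))
)
```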

Wout.