incubator-couchdb-user mailing list archives

From Antony Blakey <antony.bla...@gmail.com>
Subject Re: couchdb (_external consistency issues and proposals)
Date Sun, 21 Dec 2008 04:10:57 GMT
(Posted to -dev because it has some development issues)

This is wrong BTW:

>        elsif doc["Type"] == "user"
>          doc["Roles"] && doc["Roles"].each do |r|
>            db.execute("replace into links values (?, ?, ?)", db_name, doc_id, r);
>          end

because it doesn't handle modifications correctly. In my production  
code I do this:

   db.execute("delete from links where db = ? and src = ?", db_name, doc_id);
   doc["Roles"] && doc["Roles"].each do |r|
     db.execute("insert into links values (?, ?, ?)", db_name, doc_id, r);
   end

i.e. always delete and recreate the derived document. You can do  
incremental updates by reading from your indexes before updating. You  
cannot reliably get the previous rev (for differencing) because it may  
not exist.
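
To make the difference concrete, here is a minimal, self-contained sketch of the delete-then-recreate approach. It uses an in-memory Hash in place of the SQLite links table, and the class and method names are illustrative, not from my production code:

```ruby
require 'set'

# In-memory stand-in for the "links" table: maps [db, src] => Set of roles.
class LinksIndex
  def initialize
    @links = Hash.new { |h, k| h[k] = Set.new }
  end

  # Idempotent update: always delete the existing derived rows for this
  # document, then re-insert from the current revision. Unlike a bare
  # replace/upsert, this correctly drops roles that were removed from
  # the document.
  def update(db_name, doc_id, doc)
    @links.delete([db_name, doc_id])
    (doc["Roles"] || []).each do |r|
      @links[[db_name, doc_id]] << r
    end
  end

  def roles(db_name, doc_id)
    @links[[db_name, doc_id]].to_a.sort
  end
end

idx = LinksIndex.new
idx.update("mydb", "user1", { "Roles" => ["admin", "editor"] })
# Document modified: "admin" removed. Delete-and-recreate drops it;
# a replace-per-role would have left the stale "admin" row behind.
idx.update("mydb", "user1", { "Roles" => ["editor"] })
```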

My code also doesn't handle a database being deleted and then re-created: the _external will think it has valid records, but they belong to a previous database. You could detect that through notifications, but once again I think it needs to be synchronous if you want to reason about it. A likely-to-work-most-of-the-time solution would be to detect update_seq < stored_update_seq. A better solution would be for each db to have a UUID, so that you don't have to rely on the name as the identity.
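
The check an _external handler could run on each request might look like the following sketch. The names are mine, and note that the db UUID is the proposed extension, not an existing CouchDB feature (update_seq is available from db_info today):

```ruby
# stored_state: what the index last recorded for this db (nil if never
# indexed). info: what the current request reports about the db.
def index_stale?(stored_state, info)
  return true if stored_state.nil?  # never indexed this db
  # Likely-to-work-most-of-the-time heuristic: a lower update_seq than
  # we recorded means the db was deleted and recreated. It can miss the
  # case where the recreated db races past the old seq before we look.
  return true if info[:update_seq] < stored_state[:update_seq]
  # Reliable: a per-db UUID changes identity outright.
  return true if info[:db_uuid] && info[:db_uuid] != stored_state[:db_uuid]
  false
end
```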

Also, if your _external doesn't get triggered for a long time, and while it's 'dormant' a document is deleted and the db is compacted, you could miss deletions. One solution is that every _external needs to be notified (synchronously) before a compaction so that it can update to the update_seq of the MVCC snapshot that the compaction will operate against. IMO a better solution is to have two UUIDs for the database: one per database, and one 'per compaction'. Thus an external will know whether it needs to revalidate all the documents it has indexed to check for missed deletions. You could get by with just a per-compaction UUID, which would also change if a db was deleted and then created, thus triggering the same revalidation codepath, but that is a lot more expensive than knowing that the entire db has been replaced.
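
A sketch of the decision the two UUIDs enable (both are proposed, not current CouchDB features; the names and return symbols are illustrative):

```ruby
# db_uuid identifies the database instance; compaction_uuid changes on
# every compaction (and implicitly on delete/recreate).
def action_for(stored, current)
  # Different database instance entirely: drop all derived records and
  # rebuild from scratch - cheaper than checking each indexed document.
  return :rebuild_all if stored.nil? || stored[:db_uuid] != current[:db_uuid]
  # Same database, but a compaction may have purged deletions we never
  # saw while dormant: revalidate every indexed document.
  return :revalidate if stored[:compaction_uuid] != current[:compaction_uuid]
  # Nothing surprising happened: proceed from our stored update_seq.
  :incremental
end
```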

Finally, note that this external operates for *every* database, whereas you may want to enable and configure it using a design document. In that case your external should always monitor updated design documents and check for enablement. You can record the configuration in the database (and cache it in the _external) and just ignore all other changes. Personally I don't bother, because the lazy creation means that no work is done unless I do an _external query, so databases which don't get queried don't incur a cost, and I have no configuration data.
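
If you do go the design-document route, the enablement check could be as simple as the following sketch ("links_index" is an invented configuration key, not a CouchDB convention):

```ruby
# Scan the db's design documents for a configuration section this
# external recognises; cache the result in the _external and refresh
# it whenever a design document changes.
def enabled?(design_docs)
  design_docs.any? { |ddoc| ddoc.key?("links_index") }
end
```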

That's another reason to prefer a passive UUID-based identity scheme  
for db-create/delete and compaction detection rather than a  
notification system.

It would be good if each DB had two UUIDs, one per-db and one per-compaction (i.e. changed in the MVCC snapshot during a compaction), and for these to be provided with every _external request.

Antony Blakey
--------------------------
CTO, Linkuistics Pty Ltd
Ph: 0438 840 787

If at first you don’t succeed, try, try again. Then quit. No use being  
a damn fool about it
   -- W.C. Fields

