couchdb-user mailing list archives

From Robert Dionne <dio...@dionne-associates.com>
Subject Re: Having purge problems
Date Sun, 04 Dec 2011 12:35:22 GMT
The semantics of delete and purge are a little confusing.

Purge is what actually deletes; delete doesn't delete at all, it merely marks a revision as
deleted. Sure, this stems from the nature of MVCC, but I'm wondering if it wouldn't be better
for compaction to clean up deleted revisions.

Telling users that the best way to really delete some bad data is to change it and then
compact isn't quite right.
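
To make the distinction concrete, here is a minimal sketch in Python of the three requests
involved, built as plain tuples rather than sent over the wire. The helper names and the
database, document, and revision values are hypothetical; the endpoints themselves
(DELETE on a document with ?rev=, POST to _purge, POST to _compact) are the standard
CouchDB HTTP API.

```python
import json
from urllib.parse import quote, urlencode


def delete_request(db, doc_id, rev):
    """Ordinary delete: CouchDB records a tombstone, the document's history stays."""
    path = "/{}/{}?{}".format(
        quote(db, safe=""), quote(doc_id, safe=""), urlencode({"rev": rev})
    )
    return ("DELETE", path, None)


def purge_request(db, revs_by_id):
    """Purge: the body maps each document id to the list of revisions to remove."""
    body = json.dumps(revs_by_id, sort_keys=True)
    return ("POST", "/{}/_purge".format(quote(db, safe="")), body)


def compact_request(db):
    """Compaction: drops old revision bodies, but deleted-doc tombstones remain."""
    # CouchDB expects Content-Type: application/json on this POST.
    return ("POST", "/{}/_compact".format(quote(db, safe="")), None)


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    print(delete_request("mydb", "doc1", "1-abc"))
    print(purge_request("mydb", {"doc1": ["1-abc"]}))
    print(compact_request("mydb"))
```

Only the second request removes the document outright; the first and third together
still leave the tombstone behind, which is the behaviour I'm questioning.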

Moreover, you can delete a doc, replicate the db to another, create the doc anew on the target,
and then "open_revs=all" will show revision 1- as the new one and revision 2- as deleted. Of
course these revision numbers are not supposed to be interpreted as ordered, but it looks odd.
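
A small sketch of why that output looks odd but is still consistent: the number before the
dash in a revision id counts edits along one branch of the document's history, so a 1- and
a 2- on different branches say nothing about which was written later. The revision strings
and the simplified leaf structure below are made up for illustration; they are not the
exact shape of an open_revs=all response.

```python
def generation(rev):
    """Return the numeric prefix of a revision id of the form 'N-hash'."""
    prefix, _, _ = rev.partition("-")
    return int(prefix)


def live_leaves(leaves):
    """Keep only the leaf revisions that are not marked deleted."""
    return [leaf for leaf in leaves if not leaf["deleted"]]


# Hypothetical leaves for the scenario described above.
leaves = [
    {"rev": "2-deadbeef", "deleted": True},   # the replicated deletion
    {"rev": "1-cafef00d", "deleted": False},  # the doc created anew
]

if __name__ == "__main__":
    for leaf in leaves:
        print(generation(leaf["rev"]), leaf["rev"], "deleted" if leaf["deleted"] else "live")
    # The live revision has the lower number than the deleted one.
    print(live_leaves(leaves))
```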

I would imagine allowing compact to clean up deletions would break something else, but it seems
like a clearer way to do things from the user's perspective.

Cheers,

Bob


On Dec 3, 2011, at 10:41 PM, Jason Smith wrote:

> On Sat, Dec 3, 2011 at 11:02 PM, Robert Newson <rnewson@apache.org> wrote:
>> I can't mention _purge without reminding everyone that it exists only
>> for removal of data that should not have been stored in the first
>> place (like sensitive passwords, etc). It is not a mechanism to use
>> lightly as it breaks eventual consistency, is only lightly tested, and
>> will often cause full view rebuilds.
> 
> Hi, Bob. Since this is the user list, may I pull this thread into a tangent?
> 
> tl;dr = There is one exception, purging very old deleted docs; link to
> an awesome write-up at the end.
> 
> Where I agree:
> 
> * Purge is not a mechanism to use lightly
> * Purge breaks eventual consistency
> * Purge is only lightly tested
> 
> Where I sort-of agree:
> 
> * Purge will "often" cause full view rebuilds. This is true in the
> most general case, however it can be virtually eliminated in
> production by writing careful code (or using a carefully-written
> library).
> 
> Where I disagree:
> 
> * Purge exists only for removal of data which should not have been
> stored in the first place (like sensitive passwords)
> 
> Let's break this down. The easy part is what to do with sensitive
> passwords. Purge is not delete; purge removes a document from ever
> having existed in the first place. Like Marty McFly in "The Dating
> Game": http://www.youtube.com/watch?v=CC73uxAVfVY
> 
> Since purged documents cannot be replicated away, the best thing to do
> about a sensitive password is to *change it*, so the change can
> propagate. (Subsequent compaction on the Couch(es) will remove the
> password from the disk--or at least from the filesystem.)
> 
> And speaking of compaction, your "never purge" advice is good; except
> for deleted documents. Deleted documents never, ever, exit the .couch
> file. CouchDB is relaxed. I should be able to create and delete
> documents and expect reasonable post-compaction disk usage. If purge
> is off-limits, certain usage styles of CouchDB produce ever-growing
> .couch files, and ever-slowing _all_docs and _changes queries.
> 
> One might say, "just make a new database, with filtered replication."
> Well, that is basically exactly what a purge does:
> 
> * It is to be done only rarely, and with care
> * Some documents "never existed at all"
> * All views are rebuilt
> 
> For this reason, I have written a procedure for purging very old
> deleted documents in a production setting. I hope to actually
> implement it one day; but you are right that it is needed rarely, if
> at all, in the real world.
> 
> https://github.com/iriscouch/cqs#purging
> 
> -- 
> Iris Couch

