incubator-couchdb-user mailing list archives

From Jason Smith <...@iriscouch.com>
Subject Re: Having purge problems
Date Sun, 04 Dec 2011 03:41:09 GMT
On Sat, Dec 3, 2011 at 11:02 PM, Robert Newson <rnewson@apache.org> wrote:
> I can't mention _purge without reminding everyone that it exists only
> for removal of data that should not have been stored in the first
> place (like sensitive passwords, etc). It is not a mechanism to use
> lightly as it breaks eventual consistency, is only lightly tested, and
> will often cause full view rebuilds.

Hi, Bob. Since this is the user list, may I pull this thread into a tangent?

tl;dr: There is one exception, which is purging very old deleted docs.
A link to an awesome write-up is at the end.

Where I agree:

* Purge is not a mechanism to use lightly
* Purge breaks eventual consistency
* Purge is only lightly tested

Where I sort-of agree:

* Purge will "often" cause full view rebuilds. This is true in the
most general case; however, it can be virtually eliminated in
production by writing careful code (or using a carefully written
library).

Where I disagree:

* Purge exists only for removal of data which should not have been
stored in the first place (like sensitive passwords)

Let's break this down. The easy part is what to do with sensitive
passwords. Purge is not delete; purge makes it as though a document
never existed in the first place. Like Marty McFly in "The Dating
Game": http://www.youtube.com/watch?v=CC73uxAVfVY

Since a purge does not propagate through replication, the best thing
to do about a sensitive password is to *change it*, so the change can
propagate. (Subsequent compaction on the Couch(es) will remove the
password from the disk--or at least from the filesystem.)

And speaking of compaction, your "never purge" advice is good, except
for deleted documents. Deleted documents never, ever exit the .couch
file; compaction keeps a tombstone for each one. CouchDB is relaxed. I
should be able to create and delete documents and expect reasonable
post-compaction disk usage. If purge is off-limits, certain usage
styles of CouchDB produce ever-growing .couch files, and ever-slowing
_all_docs and _changes queries.
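
A minimal sketch of what purging old tombstones could look like, not
the procedure from the write-up linked at the end. It only builds the
JSON body that CouchDB's POST /{db}/_purge endpoint accepts (a map of
document id to a list of revisions) from rows of the _changes feed.
The function name build_purge_body and the max_seq cutoff are
hypothetical, and the cutoff assumes integer update sequences as in
CouchDB 1.x. Deciding which tombstones count as "very old" is left to
the caller.

```python
import json


def build_purge_body(change_rows, max_seq):
    """Map each deleted doc id to its listed revisions.

    change_rows: rows from a _changes response ("results" array).
    max_seq: hypothetical cutoff; rows with a higher seq are skipped
             (assumes integer sequences, as in CouchDB 1.x).
    """
    body = {}
    for row in change_rows:
        # Only tombstones are candidates; live documents are left alone.
        if not row.get("deleted"):
            continue
        # Skip recent deletions so they can still replicate normally.
        if row["seq"] > max_seq:
            continue
        body[row["id"]] = [change["rev"] for change in row["changes"]]
    return body


# Sample rows shaped like entries in a _changes response.
rows = [
    {"seq": 1, "id": "a", "changes": [{"rev": "2-aaa"}], "deleted": True},
    {"seq": 2, "id": "b", "changes": [{"rev": "1-bbb"}]},
    {"seq": 9, "id": "c", "changes": [{"rev": "2-ccc"}], "deleted": True},
]

purge_body = build_purge_body(rows, max_seq=5)
print(json.dumps(purge_body))
```

The resulting body would be sent as JSON to /{db}/_purge, followed by
a compaction to reclaim the space. Only document "a" is selected here:
"b" is not deleted and "c" was deleted too recently.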

One might say, "just make a new database, with filtered replication."
Well, that is basically exactly what a purge does:

* It is to be done only rarely, and with care
* Some documents "never existed at all"
* All views are rebuilt
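
For comparison, the "new database with filtered replication"
alternative can be sketched as two JSON payloads: a design document
holding a filter function that rejects deleted documents, and a body
for POST /_replicate that names that filter. The design document name
"cleanup", the filter name "live_docs", and the database names are
placeholders, not anything from this thread.

```python
import json


def build_filter_ddoc(ddoc_name="cleanup", filter_name="live_docs"):
    """Design document whose filter passes only non-deleted documents."""
    return {
        "_id": "_design/" + ddoc_name,
        "filters": {
            # Filter functions are JavaScript source stored as a string.
            filter_name: "function(doc, req) { return !doc._deleted; }",
        },
    }


def build_replication_body(source, target,
                           ddoc_name="cleanup", filter_name="live_docs"):
    """Body for POST /_replicate using the filter defined above."""
    return {
        "source": source,
        "target": target,
        "create_target": True,  # create the target database if missing
        "filter": ddoc_name + "/" + filter_name,
    }


ddoc = build_filter_ddoc()
replication = build_replication_body("mydb", "mydb_clean")
print(json.dumps(ddoc))
print(json.dumps(replication))
```

The design document has to be saved in the source database before the
replication is started. Tombstones never reach the target, so from the
target's point of view those documents never existed, and its views
are built from scratch.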

For this reason, I have written a procedure for purging very old
deleted documents in a production setting. I hope to actually
implement it one day, but you are right that it is needed rarely, if
at all, in the real world.

https://github.com/iriscouch/cqs#purging

-- 
Iris Couch
