couchdb-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Couchdb Wiki] Update of "Purge_Documents" by JensAlfke
Date Wed, 18 Jul 2012 18:01:23 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Couchdb Wiki" for change notification.

The "Purge_Documents" page has been changed by JensAlfke:
http://wiki.apache.org/couchdb/Purge_Documents?action=diff&rev1=3&rev2=4

Comment:
Added lots of explanatory text.

- So, you've included your credit card details, your mother's maiden name and the PIN's to
all your major credit cards in a CouchDB document by mistake. You'd like to undo this. Usually,
you can simply update the document, removing the confidential data, and then compact the database.
However, let's say you really messed up and included this secret information in the document's
id field. You remember that CouchDB will remember all the latest {id, rev} pairs it's ever
seen (so that replication can make all replicas eventually consistent). Are you paddling down
an unpleasant stream with no means of steering? Fortunately not! You can purge;
+ The '''_purge''' operation removes all references to the deleted revisions -- and their
parents -- from the database. (This is very different from a normal delete, which actually
''adds'' a "tombstone" revision.) In a sense it edits history, similarly to a Git "reset":
the revisions will no longer appear in the revision tree. It's as though the database had
never heard of them at all.
  
- The '''_purge''' operation removes the reference to the deleted document from the database.
To perform a purge operation you must send a JSON object, where the keys are the ID's to purge
and the value is a list of the revisions to purge. For example:
+ == Reasons To (And Not To) Purge ==
+ 
+ So, you've included your credit card details, your mother's maiden name and the PIN's to
all your major credit cards in a CouchDB document by mistake. You'd like to undo this. Usually,
you can simply update the document, removing the confidential data, and then compact the database.
However, let's say you really messed up and included this secret information in the document's
id field. You remember that CouchDB will remember all the latest {id, rev} pairs it's ever
seen (so that replication can make all replicas eventually consistent). Are you paddling down
an unpleasant stream with no means of steering? Fortunately not! You can purge.
+ 
+ If you are using _purge to recover space, you are almost certainly using CouchDB inappropriately.
The most common reason developers use _purge inappropriately is when managing short-lived
data (log entries, message queues, etc). A better remedy is to periodically switch to a new
database and delete the old one (once the entries in it have all expired).
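
As a rough illustration of the "switch to a new database" remedy, the sketch below names one database per calendar month and works out which ones are old enough to delete wholesale. The monthly period, the `logs-YYYY-MM` naming scheme, and both function names are assumptions made for this example; nothing in CouchDB requires them.

```python
from datetime import date


def db_name_for(prefix, day):
    """Name of the database that holds entries written on `day` (hypothetical scheme)."""
    return f"{prefix}-{day.year:04d}-{day.month:02d}"


def expired_databases(names, prefix, today, keep_months=1):
    """Return the rotated databases that are older than the keep window.

    Assumes every name starting with `prefix-` follows the scheme above.
    """
    current = today.year * 12 + (today.month - 1)
    expired = []
    for name in names:
        if not name.startswith(prefix + "-"):
            continue  # not one of our rotated databases
        year, month = name[len(prefix) + 1:].split("-")
        age_in_months = current - (int(year) * 12 + int(month) - 1)
        if age_in_months > keep_months:
            expired.append(name)
    return sorted(expired)


today = date(2012, 7, 18)
all_dbs = ["logs-2012-05", "logs-2012-06", "logs-2012-07", "users"]
write_to = db_name_for("logs", today)
to_delete = expired_databases(all_dbs, "logs", today)
```

Deleting a whole database this way leaves nothing to purge or compact, which is the point of the advice above.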
+ 
+ == Eligibility For Purging ==
+ 
+ A revision parameter to _purge must be a ''leaf'' in the revision tree. This means it must
be the current revision, or one of the current conflicting revisions. This is because a revision
that has already been replaced by another is not a leaf node of the revision tree, so removing
it would break the integrity of the tree.
+ 
+ When a revision is purged, its ancestors are purged if possible. Ancestors will be kept
if necessary to preserve the integrity of the tree; this only happens if there have been conflicts
and they are either unresolved or haven't yet been compacted away.
+ 
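The eligibility rules can be pictured with a toy model. The sketch below is not CouchDB's implementation: it represents one document's revision tree as a hypothetical dict that maps each revision ID to its parent, and shows that only leaves may be purged and that an ancestor survives while another branch still depends on it.

```python
def leaf_revisions(parents):
    """Revisions that no other revision names as its parent."""
    has_child = {p for p in parents.values() if p is not None}
    return sorted(rev for rev in parents if rev not in has_child)


def purge(parents, rev):
    """Return a new tree with leaf `rev` and its unneeded ancestors removed."""
    if rev not in leaf_revisions(parents):
        raise ValueError(f"{rev} is not a leaf revision")
    remaining = dict(parents)
    current = rev
    while current is not None:
        parent = remaining.pop(current)
        # Stop as soon as the parent is still needed by another branch.
        if parent is not None and parent in remaining.values():
            break
        current = parent
    return remaining


# Hypothetical tree: "3-ccc" is the current revision, "3-ddd" a conflict.
tree = {
    "1-aaa": None,
    "2-bbb": "1-aaa",
    "3-ccc": "2-bbb",
    "3-ddd": "2-bbb",
}

after_first = purge(tree, "3-ccc")          # ancestors kept for "3-ddd"
after_second = purge(after_first, "3-ddd")  # nothing left to protect
```

In this model, purging one of two conflicting leaves leaves the shared ancestors in place; purging the last leaf removes the whole document.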
+ == The _purge Command ==
+ 
+ To perform a purge operation you must send a JSON object, where the keys are the IDs to
purge and each value is a list of the revisions to purge. Typically you'd just specify the
current revision ID, which will purge the entire document unless there are conflicts. To purge
an entire document while it's in conflict, you need to send each conflicting revision ID.
+ 
+ For example:
  {{{
  POST /mydb/_purge
  Content-Type: application/json
@@ -25, +41 @@

     },
     "purge_seq" : 1
  }
  }}}
+ 
+ The '''purge sequence number''' is simply a persistent per-database counter that is incremented
every time a _purge operation is performed. It's used internally to invalidate view indexes.
+ 
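
For readers who want to issue the request from a script, here is a minimal sketch that builds (but does not send) the POST described above, using only the Python standard library. The helper name `build_purge_request`, the document ID, the revision IDs, and the `http://localhost:5984` address are all assumptions for illustration.

```python
import json
from urllib.request import Request


def build_purge_request(base_url, db, revisions_by_id):
    """Build a POST to /<db>/_purge whose body maps each ID to its revisions."""
    body = json.dumps(
        {doc_id: list(revs) for doc_id, revs in revisions_by_id.items()}
    )
    return Request(
        url=f"{base_url}/{db}/_purge",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# A document in conflict: list every conflicting leaf revision to purge it entirely.
request = build_purge_request(
    "http://localhost:5984",  # assumed local server address
    "mydb",
    {"doc-in-conflict": ["3-ccc", "3-ddd"]},
)
```

Passing `request` to `urllib.request.urlopen` would actually send it; that step is left out here so the sketch can be run without a server.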
  
- After compaction, no trace, not even the _id value, will remain. If you have purged more
than one document between querying your views, you will find that they will rebuild from scratch.
This is because you have removed the information necessary to perform a correct incremental
update. Finally, if you are using _purge to recover space, you are almost certainly using
CouchDB inappropriately. The most common reason developers use _purge inappropriately is when
managing short-lived data (log entries, message queues, etc). A better remedy is to periodically
switch to a new database and delete the old one (once the entries in it have all expired).
+ == Side Effects ==
  
- Remember that if the document has been replicated, then the same operation would need to
be applied to all replicas as well, ensuring that replication is stopped during this operation,
to avoid it being replicated back again.
+ If you have purged more than one document between querying your views, you will find that
they will rebuild from scratch. This is because you have removed the information necessary
to perform a correct incremental update.
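
One way to picture this rule is to assume a view index remembers the purge sequence number it last saw and compares it with the database's current value. The sketch below is a simplified model of that idea, not CouchDB's actual code; the function name and the returned strings are made up for illustration.

```python
def view_update_plan(db_purge_seq, index_purge_seq):
    """Decide how a view index catches up, given the two purge counters."""
    behind = db_purge_seq - index_purge_seq
    if behind < 0:
        raise ValueError("index cannot be ahead of the database")
    if behind == 0:
        return "incremental update"
    if behind == 1:
        # Only one purge was missed, so its effects can still be undone in the index.
        return "drop purged entries, then incremental update"
    # More than one purge was missed: the information needed to patch the index is gone.
    return "rebuild from scratch"


plans = [view_update_plan(db_seq, 3) for db_seq in (3, 4, 6)]
```

Under this assumption, querying the views after each purge keeps the index within one step of the database and avoids the full rebuild.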
  
+ If the purged revisions still exist in another replica of the database, a replication
with that database will pull them over again and restore them. To globally remove the revisions,
the purge needs to be performed on all the replicas as well, ensuring that replication is
stopped during this operation, to avoid them being replicated back again.
+ 
