couchdb-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Couchdb Wiki] Update of "Purge_Documents" by RobertNewson
Date Fri, 02 Mar 2012 15:14:25 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Couchdb Wiki" for change notification.

The "Purge_Documents" page has been changed by RobertNewson:
http://wiki.apache.org/couchdb/Purge_Documents?action=diff&rev1=1&rev2=2

- When you [[HTTP_Document_API#DELETE|delete a document]], the database creates a new revision
of it that contains the _id, _rev and _deleted fields. This ensures that the deletion
of the document is replicated to other databases (if you are using replication). If you then
[[Compaction#Database_Compaction|compact the database]], you will find that
the reference to the deleted document still exists; calling [[HTTP_database_API#Changes|_changes]]
confirms this. Only one revision remains available, which we call the '''deleted
mark revision'''. As mentioned above, it is necessary for replication integrity.
+ So, you've included your credit card details, your mother's maiden name and the PINs for
all your major credit cards in a CouchDB document by mistake. You'd like to undo this. Usually,
you can simply update the document, removing the confidential data, and then compact the database.
However, let's say you really messed up and included this secret information in the document's
id field. Recall that CouchDB keeps the latest {id, rev} pair for every document it has ever
seen (so that replication can make all replicas eventually consistent). Are you paddling down
an unpleasant stream with no means of steering? Fortunately not! You can purge.
  
- The '''_purge''' operation removes the reference to the deleted document from the database.
To perform a purge operation you must send a request including the JSON of the document IDs
that you want to purge and revisions for purging. For example:
+ The '''_purge''' operation removes the reference to the deleted document from the database.
To perform a purge operation you must send a JSON object whose keys are the document IDs to purge
and whose values are lists of the revisions to purge. For example:
  {{{
  POST /mydb/_purge
  Content-Type: application/json
@@ -27, +27 @@

  }
  }}}
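
The request body described above can be assembled programmatically. The following is a minimal sketch in Python, not part of the wiki page: the helper name `build_purge_request`, the server URL, the document ID and the revision string are all hypothetical, and the request is only built, never sent.

```python
import json
import urllib.request


def build_purge_request(base_url, db, revs_by_id):
    """Build (but do not send) a POST /<db>/_purge request.

    revs_by_id maps each document ID to the revisions to purge.
    """
    # Keys are document IDs; each value is the list of revisions to purge.
    body = {doc_id: list(revs) for doc_id, revs in revs_by_id.items()}
    return urllib.request.Request(
        url=f"{base_url}/{db}/_purge",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical server, document ID and revision, for illustration only.
req = build_purge_request(
    "http://localhost:5984",
    "mydb",
    {"secret-doc-id": ["1-abc"]},
)
```

Passing `req` to `urllib.request.urlopen` would send it; that step is left out here because a purge cannot be undone.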
  
+ After compaction, no trace, not even the _id value, will remain. If you have purged more
than one document between querying your views, you will find that they will rebuild from scratch.
This is because you have removed the information necessary to perform a correct incremental
update. Finally, if you are using _purge to recover space, you are almost certainly using
CouchDB inappropriately. The most common inappropriate use of _purge is for
managing short-lived data (log entries, message queues, etc). A better remedy is to periodically
switch to a new database and delete the old one (once the entries in it have all expired).
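
One way to picture the database-rotation remedy is the sketch below. It assumes a monthly rotation and a `logs-YYYY-MM` naming scheme, neither of which comes from the wiki page, and every function name is hypothetical. It only computes which databases have expired; deleting each one would be a separate `DELETE /<name>` request.

```python
from datetime import date


def db_name_for(day, prefix="logs"):
    """Name of the per-month database that holds entries written on `day`."""
    return f"{prefix}-{day.year:04d}-{day.month:02d}"


def month_start_back(day, months):
    """First day of the month that lies `months` months before `day`."""
    idx = day.year * 12 + (day.month - 1) - months
    return date(idx // 12, idx % 12 + 1, 1)


def expired_db_names(existing, today, keep_months, prefix="logs"):
    """Databases older than the retention window, oldest first."""
    cutoff = db_name_for(month_start_back(today, keep_months), prefix)
    # Zero-padded names sort chronologically, so a string comparison suffices.
    return sorted(
        name for name in existing
        if name.startswith(prefix + "-") and name < cutoff
    )


existing = ["logs-2011-12", "logs-2012-01", "logs-2012-02", "logs-2012-03", "other"]
current = db_name_for(date(2012, 3, 2))
expired = expired_db_names(existing, date(2012, 3, 2), keep_months=1)
```

New entries go to `current`, while the databases in `expired` can be dropped whole, so no deleted-document references are left behind to purge.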
- Notes:
-  * '''Purging is not replicated to other databases.''' Keeping the deleted mark revision
is the only way to guarantee replication integrity, and purging may break it.
The only way to avoid this is to check that the revisions of the deleted documents are the same on
all replication targets and sources.
-  * There is no way to automatically purge all deleted documents. You have to send a _purge
request listing every document ID and its revisions.
-  * Purging documents does not free the disk space they use. To reclaim it, run a database
[[Compaction#Database_Compaction|compaction]] and also compact the views.
-  * There is no reason to purge old revisions of existing documents. You can configure
[[HTTP_database_API#Accessing_Database-specific_options|_revs_limit]] to do it automatically.
  
- '''Warning!!!''' The notes above are meant to tell you that _purge should be used only
in extraordinary cases, for example when you need to delete sensitive information after a mistake.
If you find you still need _purge beyond that, it may mean that you need to move some data into another
NoSQL or SQL database.
- 
