couchdb-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Couchdb Wiki] Update of "Frequently_asked_questions" by EliStevens
Date Mon, 08 Aug 2011 22:49:22 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Couchdb Wiki" for change notification.

The "Frequently_asked_questions" page has been changed by EliStevens:
http://wiki.apache.org/couchdb/Frequently_asked_questions?action=diff&rev1=35&rev2=36

Comment:
Adding FAQ about deleted docs still consuming disk space and possible alternate approaches.

  Map
  
  {{{
  function(doc)
  {
      if (doc.type == 'inventory_ticket' && doc.claimed_by == null) {
          emit(doc.product_key, { 'inventory_ticket': doc._id, '_rev': doc._rev });
      }
  }
  }}}
  
@@ -189, +189 @@

  
  To get on-write view update semantics, you can create a little daemon
  script that runs alongside CouchDB and is specified in couch.ini,
  as described in ExternalProcesses. This daemon gets sent a
  notification each time the database is changed and could in turn
  trigger a view update every N document inserts or every Y seconds,
  whichever occurs first. The reason not to integrate each doc as
@@ -197, +197 @@

  to do view index updates very fast, so batching is a good idea.
  See RegeneratingViewsOnUpdate for an example.
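  
  A minimal sketch of such a daemon (not part of this wiki page; the database
  name, design doc, and thresholds are illustrative assumptions), assuming
  CouchDB is configured to spawn it via the update-notification setting in
  couch.ini and writes one JSON line per database change to the script's stdin:
  
  {{{
  // batch_view_updater.js -- run under Node.js; a sketch, not a reference implementation.
  var http = require('http');
  
  var pending = 0;
  
  function refreshView() {
      if (pending === 0) return;
      pending = 0;
      // Reading the view (even with limit=0) makes CouchDB bring the index up to date.
      http.get('http://127.0.0.1:5984/db/_design/app/_view/by_key?limit=0',
               function (res) { res.resume(); });
  }
  
  // Refresh every 60 seconds (if anything changed) ...
  setInterval(refreshView, 60000);
  
  // ... or after every 100 updates, whichever comes first.
  process.stdin.resume();
  process.stdin.setEncoding('utf8');
  process.stdin.on('data', function (chunk) {
      // Each non-empty line is a notification like {"db":"db","type":"updated"}.
      pending += chunk.split('\n').filter(function (l) { return l; }).length;
      if (pending >= 100) refreshView();
  });
  }}}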
  
  To get a list of all views in a database, you can do a
  GET /db/_all_docs?startkey="_design/"&endkey="_design/ZZZZ"
  (we will have a /db/_all_design_docs view to make the ZZZZ-hack
  go away).
@@ -215, +215 @@

  <<Anchor(secure_remote_server)>>
  == I use CouchDB on a remote server and I don't want it to listen on a public port for security reasons. Is there a way to connect to it from my local machine or can I still use Futon with it? ==
  
  On your local machine, set up an ssh tunnel to your server and
  tell it to forward requests made to the local port 5984 to the remote
  server's port 5984:
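  
  For example (a typical invocation; the user and host are placeholders):
  
  {{{
  ssh -L 5984:127.0.0.1:5984 ssh-user@example.com
  }}}
  
  Futon is then reachable on the local machine at http://127.0.0.1:5984/_utils/.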
  
@@ -300, +300 @@

  
  Not just yet. This topic is an ongoing discussion. The current situation is described in this post on the developer [[http://mail-archives.apache.org/mod_mbox/couchdb-dev/201010.mbox/%3cC4B01815-5A28-4E5F-975D-70344B7570EC@apache.org%3e|mailing list]].
  
+ <<Anchor(deleted_docs_not_free)>>
+ == My database is larger than I expect it to be, even after compaction!  What gives? ==
+ 
+ Often, CouchDB users expect that adding a document to a DB and then deleting that document will return the DB to its original state. However, this is not the case. Consider a two-DB case:
+ 
+  * Doc 1 inserted to DB A.
+  * DB A replicated to DB B.
+  * Doc 1 deleted from DB A.
+  * DB A replicated to DB B.
+ 
+ If inserting and then deleting a document returned the DB to its original state, the second replication from A to B would be "empty", and hence DB B would be unchanged. DB B would then still contain Doc 1, which means it would be out of sync with DB A.
+ 
+ To handle this case, CouchDB keeps a record of each deleted document (often called a "tombstone"), consisting of the document's _id, its _rev, and _deleted=true. Each tombstone is relatively small, on the order of a few kilobytes at most, but they add up if large numbers of documents are deleted. Additionally, it is possible to keep audit-trail data in a deleted document (i.e. application-specific fields like "deleted_by" and "deleted_at"). While generally this is not an issue, if the DB is still larger than expected even after accounting for the minimum size of a deleted document, check that your deleted documents don't carry data that was never intended to be kept past the deletion. For more information, see https://issues.apache.org/jira/browse/COUCHDB-1141
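+ 
+ For illustration, fetching a deleted document by revision (GET /db/docid?rev=...) returns a tombstone something like the following (the _id and _rev values here are made up):
+ 
+ {{{
+ {
+     "_id": "ticket-42",
+     "_rev": "2-3e1d9f8c6a5b4f2e8d7c6b5a4f3e2d1c",
+     "_deleted": true
+ }
+ }}}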
+ 
+ <<Anchor(avoid_deletes)>>
+ == My database will require an unbounded number of deletes. What can I do? ==
+ 
+ If there's a strong correlation between time (or some other regular, monotonically increasing value) and document deletion, a rolling-database setup like the following can be used:
+ 
+  * Assume that the past 30 days of logs are needed; anything older can be deleted.
+  * Set up DB logs_2011_08.
+  * Replicate logs_2011_08 to logs_2011_09, filtered so that only logs from 2011_08 pass (a sketch of such a filter appears below).
+  * During August, read/write to logs_2011_08.
+  * When September starts, create logs_2011_10.
+  * Replicate logs_2011_09 to logs_2011_10, filtered on logs from 2011_09 only.
+  * During September, read/write to logs_2011_09.
+  * Logs from August will be present in logs_2011_09 due to the replication, but not in logs_2011_10.
+  * The entire logs_2011_08 DB can be removed.
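+ 
+ The point of the filtered replication is that old months never leak forward, so an entire month's DB file can be deleted outright instead of deleting documents one at a time (which would leave tombstones behind). A minimal sketch of such a replication filter, stored in a design doc (the design doc name, field names, and query parameter are illustrative assumptions, not prescribed by this page):
+ 
+ {{{
+ {
+     "_id": "_design/logs",
+     "filters": {
+         "by_month": "function(doc, req) { return doc.type == 'log' && doc.month == req.query.month; }"
+     }
+ }
+ }}}
+ 
+ A replication started with "filter":"logs/by_month" and "query_params":{"month":"2011_08"} in the POST body to /_replicate would then copy only August's documents.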
+ 
