couchdb-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Couchdb Wiki] Update of "Compaction" by skoegl
Date Tue, 17 Apr 2012 18:39:42 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Couchdb Wiki" for change notification.

The "Compaction" page has been changed by skoegl:
http://wiki.apache.org/couchdb/Compaction?action=diff&rev1=29&rev2=30

Comment:
add some info about automatic compaction - mostly from /etc/couchdb/default.ini

  <<TableOfContents(2)>>
  
  == Database Compaction ==
- 
  Compaction compresses the database file by removing unused sections created during updates.
Old revisions of documents are also removed from the database, though a small amount of metadata
is kept for conflict detection during [[Replication|replication]]. The number of revisions
tracked (1000 by default) can be configured using the [[HTTP_database_API#Accessing_Database-specific_options|_revs_limit
URL endpoint]], available since version 0.8-incubating.
  
  Compaction is manually triggered per database. Support for queued compaction of multiple
databases is planned. Please note that compaction will be run as a background task.
  
  === Example ===
- 
  Compaction is triggered by an HTTP POST request to the _compact sub-resource of your database.
On success, HTTP status 202 is returned immediately. Although the request body is not used
you must still specify "application/json" as Content-Type for the request.
  
  {{{
  curl -H "Content-Type: application/json" -X POST http://localhost:5984/my_db/_compact
  #=> {"ok":true}
  }}}
- 
  A GET request to your database's base URL (see [[HTTP_database_API#Database_Information]]) returns
a JSON object of status fields that looks like this:
  
  {{{
  curl -X GET http://localhost:5984/my_db
  #=> {"db_name":"my_db", "doc_count":1, "doc_del_count":1, "update_seq":4, "purge_seq":0, "compact_running":false, "disk_size":12377, "instance_start_time":"1267612389906234", "disk_format_version":5}
  }}}
- 
  The compact_running key will be set to true during compaction.
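
As a rough illustration of using that key, the following Python sketch polls the database information until compact_running is false. The function names, the polling interval and the poll limit are assumptions made for this example, not part of CouchDB; the URL is the one used in the curl examples above.

```python
import json
import time
import urllib.request
from typing import Any, Callable, Dict


def fetch_db_info(url: str = "http://localhost:5984/my_db") -> Dict[str, Any]:
    """Fetch the database information document (hypothetical helper)."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def wait_for_compaction(fetch_info: Callable[[], Dict[str, Any]],
                        poll_seconds: float = 1.0,
                        max_polls: int = 600) -> bool:
    """Poll until compact_running is false.

    Returns True if compaction was seen to finish within max_polls
    polls, otherwise False. Both limits are arbitrary example values.
    """
    for _ in range(max_polls):
        info = fetch_info()
        if not info.get("compact_running", False):
            return True
        time.sleep(poll_seconds)
    return False
```

Against a running server this could be called as wait_for_compaction(fetch_db_info). Passing the fetch function as a parameter keeps the polling logic usable without a live CouchDB instance.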
  
  === Compaction of write-heavy databases ===
  It is not a good idea to attempt compaction on a database node that is near full capacity
for its write load. The problem is the compaction process may never catch up with the writes
if they never let up, and eventually it will run out of disk space.
  
- Compaction should be attempted when the write load is less than full capacity. Read load
won't affect its ability to complete, however. To have the least impact possible on clients,
the database remains online and fully functional to readers and writers. It is a design limitation
that database compaction can't complete when at capacity for write load. It may be reasonable
to schedule compactions during off-peak hours. 
+ Compaction should be attempted when the write load is less than full capacity. Read load
won't affect its ability to complete, however. To have the least impact possible on clients,
the database remains online and fully functional to readers and writers. It is a design limitation
that database compaction can't complete when at capacity for write load. It may be reasonable
to schedule compactions during off-peak hours.
  
- In a clustered environment the write load can be switched off for any node before compaction
and brought back up to date with replication once complete. 
+ In a clustered environment the write load can be switched off for any node before compaction
and brought back up to date with replication once complete.
  
  In the future, a single CouchDB node can be changed to stop or fail other updates if the
write load is too heavy for it to complete in a reasonable time.
  
+ == View compaction ==
+ [[Introduction_to_CouchDB_views|Views]] need compaction like databases. There is a compact
views feature introduced with CouchDB 0.10.0:
  
- == View compaction ==
- 
- [[Introduction_to_CouchDB_views|Views]] need compaction like databases. There is a compact
views feature introduced with CouchDB 0.10.0:
  {{{
  curl -H "Content-Type: application/json" -X POST http://localhost:5984/dbname/_compact/designname
  #=> {"ok":true}
  }}}
- 
  This compacts the view index from the current version of the design document. The HTTP response
code is 202 Accepted (like compaction for databases) and a compaction background task will
be created. Information on running compactions can be fetched with [[HTTP_view_API#Getting_Information_about_Design_Documents_.28and_their_Views.29|HTTP_view_API#Getting_Information_about_Design_Documents_(and_their_Views)]].
  
  View indexes on disk are named after the MD5 hash of their view definition. When you change
a view, old indexes remain on disk. To clean up all outdated view indexes (files named after
the MD5 hash of views that no longer exist), you can trigger a view cleanup:
@@ -55, +49 @@

  curl -H "Content-Type: application/json" -X POST http://localhost:5984/dbname/_view_cleanup
  #=> {"ok":true}
  }}}
+ == Automatic Compaction ==
+ Since CouchDB 1.2 it is possible to configure automatic compaction, so that compaction of
databases and views is triggered automatically based on various criteria. Automatic compaction
is configured in CouchDB's configuration files. The compaction daemon is responsible for triggering
the compaction. It is started automatically, but disabled by default.
  
+ {{{
+ [daemons]
+ #...
+ compaction_daemon={couch_compaction_daemon, start_link, []}
+ }}}
+ {{{
+ [compaction_daemon]
+ ; The delay, in seconds, between each check for which database and view indexes
+ ; need to be compacted.
+ check_interval = 300
+ ; If a database or view index file is smaller than this value (in bytes),
+ ; compaction will not happen. Very small files always have very high
+ ; fragmentation, so it is not worth compacting them.
+ min_file_size = 131072
+ }}}
+ The criteria for triggering compactions are configured in the "compactions" section.
+ 
+ {{{
+ [compactions]
+ ; List of compaction rules for the compaction daemon.
+ ; The daemon compacts databases and their respective view groups when all the
+ ; condition parameters are satisfied. Configuration can be per database or
+ ; global, and it has the following format:
+ ;
+ ; database_name = [ {ParamName, ParamValue}, {ParamName, ParamValue}, ... ]
+ ; _default = [ {ParamName, ParamValue}, {ParamName, ParamValue}, ... ]
+ }}}
+ === Possible Parameters ===
+  * '''db_fragmentation''': If the ratio (as an integer percentage) of the amount of old
data (and its supporting metadata) to the database file size is equal to or greater than
this value, the database compaction condition is satisfied. This value is computed as:<<BR>>(file_size - data_size) / file_size * 100<<BR>>The data_size and file_size values can be
obtained by querying the database's information URI (GET /dbname/).
+  * '''view_fragmentation''': If the ratio (as an integer percentage) of the amount of old
data (and its supporting metadata) to the view index (view group) file size is equal to
or greater than this value, the view index compaction condition is satisfied. This value
is computed as:<<BR>>(file_size - data_size) / file_size * 100<<BR>>The
data_size and file_size values can be obtained by querying the view group's information URI
(GET /dbname/_design/groupname/_info).
+  * '''from''' and '''to''': The period during which compaction of a database (and its view groups)
is allowed. Each value must be a time in the format HH:MM (HH in [0..23], MM in [0..59]).
+  * '''strict_window''': If a compaction is still running after the end of the allowed period,
it will be canceled if this parameter is set to 'true'. It defaults to 'false' and is meaningful
only if the '''from''' and '''to''' parameters are also specified.
+  * '''parallel_view_compaction''': If set to 'true', the database and its views are compacted
in parallel. This is only useful on certain setups, for example when the database and
view index directories point to different disks. It defaults to 'false'.
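
The fragmentation formula and the min_file_size check can be sketched in Python as follows. This is only an illustration of the arithmetic described above: the function names are hypothetical, truncating the ratio to an integer is an assumption (the daemon may round differently), and file_size and data_size are the values named in the text, whatever keys carry them in a given CouchDB version.

```python
def fragmentation_percent(file_size: int, data_size: int) -> int:
    """Return (file_size - data_size) / file_size * 100 as an integer.

    Integer truncation is an assumption of this sketch.
    """
    if file_size <= 0:
        return 0
    return (file_size - data_size) * 100 // file_size


def needs_compaction(file_size: int, data_size: int, threshold: str,
                     min_file_size: int = 131072) -> bool:
    """Check a threshold such as "70%" against the computed fragmentation.

    Files smaller than min_file_size are never compacted, mirroring the
    [compaction_daemon] setting shown earlier.
    """
    if file_size < min_file_size:
        return False
    limit = int(threshold.rstrip("%"))
    return fragmentation_percent(file_size, data_size) >= limit
```

For example, a 200000-byte file holding 50000 bytes of live data has a fragmentation of 75%, so it satisfies a "70%" threshold.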
+ 
+ Before a compaction is triggered, the amount of free disk space needed is estimated as
twice the data size of the database or view index. When there is not enough free disk space
to compact a particular database or view index, a
warning message is logged.
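
The free-space estimate amounts to a single comparison, sketched here in Python. The function names are hypothetical, and treating "exactly twice the data size" as sufficient is an assumption of this sketch.

```python
import shutil


def free_disk_bytes(path: str) -> int:
    """Free bytes on the filesystem holding path (hypothetical helper)."""
    return shutil.disk_usage(path).free


def has_room_for_compaction(data_size: int, free_bytes: int) -> bool:
    """True if free space covers the estimate of 2 * data_size."""
    return free_bytes >= 2 * data_size
```

A monitoring script could combine the two, for instance has_room_for_compaction(data_size, free_disk_bytes("/var/lib/couchdb")), where the path is only a placeholder for wherever the database files actually live.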
+ 
+ === Examples ===
+  1. [{db_fragmentation, "70%"}, {view_fragmentation, "60%"}]<<BR>>A database configured
with this rule (for example `foo`) is compacted if its fragmentation is 70% or more. Any view index of this database
is compacted only if its fragmentation is 60% or more.
+  1. [{db_fragmentation, "70%"}, {view_fragmentation, "60%"}, {from, "00:00"}, {to, "04:00"}]<<BR>>Similar
to the preceding example, but a compaction (database or view index) is only triggered if the
current time is between midnight and 4 AM.
+  1. [{db_fragmentation, "70%"}, {view_fragmentation, "60%"}, {from, "00:00"}, {to, "04:00"}, {strict_window, true}]<<BR>>Similar
to the preceding example: a compaction (database or view index) is only triggered if the
current time is between midnight and 4 AM. If at 4 AM the database or one of its views is
still compacting, the compaction process will be canceled.
+  1. [{db_fragmentation, "70%"}, {view_fragmentation, "60%"}, {from, "00:00"}, {to, "04:00"}, {strict_window, true}, {parallel_view_compaction, true}]<<BR>>Similar
to the preceding example, but a database and its views can be compacted in parallel.
+ 
+ === Default Configuration ===
+ The default configuration, if enabled, applies to all databases. For example:
+ 
+ {{{
+ _default = [{db_fragmentation, "70%"}, {view_fragmentation, "60%"}, {from, "23:00"}, {to, "04:00"}]
+ }}}
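
A window such as 23:00 to 04:00 crosses midnight, so a plain "from <= now < to" comparison would never match. The Python sketch below shows one way to evaluate such a window. It is an illustration only: the function names are hypothetical, and treating the end time as exclusive and the wrap-around handling shown are assumptions of this sketch, not a statement about the daemon's exact behaviour.

```python
import datetime


def parse_hhmm(value: str) -> datetime.time:
    """Parse a time in HH:MM format (HH in 0..23, MM in 0..59)."""
    hours, minutes = value.split(":")
    return datetime.time(int(hours), int(minutes))


def in_window(now: datetime.time, start: str, end: str) -> bool:
    """True if now falls inside the window from start to end.

    When start is later than end, the window is taken to wrap past
    midnight (for example 23:00 to 04:00).
    """
    begin, finish = parse_hhmm(start), parse_hhmm(end)
    if begin <= finish:
        return begin <= now < finish
    return now >= begin or now < finish
```

With the default rule above, 23:30 and 03:59 fall inside the window, while 12:00 and 04:00 do not.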
+ 
