[ https://issues.apache.org/jira/browse/COUCHDB-1153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13088331#comment-13088331 ]

Robert Newson commented on COUCHDB-1153:
----------------------------------------

I'm concerned that this landed on trunk without a follow-up review once you'd addressed Paul's concerns. Since we will all share the burden of maintenance once this is included in a release, a little more effort to gain consensus would have been appreciated.

> Database and view index compaction daemon
> -----------------------------------------
>
>                 Key: COUCHDB-1153
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-1153
>             Project: CouchDB
>          Issue Type: New Feature
>         Environment: trunk
>            Reporter: Filipe Manana
>            Assignee: Filipe Manana
>            Priority: Minor
>              Labels: compaction
>
> I've recently written an Erlang process to automatically compact databases and their views based on some configurable parameters. These parameters can be global or per database and are: minimum database fragmentation, minimum view fragmentation, allowed period and "strict_window" (whether an ongoing compaction should be canceled if it doesn't finish within the allowed period). These fragmentation values are based on the "data_size" parameter recently added to the database and view group information URIs (COUCHDB-1132).
> I've documented the .ini configuration, as a comment in default.ini, which I paste here:
> [compaction_daemon]
> ; The delay, in seconds, between each check for which database and view indexes
> ; need to be compacted.
> check_interval = 60
> ; If a database or view index file is smaller than this value (in bytes),
> ; compaction will not happen. Very small files always have very high
> ; fragmentation, so they are not worth compacting.
> min_file_size = 131072
> [compactions]
> ; List of compaction rules for the compaction daemon.
> ; The daemon compacts databases and their respective view groups when all the
> ; condition parameters are satisfied. Configuration can be per database or
> ; global, and it has the following format:
> ;
> ; database_name = parameter=value [, parameter=value]*
> ; _default = parameter=value [, parameter=value]*
> ;
> ; Possible parameters:
> ;
> ; * db_fragmentation - If the ratio (as an integer percentage) of the amount
> ;                      of old data (and its supporting metadata) over the database
> ;                      file size is equal to or greater than this value, this
> ;                      database compaction condition is satisfied.
> ;                      This value is computed as:
> ;
> ;                          (file_size - data_size) / file_size * 100
> ;
> ;                      The data_size and file_size values can be obtained when
> ;                      querying a database's information URI (GET /dbname/).
> ;
> ; * view_fragmentation - If the ratio (as an integer percentage) of the amount
> ;                        of old data (and its supporting metadata) over the view
> ;                        index (view group) file size is equal to or greater than
> ;                        this value, then this view index compaction condition is
> ;                        satisfied. This value is computed as:
> ;
> ;                            (file_size - data_size) / file_size * 100
> ;
> ;                        The data_size and file_size values can be obtained when
> ;                        querying a view group's information URI
> ;                        (GET /dbname/_design/groupname/_info).
> ;
> ; * period - The period for which a database (and its view groups) compaction
> ;            is allowed. This value must obey the following format:
> ;
> ;                HH:MM - HH:MM (HH in [0..23], MM in [0..59])
> ;
> ; * strict_window - If a compaction is still running after the end of the allowed
> ;                   period, it will be canceled if this parameter is set to "yes".
> ;                   It defaults to "no" and it's meaningful only if the *period*
> ;                   parameter is also specified.
> ;
> ; * parallel_view_compaction - If set to "yes", the database and its views are
> ;                              compacted in parallel.
> ;                              This is only useful on certain setups, for
> ;                              example when the database and view index
> ;                              directories point to different disks.
> ;                              It defaults to "no".
> ;
> ; Before a compaction is triggered, an estimation of how much free disk space is
> ; needed is computed. This estimation corresponds to 2 times the data size of
> ; the database or view index. When there's not enough free disk space to compact
> ; a particular database or view index, a warning message is logged.
> ;
> ; Examples:
> ;
> ; 1) foo = db_fragmentation = 70%, view_fragmentation = 60%
> ;    The `foo` database is compacted if its fragmentation is 70% or more.
> ;    Any view index of this database is compacted only if its fragmentation
> ;    is 60% or more.
> ;
> ; 2) foo = db_fragmentation = 70%, view_fragmentation = 60%, period = 00:00-04:00
> ;    Similar to the preceding example but a compaction (database or view index)
> ;    is only triggered if the current time is between midnight and 4 AM.
> ;
> ; 3) foo = db_fragmentation = 70%, view_fragmentation = 60%, period = 00:00-04:00, strict_window = yes
> ;    Similar to the preceding example - a compaction (database or view index)
> ;    is only triggered if the current time is between midnight and 4 AM. If at
> ;    4 AM the database or one of its views is still compacting, the compaction
> ;    process will be canceled.
> ;
> ;_default = db_fragmentation = 70%, view_fragmentation = 60%, period = 23:00 - 04:00
> (from https://github.com/fdmanana/couchdb/compare/compaction_daemon#L0R195)
> The full patch is mostly a new module, but it also makes some minimal changes and a small refactoring to the view compaction code, without changing the current behaviour.
> Patch is at:
> https://github.com/fdmanana/couchdb/compare/compaction_daemon.patch
> By default the daemon is idle, without any configuration enabled. I'm open to suggestions on additional parameters and a better configuration system.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
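
For readers following the quoted configuration, here is a rough Python sketch of how a database rule might be evaluated. It is not the daemon's Erlang code; it only mirrors the rules described in the default.ini comment. All function names are hypothetical. The sketch assumes that the integer percentage is obtained by truncation, that the end of the period is exclusive, and that the rule always sets db_fragmentation; the quoted text does not specify any of these.

```python
from datetime import time


def fragmentation(file_size, data_size):
    """(file_size - data_size) / file_size * 100 as an integer percentage.

    Truncation (rather than rounding) is an assumption of this sketch.
    """
    if file_size <= 0:
        return 0
    return (file_size - data_size) * 100 // file_size


def parse_rule(rule):
    """Split 'param = value, param = value' into a dict of strings."""
    params = {}
    for part in rule.split(","):
        key, _, value = part.partition("=")
        params[key.strip()] = value.strip()
    return params


def parse_period(period):
    """Turn 'HH:MM - HH:MM' (spaces optional) into a (start, end) pair."""
    start, end = (p.strip() for p in period.split("-"))

    def to_time(text):
        hours, minutes = text.split(":")
        return time(int(hours), int(minutes))

    return to_time(start), to_time(end)


def in_period(now, start, end):
    """True if `now` is inside the window; the window may wrap past midnight.

    Treating the end of the window as exclusive is an assumption.
    """
    if start <= end:
        return start <= now < end
    return now >= start or now < end


def should_compact_db(rule, file_size, data_size, now, min_file_size=131072):
    """Apply min_file_size, the optional period, and db_fragmentation."""
    params = parse_rule(rule)
    if file_size < min_file_size:
        return False  # files below min_file_size are never compacted
    if "period" in params:
        start, end = parse_period(params["period"])
        if not in_period(now, start, end):
            return False
    threshold = int(params["db_fragmentation"].rstrip("%"))
    return fragmentation(file_size, data_size) >= threshold
```

With the `_default` rule quoted above, a 1,000,000-byte file holding 300,000 bytes of live data has a fragmentation of 70, so under these assumptions it would qualify at 01:30 but not at noon, because 23:00 - 04:00 wraps past midnight.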