couchdb-dev mailing list archives

From "Filipe Manana (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (COUCHDB-1153) Database and view index compaction daemon
Date Tue, 16 Aug 2011 03:04:27 GMT

    [ https://issues.apache.org/jira/browse/COUCHDB-1153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13085507#comment-13085507
] 

Filipe Manana commented on COUCHDB-1153:
----------------------------------------

Thanks Paul

Not sure what you mean by the loop weirdness. It doesn't seem complicated to me:
loop() -> do_stuff(), sleep(...), loop().
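
For readers unfamiliar with the pattern, here is a minimal sketch of the do_stuff / sleep / repeat loop described above. It is written in Python rather than Erlang, so the tail-recursive call becomes an ordinary loop. The names `run_loop`, `iterations` and the injectable `sleep` are assumptions made for illustration and are not taken from the patch.

```python
import time
from typing import Callable


def run_loop(do_stuff: Callable[[], None],
             interval_seconds: float,
             iterations: int,
             sleep: Callable[[float], None] = time.sleep) -> None:
    """Call do_stuff, sleep, and repeat.

    `iterations` bounds the loop so this sketch terminates; a real daemon
    would loop until it is told to stop. `sleep` is injectable so the loop
    can be exercised without actually waiting.
    """
    for _ in range(iterations):
        do_stuff()               # one pass of work, e.g. a compaction check
        sleep(interval_seconds)  # wait before the next pass
```

A call such as `run_loop(check, 60, 3)` would run a hypothetical `check` function three times, pausing 60 seconds after each pass.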

An alternative way to start os_mon (I really don't care which) is to list it as a dependency
in the .app file.

You're right about couch_server. It's part of the reason why autocompaction is disabled
by default. However, I haven't yet seen a big issue with ~1000 databases. One approach would
be to wait a bit before opening a db if it's not in the LRU cache, perhaps.

Certainly there's a lot of room for improvement in auto compaction, and an initial implementation
is unlikely to ever be perfect for all scenarios.



> Database and view index compaction daemon
> -----------------------------------------
>
>                 Key: COUCHDB-1153
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-1153
>             Project: CouchDB
>          Issue Type: New Feature
>         Environment: trunk
>            Reporter: Filipe Manana
>            Assignee: Filipe Manana
>            Priority: Minor
>              Labels: compaction
>
> I've recently written an Erlang process to automatically compact databases and their
views based on some configurable parameters. These parameters can be global or per database
and are: minimum database fragmentation, minimum view fragmentation, allowed period and "strict_window"
(whether an ongoing compaction should be canceled if it doesn't finish within the allowed
period). These fragmentation values are based on the recently added "data_size" parameter
to the database and view group information URIs (COUCHDB-1132).
> I've documented the .ini configuration, as a comment in default.ini, which I paste here:
> [compaction_daemon]
> ; The delay, in seconds, between each check for which database and view indexes
> ; need to be compacted.
> check_interval = 60
> ; If a database or view index file is smaller than this value (in bytes),
> ; compaction will not happen. Very small files always have a very high
> ; fragmentation, therefore it's not worth compacting them.
> min_file_size = 131072
> [compactions]
> ; List of compaction rules for the compaction daemon.
> ; The daemon compacts databases and their respective view groups when all the
> ; condition parameters are satisfied. Configuration can be per database or
> ; global, and it has the following format:
> ;
> ; database_name = parameter=value [, parameter=value]*
> ; _default = parameter=value [, parameter=value]*
> ;
> ; Possible parameters:
> ;
> ; * db_fragmentation - If the ratio (as an integer percentage), of the amount
> ;                      of old data (and its supporting metadata) over the database
> ;                      file size is equal to or greater than this value, this
> ;                      database compaction condition is satisfied.
> ;                      This value is computed as:
> ;
> ;                           (file_size - data_size) / file_size * 100
> ;
> ;                      The data_size and file_size values can be obtained when
> ;                      querying a database's information URI (GET /dbname/).
> ;
> ; * view_fragmentation - If the ratio (as an integer percentage), of the amount
> ;                        of old data (and its supporting metadata) over the view
> ;                        index (view group) file size is equal to or greater than
> ;                        this value, then this view index compaction condition is
> ;                        satisfied. This value is computed as:
> ;
> ;                            (file_size - data_size) / file_size * 100
> ;
> ;                        The data_size and file_size values can be obtained when
> ;                        querying a view group's information URI
> ;                        (GET /dbname/_design/groupname/_info).
> ;
> ; * period - The period for which a database (and its view groups) compaction
> ;            is allowed. This value must obey the following format:
> ;
> ;                HH:MM - HH:MM  (HH in [0..23], MM in [0..59])
> ;
> ; * strict_window - If a compaction is still running after the end of the allowed
> ;                   period, it will be canceled if this parameter is set to "yes".
> ;                   It defaults to "no" and it's meaningful only if the *period*
> ;                   parameter is also specified.
> ;
> ; * parallel_view_compaction - If set to "yes", the database and its views are
> ;                              compacted in parallel. This is only useful on
> ;                              certain setups, like for example when the database
> ;                              and view index directories point to different
> ;                              disks. It defaults to "no".
> ;
> ; Before a compaction is triggered, an estimation of how much free disk space is
> ; needed is computed. This estimation corresponds to 2 times the data size of
> ; the database or view index. When there's not enough free disk space to compact
> ; a particular database or view index, a warning message is logged.
> ;
> ; Examples:
> ;
> ; 1) foo = db_fragmentation = 70%, view_fragmentation = 60%
> ;    The `foo` database is compacted if its fragmentation is 70% or more.
> ;    Any view index of this database is compacted only if its fragmentation
> ;    is 60% or more.
> ;
> ; 2) foo = db_fragmentation = 70%, view_fragmentation = 60%, period = 00:00-04:00
> ;    Similar to the preceding example but a compaction (database or view index)
> ;    is only triggered if the current time is between midnight and 4 AM.
> ;
> ; 3) foo = db_fragmentation = 70%, view_fragmentation = 60%, period = 00:00-04:00, strict_window = yes
> ;    Similar to the preceding example - a compaction (database or view index)
> ;    is only triggered if the current time is between midnight and 4 AM. If at
> ;    4 AM the database or one of its views is still compacting, the compaction
> ;    process will be canceled.
> ;
> ;_default = db_fragmentation = 70%, view_fragmentation = 60%, period = 23:00 - 04:00
> (from https://github.com/fdmanana/couchdb/compare/compaction_daemon#L0R195)
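
The fragmentation formula, the min_file_size cutoff and the free disk space estimate described in the configuration comment above can be sketched as follows. This is an illustration in Python, not the daemon's Erlang code. The function names are hypothetical, and truncating the ratio to an integer percentage is an assumption, since the comment does not say how the value is rounded.

```python
def fragmentation_percent(file_size: int, data_size: int) -> int:
    """(file_size - data_size) / file_size * 100 as an integer percentage.

    Truncation toward zero is an assumption; the source only says
    "integer percentage".
    """
    if file_size <= 0:
        return 0
    return (file_size - data_size) * 100 // file_size


def should_compact(file_size: int, data_size: int,
                   threshold_percent: int,
                   min_file_size: int = 131072) -> bool:
    """Apply the min_file_size cutoff, then the fragmentation threshold."""
    if file_size < min_file_size:
        return False  # files smaller than min_file_size are never compacted
    return fragmentation_percent(file_size, data_size) >= threshold_percent


def estimated_space_needed(data_size: int) -> int:
    """Free disk space estimate: 2 times the data size."""
    return 2 * data_size
```

With a 1,000,000 byte file holding 300,000 bytes of live data, the fragmentation is 70%, so a rule of `db_fragmentation = 70%` would be satisfied and the estimated free space needed would be 600,000 bytes.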
> The full patch is mostly a new module but also does some minimal changes and a small
refactoring to the view compaction code, not changing the current behaviour.
> Patch is at:
> https://github.com/fdmanana/couchdb/compare/compaction_daemon.patch
> By default the daemon is idle, without any configuration enabled. I'm open to suggestions
on additional parameters and a better configuration system.
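
To make the rule format from the [compactions] section concrete, here is a hedged Python sketch that parses a `parameter=value [, parameter=value]*` string and checks whether a time falls inside the allowed period. The helper names are hypothetical. Two behaviours are assumptions: that a period such as `23:00 - 04:00` wraps past midnight (suggested by the `_default` example, but not stated), and that the end of the period is exclusive.

```python
from datetime import time


def parse_rule(value: str) -> dict:
    """Split 'a = 1, b = 2' into {'a': '1', 'b': '2'} (values kept as strings)."""
    rule = {}
    for part in value.split(","):
        key, _, raw = part.partition("=")
        rule[key.strip()] = raw.strip()
    return rule


def parse_period(period: str) -> tuple:
    """Parse 'HH:MM - HH:MM' into a (start, end) pair of datetime.time values."""
    def to_time(text: str) -> time:
        hh, mm = text.strip().split(":")
        return time(int(hh), int(mm))

    start, end = period.split("-")
    return to_time(start), to_time(end)


def in_period(now: time, start: time, end: time) -> bool:
    """True if `now` is inside the window; the end is exclusive (assumption)."""
    if start <= end:
        return start <= now < end
    # Assumed wrap-around: a window like 23:00 - 04:00 spans midnight.
    return now >= start or now < end
```

For example, parsing `db_fragmentation = 70%, view_fragmentation = 60%, period = 23:00 - 04:00` yields a `period` value of `23:00 - 04:00`, and under the wrap-around assumption 02:30 falls inside that window while 12:00 does not.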

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira