couchdb-dev mailing list archives

From "Paul Joseph Davis (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (COUCHDB-1153) Database and view index compaction daemon
Date Mon, 22 Aug 2011 22:22:29 GMT

    [ https://issues.apache.org/jira/browse/COUCHDB-1153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13089077#comment-13089077 ]

Paul Joseph Davis commented on COUCHDB-1153:
--------------------------------------------

@Filipe

Re your earlier comment: "It behaves fairly well, specially for the case where the number
of databases is <= max_dbs_open."

Yes. That's to be expected. Re-reading my earlier comment, I wasn't as clear as I could have
been.

The issue here is that couch_server's LRU cache can easily turn into a table scan for every
incoming open/create message that it receives. There are a couple of conditions that you need
to satisfy for it to become noticeable, but when it happens it turns into a positive feedback
loop that grinds couch_server to a halt and eventually crashes the VM due to running out of
memory because it can't process its mailbox fast enough.

First condition is that you have a large number of active databases, near the max_dbs_open
limit. The reason this is important is that you need a large number of databases for
which couch_db:is_idle/1 returns false. The way that couch_server's LRU works is by checking
the oldest used DB, and if it isn't idle it scans on to the next one. If there are lots of active
dbs and a largish max_dbs_open setting, this can turn into a long loop as it scans through
ETS looking for an idle db.
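To make the scan cost concrete, here's a toy Python model of that LRU walk (the real code is Erlang scanning an ETS table; every name below is made up for illustration). The closer the active-db count gets to max_dbs_open, the more of the table each incoming open has to walk before finding a victim to evict:

```python
# Toy model of couch_server's LRU close path (the real code is Erlang and
# scans an ETS table; everything here is a made-up illustration).

def close_lru(lru):
    """lru: list of (db_name, is_idle) pairs, oldest-used first.
    Walk from the oldest entry until an idle db is found and evict it.
    Returns (evicted_name_or_None, entries_scanned)."""
    scanned = 0
    for i, (name, is_idle) in enumerate(lru):
        scanned += 1
        if is_idle:
            del lru[i]
            return name, scanned
    return None, scanned  # every db was active: a full scan for nothing

# 1000 open dbs, the oldest 990 pinned by continuous changes listeners:
# each open that needs a free slot walks 991 entries to find a victim.
lru = [("db%d" % i, i >= 990) for i in range(1000)]
evicted, cost = close_lru(lru)
```

Under load, that near-full walk happens per open/create message, which is the positive feedback loop described above.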

I haven't tried triggering this on purpose yet, but if I were going to, I'd start by setting
max_dbs_open to something like 1000 and opening 990 or so clients that are all listening to
continuous changes, to make sure that is_idle returns false for most dbs. The test then would
be to run a load test under this condition while the auto-compactor loops through all dbs.
This would be especially painful where max_dbs_open covers the hot databases with only, say,
20% breathing room left over for the less often used databases.

Obviously the "correct" solution here is to fix couch_server to not suck, but a proper fix
there is going to take some serious engineering and will require modifying some critical pieces
of code. The worry with the auto-compactor is that it's going to make hitting this error condition
more likely as it churns through a possibly large number of databases, eating up open db slots
in couch_server's ETS tables. Then again it may be fine, but it doesn't sound like anyone's
addressed it.



Turning our attention to the patch itself, you've addressed most of what I commented on before
but there are still things that I'd like to see changed that don't relate to the performance
questions:

* The issue with adding value formats like "60%" is that you have to spend time writing and
maintaining code that's essentially useless. There's nothing that a % sign indicates that a
simple comment wouldn't handle. And yet the parsing itself is prone to barfing on users if
they happen to make a small typo or similar error. Specific error conditions apparent from
just reading the code: " 60%", "60% ", "60%5", "60%%", etc. There's also no default clause
when setting record members, which will puke on users as well.
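To illustrate the point in Python (the patch is Erlang; this parser and its name are hypothetical): a bare integer setting needs one conversion, while accepting "60%" obliges you to reject every near-miss by hand:

```python
# Hypothetical strict parser for "NN%" values. Every branch below exists
# only to reject the near-miss inputs listed above; a plain integer
# setting, documented as a percentage in a comment, would need none of it.

def parse_fragmentation(value):
    if value != value.strip():
        raise ValueError("stray whitespace: %r" % value)   # " 60%", "60% "
    if not value.endswith("%"):
        raise ValueError("missing %% sign: %r" % value)    # "60", "60%5"
    digits = value[:-1]
    if not digits.isdigit():
        raise ValueError("not an integer: %r" % value)     # "60%%"
    pct = int(digits)
    if not 0 <= pct <= 100:
        raise ValueError("out of range: %r" % value)       # "150%"
    return pct
```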

* There are some record tricks you can use to remove some of the redundancy in this config
code. Also, I've found it more sane to have two passes for this sort of thing. The first pass
sets the values and the second pass enforces constraints. This makes things like the handling
for #period much nicer.
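A sketch of that two-pass shape, with made-up names and in Python rather than the patch's Erlang records: pass one only copies values in, pass two enforces cross-field constraints (such as strict_window being meaningful only alongside a period) in one place:

```python
# Hypothetical two-pass config handling. Pass 1 just sets fields; pass 2
# checks constraints that span fields, which keeps things like the #period
# handling out of the per-field parsing code.

from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class CompactionConfig:
    db_fragmentation: int = 0
    view_fragmentation: int = 0
    period: Optional[Tuple[int, int]] = None  # (from, to) as HHMM
    strict_window: bool = False

def parse_config(pairs):
    cfg = CompactionConfig()
    for key, value in pairs:                   # pass 1: set the values
        if not hasattr(cfg, key):
            raise ValueError("unknown parameter: %r" % key)
        setattr(cfg, key, value)
    if cfg.strict_window and cfg.period is None:  # pass 2: constraints
        raise ValueError("strict_window needs a period")
    return cfg
```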

* There's still no timeout on that receive clause for the parallel view builds. I understand
there's a receive clause further down and "it could never happen" that we get stuck there.
But we could. Because there's no timeout.
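In Python terms (the actual code is an Erlang receive; the function, names, and timeout value here are invented), the difference is a blocking get versus one with a deadline:

```python
# Python analogue of the concern: collecting results from parallel
# view-compaction workers with a bounded wait instead of blocking forever.

import queue
import threading

def await_view_results(results_q, nworkers, timeout_s):
    acc = []
    for _ in range(nworkers):
        try:
            # A plain results_q.get() here is the no-timeout receive: if a
            # worker dies without replying, we hang in "can never happen".
            acc.append(results_q.get(timeout=timeout_s))
        except queue.Empty:
            raise TimeoutError("parallel view compaction worker never replied")
    return acc

q = queue.Queue()
for view in ("viewA", "viewB"):
    threading.Thread(target=q.put, args=(view,)).start()
results = await_view_results(q, 2, timeout_s=5.0)
```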

* The order of function definitions in this module is giving me the rage eyes. But maybe I'm
the only crotchety bastard that grumbles to himself about such things.


> Database and view index compaction daemon
> -----------------------------------------
>
>                 Key: COUCHDB-1153
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-1153
>             Project: CouchDB
>          Issue Type: New Feature
>         Environment: trunk
>            Reporter: Filipe Manana
>            Assignee: Filipe Manana
>            Priority: Minor
>              Labels: compaction
>
> I've recently written an Erlang process to automatically compact databases and their
views based on some configurable parameters. These parameters can be global or per database
and are: minimum database fragmentation, minimum view fragmentation, allowed period and "strict_window"
(whether an ongoing compaction should be canceled if it doesn't finish within the allowed
period). These fragmentation values are based on the recently added "data_size" parameter
to the database and view group information URIs (COUCHDB-1132).
> I've documented the .ini configuration, as a comment in default.ini, which I paste here:
> [compaction_daemon]
> ; The delay, in seconds, between each check for which database and view indexes
> ; need to be compacted.
> check_interval = 60
> ; If a database or view index file is smaller than this value (in bytes),
> ; compaction will not happen. Very small files always have a very high
> ; fragmentation, so it's not worth compacting them.
> min_file_size = 131072
> [compactions]
> ; List of compaction rules for the compaction daemon.
> ; The daemon compacts databases and their respective view groups when all the
> ; condition parameters are satisfied. Configuration can be per database or
> ; global, and it has the following format:
> ;
> ; database_name = parameter=value [, parameter=value]*
> ; _default = parameter=value [, parameter=value]*
> ;
> ; Possible parameters:
> ;
> ; * db_fragmentation - If the ratio (as an integer percentage), of the amount
> ;                      of old data (and its supporting metadata) over the database
> ;                      file size is equal to or greater than this value, this
> ;                      database compaction condition is satisfied.
> ;                      This value is computed as:
> ;
> ;                           (file_size - data_size) / file_size * 100
> ;
> ;                      The data_size and file_size values can be obtained when
> ;                      querying a database's information URI (GET /dbname/).
> ;
> ; * view_fragmentation - If the ratio (as an integer percentage), of the amount
> ;                        of old data (and its supporting metadata) over the view
> ;                        index (view group) file size is equal to or greater than
> ;                        this value, then this view index compaction condition is
> ;                        satisfied. This value is computed as:
> ;
> ;                            (file_size - data_size) / file_size * 100
> ;
> ;                        The data_size and file_size values can be obtained when
> ;                        querying a view group's information URI
> ;                        (GET /dbname/_design/groupname/_info).
> ;
> ; * period - The period for which a database (and its view groups) compaction
> ;            is allowed. This value must obey the following format:
> ;
> ;                HH:MM - HH:MM  (HH in [0..23], MM in [0..59])
> ;
> ; * strict_window - If a compaction is still running after the end of the allowed
> ;                   period, it will be canceled if this parameter is set to "yes".
> ;                   It defaults to "no" and it's meaningful only if the *period*
> ;                   parameter is also specified.
> ;
> ; * parallel_view_compaction - If set to "yes", the database and its views are
> ;                              compacted in parallel. This is only useful on
> ;                              certain setups, like for example when the database
> ;                              and view index directories point to different
> ;                              disks. It defaults to "no".
> ;
> ; Before a compaction is triggered, an estimation of how much free disk space is
> ; needed is computed. This estimation corresponds to 2 times the data size of
> ; the database or view index. When there's not enough free disk space to compact
> ; a particular database or view index, a warning message is logged.
> ;
> ; Examples:
> ;
> ; 1) foo = db_fragmentation = 70%, view_fragmentation = 60%
> ;    The `foo` database is compacted if its fragmentation is 70% or more.
> ;    Any view index of this database is compacted only if its fragmentation
> ;    is 60% or more.
> ;
> ; 2) foo = db_fragmentation = 70%, view_fragmentation = 60%, period = 00:00-04:00
> ;    Similar to the preceding example but a compaction (database or view index)
> ;    is only triggered if the current time is between midnight and 4 AM.
> ;
> ; 3) foo = db_fragmentation = 70%, view_fragmentation = 60%, period = 00:00-04:00, strict_window = yes
> ;    Similar to the preceding example - a compaction (database or view index)
> ;    is only triggered if the current time is between midnight and 4 AM. If at
> ;    4 AM the database or one of its views is still compacting, the compaction
> ;    process will be canceled.
> ;
> ;_default = db_fragmentation = 70%, view_fragmentation = 60%, period = 23:00 - 04:00
> (from https://github.com/fdmanana/couchdb/compare/compaction_daemon#L0R195)
> The full patch is mostly a new module but also makes some minimal changes and a small
refactoring to the view compaction code, without changing the current behaviour.
> Patch is at:
> https://github.com/fdmanana/couchdb/compare/compaction_daemon.patch
> By default the daemon is idle, without any configuration enabled. I'm open to suggestions
on additional parameters and a better configuration system.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
