couchdb-dev mailing list archives

From "Adam Kocoloski (JIRA)" <>
Subject [jira] [Commented] (COUCHDB-1132) Track used space of database and view index files
Date Wed, 20 Apr 2011 14:09:05 GMT


Adam Kocoloski commented on COUCHDB-1132:

@janl Did you mean data_size = post_compaction_file_size?  What you wrote doesn't make sense
to me.  And yes, I think it would be too complicated to try to do that.

@fdmanana The view compactor uses a static batch size of 10000.  The work queues are only
involved during indexing.  I put a patch somewhere to place a configurable minimum bound on
the size of the batch written to disk during indexing, which does help reduce the file size.

Regarding the config entry, I've started to think that every new config entry we add represents
a problem we couldn't solve for the end user.  If we need to have an entry, maybe we should
use units that make more sense for the user, e.g. a threshold in bytes for the compactor process
above which it flushes to disk.  I'd be particularly in favor of such a threshold for the
view compactor, since the map values are loaded into memory simultaneously (as opposed
to the document bodies, which are written to the new file one at a time regardless of batch
size).  Different view compactions can use wildly different amounts of memory depending on
the average value size.
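
To make the byte-threshold idea concrete, here is a minimal sketch in Python (not CouchDB's Erlang, and not code from either branch). The class name, the `flush` callback and the 1024-byte threshold are all hypothetical; the point is only that the flush trigger is the accumulated size in bytes rather than a fixed item count.

```python
from typing import Callable, List


class ByteThresholdBuffer:
    """Buffers values and flushes once their total size reaches a byte threshold.

    Hypothetical illustration only: names and behaviour are assumptions,
    not the CouchDB compactor's actual implementation.
    """

    def __init__(self, threshold_bytes: int,
                 flush: Callable[[List[bytes]], None]) -> None:
        self.threshold_bytes = threshold_bytes
        self._flush_fn = flush
        self._items: List[bytes] = []
        self._size = 0

    def add(self, value: bytes) -> None:
        # Track the size in bytes, not the number of buffered items.
        self._items.append(value)
        self._size += len(value)
        if self._size >= self.threshold_bytes:
            self.flush()

    def flush(self) -> None:
        # Hand the current batch to the writer and start a fresh one.
        if self._items:
            self._flush_fn(self._items)
            self._items = []
            self._size = 0


if __name__ == "__main__":
    batches: List[List[bytes]] = []
    buf = ByteThresholdBuffer(1024, batches.append)
    for _ in range(25):
        buf.add(b"x" * 100)   # small values: many items per batch
    buf.flush()               # write whatever is left at the end
    print([len(b) for b in batches])
```

With a fixed item count, memory use scales with the average value size; with a byte threshold, small values produce large batches and large values produce small ones, so peak memory stays roughly bounded by the threshold plus one value.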

> Track used space of database and view index files
> -------------------------------------------------
>                 Key: COUCHDB-1132
>                 URL:
>             Project: CouchDB
>          Issue Type: New Feature
>          Components: Database Core
>            Reporter: Filipe Manana
>             Fix For: 1.2
> Currently users have no reliable way to know if a database or view index compaction is
> Adam, Robert Dionne and I have been working on a feature to compute and expose
> the current data size (in bytes) of databases and view indexes. These computations are exposed
> as a single field in the database info and view index info URIs.
> Comparing this new value with the disk_size value (the total space in bytes used by the
> database or view index file) would allow users to decide whether it's worth triggering
> a compaction.
> Adam and Robert's work can be found at:
> Mine can be found at:
> After chatting with Adam on IRC, the main difference seems to be that their work accounts
> only for user data (document bodies + attachments), while mine also accounts for the btree
> values (including all meta information, keys, rev trees, etc) and the data added by couch_file
> (4 bytes length prefix, md5s, block boundary markers).
> An example:
> $ curl http://localhost:5984/btree_db/_design/test/_info
> {"name":"test","view_index":{"signature":"aba9f066ed7f042f63d245ce0c7d870e","language":"javascript","disk_size":274556,"data_size":270455,"updater_running":false,"compact_running":false,"waiting_commit":false,"waiting_clients":0,"update_seq":1004,"purge_seq":0}}
> $ curl http://localhost:5984/btree_db
> {"db_name":"btree_db","doc_count":1004,"doc_del_count":0,"update_seq":1004,"purge_seq":0,"compact_running":false,"disk_size":6197361,"data_size":6186460,"instance_start_time":"1303231080936421","disk_format_version":5,"committed_update_seq":1004}
> This example was executed just after compacting the test database and view index. The
> new field "data_size" has a value very close to the final file size.
> The only things that my branch doesn't include in the data_size computation, for databases,
> are the size of the last header, the size of the _security object and the purged revs list. In
> practice these are so small that adding extra code to account for them doesn't
> seem worth it.
> I'm sure we can merge the best from both branches.
> Adam, Robert, thoughts?
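
As a rough illustration of the comparison described in the issue, the sketch below computes the wasted space from the two example responses quoted above. It assumes the proposed `data_size` field is present; the helper names and the 30% threshold are hypothetical choices for illustration, not something either branch defines.

```python
import json

# Trimmed copies of the two example responses from the issue description.
DB_INFO = json.loads('{"db_name":"btree_db","disk_size":6197361,"data_size":6186460}')
VIEW_INFO = json.loads(
    '{"name":"test","view_index":{"disk_size":274556,"data_size":270455}}'
)


def _sizes(info: dict) -> tuple:
    # View index info nests the sizes under "view_index"; database info does not.
    fields = info.get("view_index", info)
    return fields["disk_size"], fields["data_size"]


def wasted_bytes(info: dict) -> int:
    """Bytes in the file that are not accounted for by data_size."""
    disk_size, data_size = _sizes(info)
    return disk_size - data_size


def wasted_fraction(info: dict) -> float:
    """Share of the file that compaction could be expected to reclaim."""
    disk_size, _ = _sizes(info)
    return wasted_bytes(info) / disk_size if disk_size else 0.0


def worth_compacting(info: dict, min_wasted_fraction: float = 0.3) -> bool:
    # The 0.3 default is an arbitrary, hypothetical threshold.
    return wasted_fraction(info) >= min_wasted_fraction


if __name__ == "__main__":
    for label, info in (("database", DB_INFO), ("view index", VIEW_INFO)):
        print(label, wasted_bytes(info), worth_compacting(info))
```

For the freshly compacted files in the example, the difference is 10901 bytes for the database and 4101 bytes for the view index, well under 2% of either file, so this check would not recommend compaction.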
