couchdb-dev mailing list archives

From "Jan Lehnardt (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (COUCHDB-1132) Track used space of database and view index files
Date Wed, 20 Apr 2011 12:53:05 GMT

    [ https://issues.apache.org/jira/browse/COUCHDB-1132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13022099#comment-13022099 ]

Jan Lehnardt commented on COUCHDB-1132:
---------------------------------------

I'm all for making the compactor smarter :)

Great work Filipe!

I wish we could make this equation hold exactly: file_size - data_size = post_compaction_file_size.
But it seems overly complicated to try; it would "just" be a nice API behaviour that isn't
required for any of this. So yeah.

> Track used space of database and view index files
> -------------------------------------------------
>
>                 Key: COUCHDB-1132
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-1132
>             Project: CouchDB
>          Issue Type: New Feature
>          Components: Database Core
>            Reporter: Filipe Manana
>             Fix For: 1.2
>
>
> Currently users have no reliable way to know if a database or view index compaction is needed.
> Adam, Robert Dionne and I have been working on a feature to compute and expose the current data size (in bytes) of databases and view indexes. These computations are exposed as a single field in the database info and view index info URIs.
> Comparing this new value with the disk_size value (the total space in bytes used by the database or view index file) would allow users to decide whether it's worth triggering a compaction.
> Adam and Robert's work can be found at:
> https://github.com/cloudant/bigcouch/compare/7d1adfa...a9410e6
> Mine can be found at:
> https://github.com/fdmanana/couchdb/compare/file_space
> After chatting with Adam on IRC, the main difference seems to be that their work accounts only for user data (document bodies + attachments), while mine also accounts for the btree values (including all meta information, keys, rev trees, etc) and the data added by couch_file (4-byte length prefixes, md5s, block boundary markers).
> An example:
> $ curl http://localhost:5984/btree_db/_design/test/_info
> {"name":"test","view_index":{"signature":"aba9f066ed7f042f63d245ce0c7d870e","language":"javascript","disk_size":274556,"data_size":270455,"updater_running":false,"compact_running":false,"waiting_commit":false,"waiting_clients":0,"update_seq":1004,"purge_seq":0}}
> $ curl http://localhost:5984/btree_db
> {"db_name":"btree_db","doc_count":1004,"doc_del_count":0,"update_seq":1004,"purge_seq":0,"compact_running":false,"disk_size":6197361,"data_size":6186460,"instance_start_time":"1303231080936421","disk_format_version":5,"committed_update_seq":1004}
> This example was executed just after compacting the test database and view index. The new field "data_size" has a value very close to the final file size.
> The only things that my branch doesn't include in the data_size computation, for databases, are the size of the last header, the size of the _security object and the purged revs list. In practice these are so small and insignificant that adding extra code to account for them doesn't seem worth it.
> I'm sure we can merge the best from both branches.
> Adam, Robert, thoughts?
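
As a rough illustration of the comparison described in the issue, the sketch below computes what fraction of a file is not covered by data_size and uses that to decide whether compaction looks worthwhile. It is only a sketch, not part of either branch: the function names and the 0.3 threshold are assumptions made up for this example, and the JSON is trimmed from the curl output quoted above.

```python
import json


def fragmentation(info: dict) -> float:
    """Return the fraction of disk_size not accounted for by data_size."""
    disk_size = info["disk_size"]
    data_size = info["data_size"]
    if disk_size <= 0:
        return 0.0
    # Clamp at 0 in case data_size is reported slightly above disk_size.
    return max(0.0, (disk_size - data_size) / disk_size)


def should_compact(info: dict, threshold: float = 0.3) -> bool:
    """Hypothetical policy: compact once the unused fraction reaches threshold.

    The default of 0.3 is an arbitrary value chosen for illustration.
    """
    return fragmentation(info) >= threshold


# Fields trimmed from the database info response quoted in the issue.
db_info = json.loads(
    '{"db_name": "btree_db", "disk_size": 6197361, "data_size": 6186460}'
)

# Fields trimmed from the view index info response quoted in the issue.
view_info = json.loads(
    '{"name": "test", "view_index": {"disk_size": 274556, "data_size": 270455}}'
)["view_index"]

print(f"database:   {fragmentation(db_info):.4%} unused")
print(f"view index: {fragmentation(view_info):.4%} unused")
print("compact database?", should_compact(db_info))
```

With the freshly compacted files from the example, both fractions come out well under 2%, so this policy would not ask for another compaction. A file whose data_size had dropped to less than 70% of disk_size would cross the assumed threshold.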

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
