Date: Wed, 20 Apr 2011 12:53:05 +0000 (UTC)
From: "Jan Lehnardt (JIRA)"
To: dev@couchdb.apache.org
Reply-To: dev@couchdb.apache.org
Message-ID: <304889548.69817.1303303985735.JavaMail.tomcat@hel.zones.apache.org>
In-Reply-To: <1508963173.67838.1303238106299.JavaMail.tomcat@hel.zones.apache.org>
Subject: [jira] [Commented] (COUCHDB-1132) Track used space of database and view index files

[
https://issues.apache.org/jira/browse/COUCHDB-1132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13022099#comment-13022099 ]

Jan Lehnardt commented on COUCHDB-1132:
---------------------------------------

I'm all for making the compactor smarter :) Great work, Filipe!

I wish we could make this equation hold exactly:

    file_size - data_size = post_compaction_file_size

but it seems overly complicated to try. It would "just" be a nice API behaviour that isn't required for any of this. So yeah.

> Track used space of database and view index files
> -------------------------------------------------
>
>                 Key: COUCHDB-1132
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-1132
>             Project: CouchDB
>          Issue Type: New Feature
>          Components: Database Core
>            Reporter: Filipe Manana
>             Fix For: 1.2
>
>
> Currently users have no reliable way to know whether a database or view index compaction is needed.
> Adam, Robert Dionne and I have been working on a feature to compute and expose the current data size (in bytes) of databases and view indexes. These computations are exposed as a single field in the database info and view index info URIs.
> Comparing this new value with the disk_size value (the total space in bytes used by the database or view index file) would allow users to decide whether or not it's worth triggering a compaction.
> Adam and Robert's work can be found at:
> https://github.com/cloudant/bigcouch/compare/7d1adfa...a9410e6
> Mine can be found at:
> https://github.com/fdmanana/couchdb/compare/file_space
> After chatting with Adam on IRC, the main difference seems to be that their work accounts only for user data (document bodies + attachments), while mine also accounts for the btree values (including all meta information, keys, rev trees, etc.) and the data added by couch_file (4-byte length prefixes, md5s, block boundary markers).
> An example:
> $ curl http://localhost:5984/btree_db/_design/test/_info
> {"name":"test","view_index":{"signature":"aba9f066ed7f042f63d245ce0c7d870e","language":"javascript","disk_size":274556,"data_size":270455,"updater_running":false,"compact_running":false,"waiting_commit":false,"waiting_clients":0,"update_seq":1004,"purge_seq":0}}
> $ curl http://localhost:5984/btree_db
> {"db_name":"btree_db","doc_count":1004,"doc_del_count":0,"update_seq":1004,"purge_seq":0,"compact_running":false,"disk_size":6197361,"data_size":6186460,"instance_start_time":"1303231080936421","disk_format_version":5,"committed_update_seq":1004}
> This example was executed just after compacting the test database and view index. The new field "data_size" has a value very close to the final file size.
> The only things that my branch doesn't include in the data_size computation, for databases, are the size of the last header, the size of the _security object and the purged revs list. In practice these are so small that adding extra code to account for them doesn't seem worth it.
> I'm sure we can merge the best from both branches.
> Adam, Robert, thoughts?

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
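
For illustration only, here is a minimal Python sketch of how a client might compare disk_size with data_size to decide whether a compaction is worth triggering, as the issue description suggests. The function names and the 0.3 threshold are hypothetical and are not part of either branch. The sample values are the disk_size and data_size figures from the two curl responses quoted above.

```python
def fragmentation(info):
    """Return the fraction of the file that data_size does not account for.

    `info` is a dict holding the "disk_size" and "data_size" fields, as in
    the database info response (or the "view_index" object of a view info
    response).
    """
    disk_size = info["disk_size"]
    data_size = info["data_size"]
    if disk_size <= 0:
        return 0.0
    # Clamp at zero in case data_size slightly exceeds disk_size.
    return max(0.0, (disk_size - data_size) / disk_size)


def should_compact(info, threshold=0.3):
    """Hypothetical policy: compact once the unused fraction reaches threshold."""
    return fragmentation(info) >= threshold


# Values taken from the example responses quoted in the issue.
db_info = {"disk_size": 6197361, "data_size": 6186460}
view_info = {"disk_size": 274556, "data_size": 270455}

for name, info in (("database", db_info), ("view index", view_info)):
    print(name, round(fragmentation(info), 4), should_compact(info))
```

Since both files had just been compacted, the unused fraction is tiny in both cases and this policy would not ask for another compaction.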