Date: Tue, 19 Apr 2011 18:37:05 +0000 (UTC)
From: "Filipe Manana (JIRA)"
To: dev@couchdb.apache.org
Subject: [jira] [Updated] (COUCHDB-1132) Track used space of database and view index files

[
https://issues.apache.org/jira/browse/COUCHDB-1132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Filipe Manana updated COUCHDB-1132:
-----------------------------------

Description:

Currently users have no reliable way to know whether a database or view index compaction is needed.

Adam, Robert Dionne and I have been working on a feature to compute and expose the current data size (in bytes) of databases and view indexes. The result is exposed as a single field in the database info and view index info URIs.

Comparing this new value with the disk_size value (the total space in bytes used by the database or view index file) would let users decide whether it's worth triggering a compaction.

Adam and Robert's work can be found at:
https://github.com/cloudant/bigcouch/compare/7d1adfa...a9410e6

Mine can be found at:
https://github.com/fdmanana/couchdb/compare/file_space

After chatting with Adam on IRC, it seems the main difference is that their work accounts only for user data (document bodies + attachments), while mine also accounts for the btree values (including all meta information, keys, rev trees, etc.) and the data added by couch_file (4-byte length prefixes, MD5s, block boundary markers).

An example:

$ curl http://localhost:5984/btree_db/_design/test/_info
{"name":"test","view_index":{"signature":"aba9f066ed7f042f63d245ce0c7d870e","language":"javascript","disk_size":274556,"data_size":270455,"updater_running":false,"compact_running":false,"waiting_commit":false,"waiting_clients":0,"update_seq":1004,"purge_seq":0}}

$ curl http://localhost:5984/btree_db
{"db_name":"btree_db","doc_count":1004,"doc_del_count":0,"update_seq":1004,"purge_seq":0,"compact_running":false,"disk_size":6197361,"data_size":6186460,"instance_start_time":"1303231080936421","disk_format_version":5,"committed_update_seq":1004}

This example was executed just after compacting the test database and view index.
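The comparison between data_size and disk_size can be sketched in a few lines. The following minimal Python sketch assumes only the two JSON bodies shown in the example. It reads disk_size and data_size (top level for database info, nested under "view_index" for view index info) and reports the fraction of the file that data_size does not account for. The helper names and the 30% threshold are hypothetical choices for this illustration and belong to neither branch.

```python
import json
from typing import Tuple

# Hypothetical cut-off for this sketch only (not a CouchDB setting):
# suggest compaction when more than 30% of the file is not live data.
WASTE_THRESHOLD = 0.30


def sizes_from_info(info_json: str) -> Tuple[int, int]:
    """Return (disk_size, data_size) from a database or view index info body.

    GET /db returns both fields at the top level, while
    GET /db/_design/ddoc/_info nests them under "view_index".
    """
    info = json.loads(info_json)
    stats = info.get("view_index", info)
    return stats["disk_size"], stats["data_size"]


def wasted_fraction(disk_size: int, data_size: int) -> float:
    """Fraction of the file not accounted for by data_size (0.0 = no waste)."""
    if disk_size <= 0:
        return 0.0
    return max(disk_size - data_size, 0) / disk_size


def should_compact(info_json: str, threshold: float = WASTE_THRESHOLD) -> bool:
    """True when the unaccounted share of the file exceeds the threshold."""
    disk_size, data_size = sizes_from_info(info_json)
    return wasted_fraction(disk_size, data_size) > threshold


if __name__ == "__main__":
    # Bodies trimmed from the example output in the issue description.
    db_info = '{"db_name":"btree_db","disk_size":6197361,"data_size":6186460}'
    view_info = '{"name":"test","view_index":{"disk_size":274556,"data_size":270455}}'
    for label, body in (("btree_db", db_info), ("_design/test", view_info)):
        disk, data = sizes_from_info(body)
        print(f"{label}: {wasted_fraction(disk, data):.1%} unaccounted, "
              f"compact={should_compact(body)}")
```

On the post-compaction example output this gives about 0.2% for btree_db and about 1.5% for the view index, so neither would be flagged. Because data_size omits a few small items (for databases, the last header, the _security object and the purged revs list), the fraction stays slightly above zero even right after compaction. A threshold tuned for this accounting would also need revisiting if a user-data-only data_size were used instead, as in Adam and Robert's branch, since that value would be smaller for the same file.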
The new field "data_size" has a value very close to the final file size. For databases, the only things my branch leaves out of the data_size computation are the size of the last header, the _security object and the purged revs list; in practice these are so small that adding extra code to account for them doesn't seem worth it.

I'm sure we can merge the best from both branches.

Adam, Robert, thoughts?

was:

Currently users have no reliable way to know whether a database or view index compaction is needed.

Adam, Robert Dionne and I have been working on a feature to compute and expose the current data size (in bytes) of databases and view indexes. The result is exposed as a single field in the database info and view index info URIs.

Comparing this new value with the disk_size value (the total space in bytes used by the database or view index file) would let users decide whether it's worth triggering a compaction.

Adam and Robert's work can be found at:
https://github.com/cloudant/bigcouch/compare/7d1adfa...a9410e6

Mine can be found at:
https://github.com/fdmanana/couchdb/compare/file_space

After chatting with Adam on IRC, it seems the main difference is that their work accounts only for user data (document bodies + attachments), while mine also accounts for the btree values (including all meta information, keys, rev trees, etc.) and the data added by couch_file (4-byte length prefixes, MD5s, block boundary markers).
An example:

$ curl http://localhost:5984/btree_db/_design/test/_info
{"name":"test","view_index":{"signature":"aba9f066ed7f042f63d245ce0c7d870e","language":"javascript","disk_size":274556,"data_size":90742,"updater_running":false,"compact_running":false,"waiting_commit":false,"waiting_clients":0,"update_seq":1004,"purge_seq":0}}

$ curl http://localhost:5984/btree_db
{"db_name":"btree_db","doc_count":1004,"doc_del_count":0,"update_seq":1004,"purge_seq":0,"compact_running":false,"disk_size":6197361,"data_size":6186460,"instance_start_time":"1303231080936421","disk_format_version":5,"committed_update_seq":1004}

This example was executed just after compacting the test database and view index. The new field "data_size" has a value very close to the final file size.

For databases, the only things my branch leaves out of the data_size computation are the size of the last header, the _security object and the purged revs list; in practice these are so small that adding extra code to account for them doesn't seem worth it.

I'm sure we can merge the best from both branches.

Adam, Robert, thoughts?

> Track used space of database and view index files
> -------------------------------------------------
>
>                 Key: COUCHDB-1132
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-1132
>             Project: CouchDB
>          Issue Type: New Feature
>          Components: Database Core
>            Reporter: Filipe Manana
>             Fix For: 1.2

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira