Date: Wed, 16 May 2012 15:49:09 +0100
From: Nick North
To: dev@couchdb.apache.org
Subject: Re: Replicated database size

Following up on my own email: this seems to be an issue with snappy
compression on Windows Server 2008. When I changed the file_compression
setting to deflate_6, the "large" databases shrank from 7GB to 1GB after
compaction. I'm not sure whether this counts as a bug, so I won't raise an
issue for it.

By the way, kudos to whoever wrote the code that handles file_compression.
When I changed file_compression to deflate_6, the system happily worked with
the existing, supposedly snappy-compressed databases and converted them to
the new format on the next compaction. That could have gone wrong in several
ways, but didn't, so thank you.

Nick

On 15 May 2012 13:55, Nick North wrote:
> I'm curious about the sizes of replicated CouchDB databases relative to
> each other. I have four databases, each with pull replications from the
> other three, but they report quite different data sizes.
>
> Two of them say:
>
> {"db_name":"hydra","doc_count":1489060,"doc_del_count":2754893,"update_seq":6998882,"purge_seq":0,"compact_running":false,"disk_size":3213656193,"data_size":1395943755,"instance_start_time":"1337067567481841","disk_format_version":6,"committed_update_seq":6998882}
>
> While the other two say this (note the difference in data_size):
>
> {"db_name":"hydra","doc_count":1489441,"doc_del_count":2755302,"update_seq":4375865,"purge_seq":0,"compact_running":false,"disk_size":7599413027,"data_size":7265993199,"instance_start_time":"1337014746154865","disk_format_version":6,"committed_update_seq":4375865}
>
> (There is some discrepancy in doc_count because new documents are being
> posted continuously, and some arrived between fetching the stats for the
> various instances.) Other possibly relevant information:
>
> - All the replications appear to be in working order, so I don't believe
>   there is a backlog of documents waiting to be replicated.
> - The database has just one design view, and whether or not it has been
>   queried makes no difference to whether the database is "large" or
>   "small".
> - Compaction makes little difference: the "large" instances always remain
>   much larger than the "small" ones.
> - Everything is running CouchDB 1.2 on Windows: the "small" instances on
>   Windows 7 and Windows Vista, and the "large" ones on Windows Server
>   2008.
> - file_compression is set to "snappy" in all cases, and there are no
>   attachments anywhere.
>
> Can anyone suggest what might be going on here? My best guess is that it's
> to do with file compression on Windows Server, but that is only a guess,
> so I intend to experiment with the other file compression options. I'd be
> grateful for any thoughts, as I'm planning the disk requirements for a
> system with ten times the capacity of the current one, and would very much
> like to do that with some certainty about file sizes.
>
> Thanks in advance for any help,
>
> Nick North
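A minimal Python sketch of the check-and-fix described in this thread, assuming a CouchDB 1.x node at the hypothetical address http://127.0.0.1:5984 with admin access, and the database name hydra from the quoted stats. It first compares the quoted figures: the "large" replica reports about 5.2 times the data_size of the "small" one, yet its file is about 96% live data versus about 43%, so compaction has little to reclaim. It then builds, without sending, the config API request that switches file_compression to deflate_6 and the compaction request that rewrites the file in the new format. The thread doesn't say how the setting was actually changed, so the request-building helpers (set_compression_request, compact_request) are illustrative names, not part of any library.

```python
import json
import urllib.request

# disk_size / data_size copied from the two GET /hydra responses quoted above.
SMALL = {"disk_size": 3213656193, "data_size": 1395943755}
LARGE = {"disk_size": 7599413027, "data_size": 7265993199}


def live_fraction(info):
    """Share of the file holding live data; compaction can only reclaim the rest."""
    return info["data_size"] / info["disk_size"]


def set_compression_request(base_url, codec):
    """Hypothetical helper: build (but do not send) the CouchDB 1.x config API
    call PUT /_config/couchdb/file_compression with a JSON string body."""
    return urllib.request.Request(
        f"{base_url}/_config/couchdb/file_compression",
        data=json.dumps(codec).encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


def compact_request(base_url, db):
    """Hypothetical helper: build (but do not send) POST /{db}/_compact.
    Compaction rewrites the file using the file_compression configured then."""
    return urllib.request.Request(
        f"{base_url}/{db}/_compact",
        data=b"",
        method="POST",
        headers={"Content-Type": "application/json"},
    )


if __name__ == "__main__":
    ratio = LARGE["data_size"] / SMALL["data_size"]
    print(f"data_size, large vs small: {ratio:.2f}x")
    print(f"live fraction: small {live_fraction(SMALL):.2f}, large {live_fraction(LARGE):.2f}")

    base = "http://127.0.0.1:5984"  # hypothetical node address; admin access assumed
    for req in (set_compression_request(base, "deflate_6"), compact_request(base, "hydra")):
        # urllib.request.urlopen(req) would actually send the request to the node.
        print(req.get_method(), req.full_url)
```

The live-fraction figures are consistent with what the thread found: the "large" files were already almost entirely live data, which suggests the extra size came from how the documents were stored on disk rather than from uncompacted garbage.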