Date: Tue, 15 May 2012 13:55:09 +0100
Subject: Replicated database size
From: Nick North
To: dev@couchdb.apache.org

I'm curious about how the sizes of replicated CouchDB databases compare with each other. I have four databases, each with pull replications from the other three, but they report quite different data sizes. Two of them say:

{"db_name":"hydra","doc_count":1489060,"doc_del_count":2754893,"update_seq":6998882,"purge_seq":0,"compact_running":false,"disk_size":3213656193,"data_size":1395943755,"instance_start_time":"1337067567481841","disk_format_version":6,"committed_update_seq":6998882}

While the other two say this (note the difference in data_size):

{"db_name":"hydra","doc_count":1489441,"doc_del_count":2755302,"update_seq":4375865,"purge_seq":0,"compact_running":false,"disk_size":7599413027,"data_size":7265993199,"instance_start_time":"1337014746154865","disk_format_version":6,"committed_update_seq":4375865}

(There is some discrepancy in the doc_count because new documents are being posted continuously, and some arrived between fetching stats for the various instances.)

Other possibly relevant information:

- All the replications appear to be in working order, so I don't believe there is a backlog of documents waiting to be replicated.
- The database has just one design view, and whether or not it has been queried does not seem to make any difference to whether the database is "large" or "small".
- Compaction makes little difference, in that the "large" instances always remain much larger than the "small" ones.
- Everything is running CouchDB 1.2 on Windows: the "small" instances on Windows 7 and Windows Vista, and the "large" ones on Windows Server 2008.
- The file_compression setting is "snappy" in all cases and there are no attachments anywhere.

Can anyone suggest what might be going on here? My best guess is that it has to do with file compression on Windows Server, but that is only a guess, so I intend to experiment with the other file compression options.

I'd be grateful for any thoughts, as I'm planning out disk requirements for a system with ten times the capacity of the current one, and would very much like to do that with some certainty about file sizes.

Thanks in advance for any help,

Nick North
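
For anyone wanting to compare the two sets of figures quoted in this message, the arithmetic can be sketched in a few lines of Python. This is only an illustration: the JSON is copied from the two responses in the message, while the helper name `summarize`, the "small"/"large" labels, and the choice to count deleted documents in the per-document figure are assumptions made for the example, not anything CouchDB itself reports.

```python
import json

# Database info responses copied from the message ("small" and "large" instances).
SMALL = json.loads(
    '{"db_name":"hydra","doc_count":1489060,"doc_del_count":2754893,'
    '"update_seq":6998882,"purge_seq":0,"compact_running":false,'
    '"disk_size":3213656193,"data_size":1395943755,'
    '"instance_start_time":"1337067567481841","disk_format_version":6,'
    '"committed_update_seq":6998882}'
)
LARGE = json.loads(
    '{"db_name":"hydra","doc_count":1489441,"doc_del_count":2755302,'
    '"update_seq":4375865,"purge_seq":0,"compact_running":false,'
    '"disk_size":7599413027,"data_size":7265993199,'
    '"instance_start_time":"1337014746154865","disk_format_version":6,'
    '"committed_update_seq":4375865}'
)


def summarize(info):
    """Derive simple ratios from a database info document (hypothetical helper)."""
    # Assumption for this sketch: deleted documents are counted too,
    # since they still occupy space in the database file.
    total_docs = info["doc_count"] + info["doc_del_count"]
    return {
        "data_to_disk": info["data_size"] / info["disk_size"],
        "data_bytes_per_doc": info["data_size"] / total_docs,
    }


if __name__ == "__main__":
    for label, info in (("small", SMALL), ("large", LARGE)):
        stats = summarize(info)
        print(
            f"{label}: data/disk = {stats['data_to_disk']:.2f}, "
            f"data bytes per doc = {stats['data_bytes_per_doc']:.0f}"
        )
    ratio = LARGE["data_size"] / SMALL["data_size"]
    print(f"large/small data_size ratio = {ratio:.2f}")
```

With nearly identical document counts on both sides, the per-document figure isolates the difference in stored bytes rather than in the number of documents, which is the quantity that matters for sizing a system ten times larger.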