From: Paul Davis
Date: Mon, 17 Feb 2014 13:04:23 -0600
Subject: Re: Replication vs. Compaction
To: "user@couchdb.apache.org"

Typo'ed:

"so it's unbalancing as much"

should read:

"so it's *not* unbalancing as much"

On Mon, Feb 17, 2014 at 12:49 PM, Robert Samuel Newson wrote:
> Replication will not rebalance the tree, no. It's just adding to the end (and unbalancing the tree).
>
> The updates are happening in batches, though, so it's unbalancing as much as the original individual updates did.
>
> B.
>
> On 17 Feb 2014, at 16:16, Boaz Citrin wrote:
>
>> Did some more testing. Seems like compaction is indeed faster than replication.
>>
>> One thing I observe, though, is that replication doesn't give the same result
>> as compaction:
>> while it only copies the leaves, I suspect it doesn't produce a
>> balanced tree, so subsequent compaction is needed anyway (and indeed
>> cuts the file size big time).
>>
>> Am I wrong here?
>>
>> On Mon, Feb 3, 2014 at 8:28 PM, Adam Kocoloski wrote:
>>> On Jan 31, 2014, at 3:43 PM, Jens Alfke wrote:
>>>
>>>> On Jan 31, 2014, at 12:07 PM, Boaz Citrin wrote:
>>>>
>>>>> But if replication only copies the leaf then it makes sense that it is
>>>>> faster, at least on the same machine.
>>>>> Instead of balancing a tree it just
>>>>> copies a single revision.
>>>>
>>>> Um, no. The copied revision has to be inserted into the tree on the target database. Worse, the target database is assumed to be 'live' during the whole process, so its tree can't be updated as efficiently as during a compaction, where the new database file isn't going to be used at all until the whole procedure finishes.
>>>>
>>>> Sorry to pull rank, but while I haven't worked on CouchDB itself, I've written 1 1/2 CouchDB-compatible replicators, and I've worked on a C-based compactor for CouchDB-format b-tree files. I'm pretty sure that compaction is a lot faster. There's just much less work that it has to do.
>>>>
>>>> I agree with Jason that you probably need a faster server (or disk) that will let you compact effectively.
>>>>
>>>> --Jens
>>>
>>> Agreed, and also worth pointing out that we've developed a compactor that is far more efficient than the one in master. It uses less I/O and generates a smaller file to boot:
>>>
>>> https://git-wip-us.apache.org/repos/asf?p=couchdb-couch.git;a=commit;h=5d3753d0662cfa676fdf65d0a543be205499ec11
>>>
>>> Hopefully we can land it soon. Regards,
>>>
>>> Adam
>
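
For anyone following the thread, here is a rough toy sketch of why compacting an append-only file shrinks it. It assumes a plain list of records standing in for the database file. The names `append_update` and `compact` are made up for illustration and are not CouchDB APIs, and the sketch does not model the b-tree or its balancing at all.

```python
# Toy model only: the "file" is a list of (doc_id, rev, body) records.
# This is NOT CouchDB's on-disk format and NOT its compactor.

def append_update(log, doc_id, body):
    """Append a new revision of doc_id; older revisions stay in the log."""
    rev = 1 + sum(1 for d, _, _ in log if d == doc_id)
    log.append((doc_id, rev, body))

def compact(log):
    """Copy only the newest revision of each document into a fresh log."""
    latest = {}
    for doc_id, rev, body in log:
        latest[doc_id] = (doc_id, rev, body)  # later records overwrite earlier ones
    return sorted(latest.values())

log = []
for i in range(3):
    for doc_id in ("a", "b"):
        append_update(log, doc_id, f"v{i}")

compacted = compact(log)
print(len(log), len(compacted))
```

The original log keeps every revision it was ever given, while the compacted copy holds one record per document. The questions raised above about tree balance after replication are a separate matter that this sketch does not capture.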