Date: Thu, 24 Jun 2010 10:05:06 -0400
Subject: Replication Chatter / Recovery
From: Cory Zue
To: user

Hi there,

My team is designing a distributed health data capture system to be used in rural Africa, and we are planning to use CouchDB as a back end for its
excellent replication features.

One concern I have is how replication would perform over a very unreliable internet connection. Is replication done in pieces, or does it require large amounts of data to make it through in a single transfer? If the connection goes down in the middle of replication, do you have to start over from the beginning, or is it smart enough to keep what has already made it across the wire?

Also, are there any numbers I can get on how chatty replication is? Our system will likely be deployed with post-paid SIM cards and GSM modems providing the internet connection at many sites, so I would like to be able to get a rough estimate of data usage. Is there any formula I could use, such as "syncing X bytes of data in couch causes K * X bytes to go over the wire (where K is some overhead amount)"?

Seeing how JSON probably compresses quite well, is there any way to do compressed synchronization?

Thanks in advance,
Cory
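
For concreteness, the kind of estimate being asked about could be sketched like this in Python. The function name, the parameter names, and the example values are all made up for illustration. K in particular is a placeholder: the real overhead is exactly what the question is about, and it would have to be measured.

```python
def estimate_wire_bytes(payload_bytes: int, overhead_factor: float) -> float:
    """Estimate bytes sent over the wire as K * X.

    payload_bytes   -- X, the size of the data being synced
    overhead_factor -- K, a hypothetical multiplier (unknown; must be measured)
    """
    if payload_bytes < 0 or overhead_factor < 0:
        raise ValueError("inputs must be non-negative")
    return overhead_factor * payload_bytes


# Hypothetical example: 2 MB of documents with an assumed K of 1.5.
# The 1.5 is an arbitrary placeholder, not a measured CouchDB number.
print(estimate_wire_bytes(2_000_000, 1.5))
```

With a measured K for a typical sync, multiplying by the expected monthly data volume per site would give a rough figure for budgeting SIM data usage.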