Subject: Re: Problem with restart of continuous replication
From: Dave Cottlehuber
To: user@couchdb.apache.org
Date: Tue, 23 Apr 2013 10:28:45 +0200

On 23 April 2013 09:01, Calle Arnesten wrote:
> Hi,
>
> I have a two-way continuous replication set up between two servers with
> CouchDB. About a week ago we had a DNS problem that caused the machines
> not to be able to communicate for almost 2 hours. My assumption was that
> CouchDB would automatically try to restart the replication. When I
> double-checked this a day or two later, replication was only restarted
> for one way and not both. After rebooting the machine the replication
> started working both ways again.
>
> Now my question is: Was my assumption wrong that replication should be
> restarted automatically?
> If not, what could be the cause of it not working? The machine that it
> happened to has CouchDB 1.2.1 installed.

Replication will retry a certain number of times (10 by default) and
then give up if the connection does not work. See
http://wiki.apache.org/couchdb/Replication#New_features_introduced_in_CouchDB_1.2.0
for the details.

> Also, do you have any recommendations on how to monitor that replication
> works and notify by mail when it doesn't?

I'd suggest one of 2 things, not having used either personally.

The first is to use the replicator DB mentioned in the above link, and
monitor the state of that named document:

    GET /_replicator/my_replication_task

You'll get something back like this:

    {
      "_id": "my_replication_task",
      "_rev": "2-539eca65da0cfbd416fb335dcc5129c2",
      "source": "http://somewhere.iriscouch.com/mydb/",
      "target": "mydb",
      "create_target": true,
      "continuous": true,
      "owner": "admin",
      "_replication_state": "error",
      "_replication_state_time": "2013-04-22T18:31:44+02:00",
      "_replication_id": "f1bdf920b49bb9156b2c3147f30d18c3"
    }

The _replication_state field seems self-explanatory.

The other option is to

    GET /_active_tasks

    [
      {
        "checkpointed_source_seq": 307493,
        "continuous": true,
        "doc_id": "my_replication_task",
        "doc_write_failures": 0,
        "docs_read": 0,
        "docs_written": 0,
        "missing_revisions_found": 0,
        "pid": "<0.300.0>",
        "progress": 70,
        "replication_id": "1d1f73e98957aa03e3fa9291e05ed7e0+continuous",
        "revisions_checked": 3962,
        "source": "http://somewhere.iriscouch.com/mydb/",
        "source_seq": 461843,
        "started_on": 1366705545,
        "target": "mydb",
        "type": "replication",
        "updated_on": 1366705552
      }
    ]

For a continuous replication, this entry should always be present.

A+
Dave
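
For illustration only, the two checks above could be combined into a small
mail alert along these lines. This is an untested sketch, not something from
the thread: the function names, the local SMTP relay on "localhost", and the
sender/recipient addresses are all assumptions, and it does not handle
authentication (both endpoints may require admin credentials on your server).

```python
import json
import smtplib
import urllib.parse
import urllib.request
from email.message import EmailMessage


def fetch_json(url, timeout=10):
    """GET a URL and decode its JSON body (no authentication handled)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def doc_in_error(doc):
    """True if the _replicator document reports the "error" state."""
    return doc.get("_replication_state") == "error"


def task_running(tasks, doc_id):
    """True if _active_tasks lists a replication task for this doc_id."""
    return any(
        t.get("type") == "replication" and t.get("doc_id") == doc_id
        for t in tasks
    )


def find_problems(doc, tasks, doc_id):
    """Return a list of human-readable problems (empty if all looks fine)."""
    problems = []
    if doc_in_error(doc):
        problems.append(
            "%s: _replication_state is 'error' since %s"
            % (doc_id, doc.get("_replication_state_time"))
        )
    if not task_running(tasks, doc_id):
        problems.append("%s: no matching entry in _active_tasks" % doc_id)
    return problems


def send_alert(problems, sender, recipient, smtp_host="localhost"):
    """Mail the problem list via an assumed local SMTP relay."""
    msg = EmailMessage()
    msg["Subject"] = "CouchDB replication problem"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content("\n".join(problems))
    with smtplib.SMTP(smtp_host) as smtp:
        smtp.send_message(msg)


def run_check(base_url, doc_id, sender, recipient):
    """Fetch both views of the replication and mail an alert if needed."""
    quoted_id = urllib.parse.quote(doc_id, safe="")
    doc = fetch_json("%s/_replicator/%s" % (base_url, quoted_id))
    tasks = fetch_json("%s/_active_tasks" % base_url)
    problems = find_problems(doc, tasks, doc_id)
    if problems:
        send_alert(problems, sender, recipient)
    return problems
```

Something like run_check("http://localhost:5984", "my_replication_task",
"couch@example.org", "ops@example.org") could then be called from cron every
few minutes; the URL and addresses here are placeholders.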