Message-ID: <7124113.3491287252742830.JavaMail.jira@thor>
Date: Sat, 16 Oct 2010 14:12:22 -0400 (EDT)
From: "Randall Leeds (JIRA)"
To: dev@couchdb.apache.org
Subject: [jira] Commented: (COUCHDB-704) Replication can lose checkpoints
In-Reply-To: <1882272857.326301268865447764.JavaMail.jira@brutus.apache.org>

[ 
https://issues.apache.org/jira/browse/COUCHDB-704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12921723#action_12921723 ]

Randall Leeds commented on COUCHDB-704:
---------------------------------------

Filipe,

It's true. This is an edge case, but I have had it happen in production with a database that had crawled to *very* slow writes and pull replication. The checkpoint code updated the source first and the local document was written, but the response was too slow so it was taken as a timeout. When the replicator retried the save it got a conflict. Replication crashed and the target was never written.

I can imagine other, rare instances where this could occur. It's an edge case, but a potentially nasty one.

> Replication can lose checkpoints
> --------------------------------
>
>                 Key: COUCHDB-704
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-704
>             Project: CouchDB
>          Issue Type: Bug
>          Components: Replication
>    Affects Versions: 0.11.2, 1.0.1
>            Reporter: Randall Leeds
>            Priority: Minor
>         Attachments: keep_session_id.patch, save-all-rep-checkpoints.patch, whitespace.patch
>
>   Original Estimate: 0h
>  Remaining Estimate: 0h
>
> When saving replication checkpoints in the _local/ document the new entry is always pushed onto the _original_ "history" list property that existed at the start of the replication. When any number of things causes the checkpoint to be written to only one of the databases the head of the history list gets out of sync. Subsequent attempts to start this replication must start from the latest common replication log entry in the _original_ history, as though this replication never occurred.
> A better idea is to push every checkpoint onto the history instead of replacing the head on each save.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
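
For anyone following along, here is a rough toy model of the behaviour described in the issue. It is not the replicator's actual code: the entry fields `id` and `seq`, the helper names `latest_common_seq` and `run`, and the sequence numbers are all made up for illustration. It simulates one replication in which the second checkpoint reaches the source but not the target, then compares where a restart would resume under each strategy.

```python
def latest_common_seq(source_history, target_history):
    """Return the seq of the newest entry present in both histories.

    Histories are lists ordered newest first. Returns 0 when the two
    sides share no entry at all (a full restart in this toy model).
    """
    target_ids = {entry["id"] for entry in target_history}
    for entry in source_history:
        if entry["id"] in target_ids:
            return entry["seq"]
    return 0


def run(push_every):
    """Simulate one replication and return the seq a restart resumes from.

    push_every=False: each save puts the new entry on top of the
    ORIGINAL history, replacing the previous head (the reported bug).
    push_every=True: each save pushes onto the current history
    (the proposed fix).
    """
    # Hypothetical history left behind by an earlier, completed replication.
    original = [{"id": "s0", "seq": 10}]
    source, target = list(original), list(original)

    # (checkpoint entry, did the write to the target succeed?)
    checkpoints = [
        ({"id": "c1", "seq": 20}, True),
        ({"id": "c2", "seq": 30}, False),  # e.g. timeout, then conflict
    ]
    for entry, target_write_ok in checkpoints:
        source = [entry] + (source if push_every else original)
        if target_write_ok:
            target = [entry] + (target if push_every else original)

    return latest_common_seq(source, target)


replace_head_resume = run(push_every=False)
push_every_resume = run(push_every=True)
print("replace head:", replace_head_resume)
print("push every:  ", push_every_resume)
```

In this model, replacing the head leaves only the pre-existing entry in common, so the restart falls back to seq 10 as though the replication never ran. Pushing every checkpoint keeps the first checkpoint on both sides, so the restart resumes from seq 20.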