Date: Tue, 15 Apr 2014 14:18:04 -0400
Subject: Re: Checkpointing on read only databases
From: Calvin Metcalf
To: replication@couchdb.apache.org

except there is no way to calculate that from outside the database, as the
changes feed only ever gives the most recent version of each document.

On Sun, Apr 13, 2014 at 9:47 PM, Calvin Metcalf wrote:

> oo didn't think of that, yeah uuids wouldn't hurt, though the more I think
> about the rolling hashing on revs, the more I like that
>
>
> On Sun, Apr 13, 2014 at 6:00 PM, Adam Kocoloski wrote:
>
>> Yes, but then sysadmins have to be very very careful about restoring from
>> a file-based backup. We run the risk that {uuid, seq} could be
>> multi-valued, which diminishes its value considerably.
>>
>> I like the UUID in general -- we've added them to our internal shard
>> files at Cloudant -- but on their own they're not a bulletproof solution
>> for read-only incremental replications.
>>
>> Adam
>>
>> > On Apr 13, 2014, at 5:16 PM, Calvin Metcalf wrote:
>> >
>> > I mean if you're going to add new features to couch you could just have
>> > the db generate a random uuid on creation that would be different if it
>> > was deleted and recreated
>> >> On Apr 13, 2014 1:59 PM, "Adam Kocoloski" wrote:
>> >>
>> >> Other thoughts:
>> >>
>> >> - We could enhance the authorization system to have a role that allows
>> >> updates to _local docs but nothing else. It wouldn't make sense for
>> >> completely untrusted peers, but it could give peace of mind to sysadmins
>> >> trying to execute replications with the minimum level of access possible.
>> >>
>> >> - We could teach the sequence index to maintain a rolling hash of the
>> >> {id,rev} pairs that comprise the database up to that sequence, record
>> >> that in the replication checkpoint document, and check that it's
>> >> unchanged on resume. It's a new API enhancement and it grows the amount
>> >> of information stored with each sequence, but it completely closes off
>> >> the probabilistic edge case associated with simply checking that the
>> >> {id, rev} associated with the checkpointed sequence has not changed.
>> >> Perhaps overkill for what is admittedly a pretty low-probability event.
>> >>
>> >> Adam
>> >>
>> >> On Apr 13, 2014, at 1:50 PM, Adam Kocoloski wrote:
>> >>
>> >>> Yeah, this is a subtle little thing. The main reason we checkpoint on
>> >>> both source and target and compare is to cover the case where the
>> >>> source database is deleted and recreated in between replication
>> >>> attempts. If that were to happen and the replicator just resumes
>> >>> blindly from the checkpoint sequence stored on the target then the
>> >>> replication could permanently miss some documents written to the new
>> >>> source.
>> >>>
>> >>> I'd love to have a robust solution for incremental replication of
>> >>> read-only databases.
>> >>> To first order a UUID on the source database that was fixed at create
>> >>> time could do the trick, but we'll run into trouble with file-based
>> >>> backup and restores. If a database file is restored to a point before
>> >>> the latest replication checkpoint we'd again be in a position of
>> >>> potentially permanently missing updates.
>> >>>
>> >>> Calvin's suggestion of storing e.g. {seq, id, rev} instead of simply
>> >>> seq as the checkpoint information would dramatically reduce the
>> >>> likelihood of that type of permanent skip in the replication, but it's
>> >>> only a probabilistic answer.
>> >>>
>> >>> Adam
>> >>>
>> >>>> On Apr 13, 2014, at 1:31 PM, Calvin Metcalf <calvin.metcalf@gmail.com>
>> >>>> wrote:
>> >>>>
>> >>>> Though currently we have the opposite problem, right, if we delete the
>> >>>> target db? (this is me brainstorming)
>> >>>>
>> >>>> Could we store last rev in addition to last seq?
>> >>>>> On Apr 13, 2014 1:15 PM, "Dale Harvey" wrote:
>> >>>>>
>> >>>>> If the src database was to be wiped, when we restarted replication
>> >>>>> nothing would happen until the source database caught up to the
>> >>>>> previously written checkpoint
>> >>>>>
>> >>>>> create A, write 5 documents
>> >>>>> replicate 5 documents A -> B, write checkpoint 5 on B
>> >>>>> destroy A
>> >>>>> write 4 documents
>> >>>>> replicate A -> B, pick up checkpoint from B and do ?since=5
>> >>>>> .. no documents written
>> >>>>>
>> >>>>> https://github.com/pouchdb/pouchdb/blob/master/tests/test.replication.js#L771
>> >>>>> is our test that covers it
>> >>>>>
>> >>>>>
>> >>>>> On 13 April 2014 18:02, Calvin Metcalf wrote:
>> >>>>>
>> >>>>>> If we were to unilaterally switch to checkpointing on the target, what
>> >>>>>> would happen? Would replications in progress lose their place?
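Dale's scenario can be reproduced with a toy model to show why resuming from a checkpoint stored only on the target is unsafe once the source has been recreated. This is a minimal sketch under stated assumptions, not PouchDB or CouchDB code: the `Database` class, its in-memory changes list, and the `local` dict standing in for a `_local` checkpoint document are hypothetical simplifications, and sequences are assumed to be plain 1-based integers.

```python
from __future__ import annotations


class Database:
    """Toy stand-in for a CouchDB/PouchDB database (hypothetical, simplified)."""

    def __init__(self) -> None:
        self.changes: list[str] = []     # doc ids in update-sequence order
        self.local: dict[str, int] = {}  # stands in for a _local checkpoint doc

    def write(self, doc_id: str) -> None:
        self.changes.append(doc_id)

    def changes_since(self, since: int) -> list[tuple[int, str]]:
        # Sequences are 1-based, so since=5 yields seq 6 onwards.
        return [(seq, d) for seq, d in enumerate(self.changes, start=1) if seq > since]


def replicate_target_only(source: Database, target: Database) -> int:
    """Resume from the checkpoint stored on the target; never write to source."""
    since = target.local.get("checkpoint", 0)
    copied = 0
    for seq, doc_id in source.changes_since(since):
        target.write(doc_id)
        target.local["checkpoint"] = seq
        copied += 1
    return copied


a, b = Database(), Database()
for i in range(5):                          # create A, write 5 documents
    a.write(f"doc{i}")
first_run = replicate_target_only(a, b)     # copies 5, checkpoint 5 on B

a = Database()                              # destroy A and recreate it
for i in range(4):                          # write 4 documents
    a.write(f"new{i}")
second_run = replicate_target_only(a, b)    # resumes with since=5

print(first_run, second_run)                # 5 0
```

The second run copies nothing: the recreated A only has sequences 1 to 4, all at or below the stale checkpoint, so those documents are skipped for good. Writing the checkpoint to the source as well and comparing the two is what lets the replicator notice the mismatch and start over, which is exactly what a read-only source prevents.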
>> >>>>>>> On Apr 13, 2014 11:21 AM, "Dale Harvey" wrote:
>> >>>>>>>
>> >>>>>>> So with checkpointing we write the checkpoint to both A and B and
>> >>>>>>> verify they match before using the checkpoint
>> >>>>>>>
>> >>>>>>> What happens if the src of the replication is read only?
>> >>>>>>>
>> >>>>>>> As far as I can tell couch will just throw a checkpoint_commit_error
>> >>>>>>> and carry on from the start. The only improvement I can think of is
>> >>>>>>> that the user specifies they know the src is read only and to only
>> >>>>>>> use the target checkpoint; we can 'possibly' make that happen
>> >>>>>>> automatically if the src specifically fails the write due to
>> >>>>>>> permissions.
>>
>
>
>
> --
> -Calvin W. Metcalf
>

--
-Calvin W. Metcalf
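The two mitigations for read-only sources discussed in the thread, storing {seq, id, rev} in the target checkpoint (Calvin) and recording a rolling hash of the {id, rev} pairs up to that sequence (Adam), can be compared with a similar toy model. Everything here is hypothetical: CouchDB exposes no such rolling hash, `SourceDb`, `Checkpoint` and `can_resume` are illustrative names, chained SHA-256 is an arbitrary choice of hash, and the rev strings are made up. As the reply at the top of this message notes, a replicator cannot rebuild the hash from the changes feed, because the feed only reports each document's latest revision, so the sketch assumes the source database maintains it alongside its sequence index.

```python
from __future__ import annotations

import hashlib
from typing import NamedTuple


class Checkpoint(NamedTuple):
    """What a replicator would store on the target (hypothetical layout)."""
    seq: int
    doc_id: str
    rev: str
    rolling_hash: str


class SourceDb:
    """Toy source whose sequence index remembers, for every seq, the {id, rev}
    written there and a rolling hash of all pairs up to and including it."""

    def __init__(self) -> None:
        self.pairs: list[tuple[str, str]] = []  # (id, rev) in seq order
        self.hashes: list[str] = [""]           # hashes[seq]; seq 0 is empty

    def write(self, doc_id: str, rev: str) -> int:
        digest = hashlib.sha256()
        digest.update(self.hashes[-1].encode())        # chain the previous hash
        digest.update(f"{doc_id}\x00{rev}".encode())   # fold in this pair
        self.pairs.append((doc_id, rev))
        self.hashes.append(digest.hexdigest())
        return len(self.pairs)                         # the new update seq

    def checkpoint(self, seq: int) -> Checkpoint:
        doc_id, rev = self.pairs[seq - 1]
        return Checkpoint(seq, doc_id, rev, self.hashes[seq])


def can_resume(source: SourceDb, cp: Checkpoint, use_hash: bool) -> bool:
    """Decide whether a target-only checkpoint is still valid for this source."""
    if not 1 <= cp.seq <= len(source.pairs):
        return False                                   # source is behind it
    if use_hash:
        return source.hashes[cp.seq] == cp.rolling_hash
    return source.pairs[cp.seq - 1] == (cp.doc_id, cp.rev)


old = SourceDb()
old.write("a", "1-x")
old.write("b", "1-y")
cp = old.checkpoint(2)                  # stored on the target only

new = SourceDb()                        # source deleted and recreated
new.write("c", "1-z")                   # a document the target never saw
new.write("b", "1-y")                   # same {id, rev} lands on seq 2 again

print(can_resume(new, cp, use_hash=False))  # True: would skip "c"
print(can_resume(new, cp, use_hash=True))   # False: restart from scratch
```

In this contrived case the recreated source happens to put the same {id, rev} at the checkpointed sequence, so the {seq, id, rev} check accepts the stale checkpoint and `c` would be skipped permanently. The chained hash covers every pair up to that sequence and rejects it, which is the probabilistic edge case the rolling hash is meant to close off, at the cost of storing one extra value per sequence.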