From: Nick Vatamaniuc
Date: Wed, 18 Mar 2020 18:01:21 -0400
Subject: Re: [DISCUSS] soft-deletion
To: dev@couchdb.apache.org

I think it looks good, Peng Hui. Nice work! I like the API shape, and the
implementation looks pretty small and easy so far. Bonus points for using
the HCA to hopefully get some performance improvement from smaller keys
overall. That was Paul's original idea all along I believe.

Was wondering about a few minor API nits:

 * Maybe use `timestamp` instead of `deleted_when` since we used
   `timestamp` in the rest of the API description?

 * Our db instances have a unique `uuid` (instance id) attribute
   internally, we just don't surface it in the API. So when we re-create
   a db with the same name it gets a new `uuid`. I could see using that to
   identify individual deleted db instances when we restore them, as
   opposed to using timestamps: `/{db}/_restore/{DbUuid}`. However,
   because we don't already surface that attribute in the API it would be
   a bit more noise too... So I think that argues for keeping timestamp as
   the id, but thought I'd mention it and see if others have thoughts on
   it anyway.

Concerning the backup implementation: I think that's still an option! In
other words the soft-deletion API can still be the same, and eventually,
once we get backup implemented, soft-deleted instances could immediately
(or transparently in the background) become backups. Users might just see
an extra metadata row in `_deleted_dbs_info`, something like "backed to
blobstore foo as 1 day ago ....", so they know restoring it won't be a
single transaction but might take a while. FDB backup does have a local
`file://` option for URLs [1] so that might be useful in embedded
scenarios. And someone has probably created some sort of local filesystem
S3 shim (S0 ;-) ) we could adapt perhaps....
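
To make the timestamp-versus-`uuid` question above concrete, here is a
minimal sketch of the bookkeeping in Python. It is a toy in-memory model
under stated assumptions, not CouchDB or FoundationDB code:
`SoftDeleteStore`, its method names, and the example timestamps are all
hypothetical and exist only for illustration.

```python
# Toy in-memory model of soft-deletion. Hypothetical names throughout;
# this is not how CouchDB stores anything.
import uuid


class SoftDeleteStore:
    def __init__(self):
        self.live = {}     # db name -> {"uuid": str, "docs": dict}
        self.deleted = {}  # (db name, timestamp) -> same record

    def create(self, name):
        if name in self.live:
            raise KeyError(f"{name} already exists")
        # Every instance gets a fresh uuid, even when the name is reused.
        self.live[name] = {"uuid": uuid.uuid4().hex, "docs": {}}
        return self.live[name]["uuid"]

    def delete(self, name, timestamp):
        # Soft delete: move the record aside instead of dropping it.
        self.deleted[(name, timestamp)] = self.live.pop(name)

    def deleted_dbs_info(self, name):
        # One row per deleted instance, showing both candidate identifiers.
        rows = [
            {"timestamp": ts, "uuid": rec["uuid"]}
            for (db, ts), rec in self.deleted.items()
            if db == name
        ]
        return sorted(rows, key=lambda row: row["timestamp"])

    def restore(self, name, timestamp):
        if name in self.live:
            raise KeyError(f"{name} is live; cannot restore over it")
        self.live[name] = self.deleted.pop((name, timestamp))


store = SoftDeleteStore()
first = store.create("orders")
store.delete("orders", "2020-03-18T22:01:36Z")
second = store.create("orders")  # same name, new uuid
store.delete("orders", "2020-03-19T09:00:00Z")
info = store.deleted_dbs_info("orders")
store.restore("orders", "2020-03-18T22:01:36Z")
```

In this model either the timestamp or the `uuid` would pick out one
deleted instance of "orders" unambiguously; the timestamp just has the
advantage of already being something a user has seen.
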
Cheers,
-Nick

[1] https://apple.github.io/foundationdb/backups.html#backup-urls

On Wed, Mar 18, 2020 at 5:06 PM Paul Davis wrote:
>
> Alex,
>
> The first con I see for that approach is that it's not soft-deletion.
> It's actual deletion with an API for restoration. Which, fair enough,
> is probably a feature we should consider supporting for CouchDB
> installations that are based on FoundationDB.
>
> The second major con is that it relies on CouchDB being based on
> FoundationDB. Part of CouchDB's design philosophy is that the internet
> may or may not exist, and if it does exist that it may or may not be
> reliable. There are lots of deployments of CouchDB that are part of a
> desktop application or POS installation that may see internet only
> periodically if at all so an S3 backup solution is out. There also may
> come a time that there's a flavor of CouchDB that uses LevelDB or
> SQLite or FDBLite (I just made that up, any idea how hard it'd be?)
> for these sorts of embedded deployments where fdbrestore/fdbbackup
> wouldn't be feasible.
>
> Then the last major con I see is the time-to-restore disparity. With
> soft-deletion restoration is a few milliseconds. Streaming from S3
> will obviously depend on the size of the database and obviously be
> orders of magnitude longer.
>
> On the pro side for the soft-delete on FoundationDB is that the first
> draft of the RFC is 108 lines [1]. We obviously can't say for sure how
> big or involved the fdbrestore approach would be but I think we'd all
> agree it'd be bigger.
>
> Paul
>
> [1] https://github.com/apache/couchdb/pull/2666
>
>
> On Wed, Mar 18, 2020 at 2:31 PM Alex Miller wrote:
> >
> > Let me perhaps paint an alternative RFC:
> >
> > 1) `DELETE /{db}`
> >
> > If soft-deletion is enabled, delete the database subspace, and also record into ?DELETED_DBS the timestamp of the commit and the database subspace prefix
> >
> > 2) `GET /{db}/_deleted_dbs_info`
> >
> > Return the timestamp (and whatever other info one should record) of deleted databases.
> >
> > 3) `PUT /{db}/_restore/{deletedTS}`
> >
> > Invoke `fdbrestore -k` to do a key range restricted restore into the current cluster of the deleted subspace prefix at versionstamp-1. Wait for it to complete, and return 200 when completed.
> >
> > And this would all rely on having a continuous backup configured and running that would hold a minimum of 48 hours of changes.
> >
> >
> > Now, I don’t actually deal with backups often so my memory on current caveats is a bit fuzzy. I think there might be a couple complications here that I’ve missed, like…
> > * There not being key range restricted locking of the database
> > * A key range restore is currently suboptimal in that it doesn’t do obvious filtering that it could to cut down on the amount of data it reads
> >
> > But, neither of these seem heavily blocking, as they could be tackled quickly, particularly if you leverage some upstream relationships ;). Backup and restore has been the general answer to accidental data deletion (or corruption) on FDB, and I could paint some attractive looking pros of this approach: backup files are more disk space efficient, soft deleted data could be offloaded to an S3-compatible store, it would be free if FDB is already configured to take backups.
> > I was just curious to hear a bit more detail on your/Peng’s side of the reasons for preferring to build soft deletion on top of FDB (and thus have also intentionally withheld more of the cons of this approach, or the pros of yours).
> >
> > > On Mar 18, 2020, at 11:59, Paul Davis wrote:
> > >
> > > Alex,
> > >
> > > All joking aside, soft-deletion's target use case is accidental
> > > deletions. This isn't a replacement for backup/restore which will
> > > still happen for all the usual reasons.
> > >
> > > Paul
> > >
> > > On Wed, Mar 18, 2020 at 1:42 PM Paul Davis wrote:
> > >>
> > >> On Wed, Mar 18, 2020 at 1:29 PM Alex Miller wrote:
> > >>>
> > >>>
> > >>>> On Mar 18, 2020, at 05:04, jiangph wrote:
> > >>>>
> > >>>> Instead of automatically and immediately removing data and indexes in a database after a delete operation, soft-deletion allows restoring the deleted data to its original state after a “fat finger” or otherwise undesired delete operation, within a defined period such as 48 hours.
> > >>>>
> > >>>> In CouchDB 3.0, soft-deletion of a database is implemented in [1]. The .couch file is renamed to the ..deleted.couch file once soft-deletion is enabled, and that file can be renamed back to .couch to restore it. If no restore is needed and the specified period has passed, the ..deleted.couch file can be deleted to remove the database permanently.
> > >>>>
> > >>>> In CouchDB 4.0, with the introduction of FoundationDB, the data model and storage have changed. To support soft-deletion, we propose the solution below and then implement it.
> > >>>
> > >>>
> > >>>
> > >>> I’ve sort of hand waved some answers to this in my head, but would you mind expanding a bit on the advantages of keeping soft-deleted data in FoundationDB as opposed to actually deleting it and relying on FoundationDB’s backup and restore to recover it if needed?
> > >>
> > >> From: Panicked User
> > >> To: Customer Support
> > >> Subject: URGENT! EMERGENCY DATABASE RESTORE!
> > >>
> > >> Dear,
> > >>
> > >> I have accidentally deleted my Very Important Database and need to
> > >> have it restored ASAP! Without this mission critical database my
> > >> company is completely offline which is costing $1B an hour!!!!!
> > >>
> > >> Please respond ASAP!
> > >>
> > >> Sincerely,
> > >> Panicky McPanics
> >