couchdb-dev mailing list archives

From jiangph <>
Subject Re: [DISCUSS] soft-deletion
Date Thu, 19 Mar 2020 10:06:29 GMT
Thanks a lot to Nick for the further suggestions. Please see my responses about the API inline below.

> On Mar 19, 2020, at 6:01 AM, Nick Vatamaniuc <> wrote:
> I like the API shape, and the implementation looks pretty small and
> easy so far. Bonus points for using the HCA to hopefully get some
> performance improvement from smaller keys overall. That was Paul's
> original idea all along I believe.
> Was wondering about a few minor API nits:
> * Maybe use `timestamp` instead of `deleted_when` since we used
> `timestamp` in the rest of the API description?

Yes, I struggled with this field name and changed it from `deleted_ts` to `deleted_when`.
To keep things consistent, `timestamp` is better than `deleted_when`. I like this.
> * Our db instances have a unique `uuid` (instance id) attribute
> internally, we just don't surface it in the API. So when we re-create
> a db with the same name it gets a new `uuid`. I could see using that
> to identify individual deleted db instances when we restore them, as
> opposed to using timestamps:  `/{db}/_restore/{DbUuid}`. However,
> because we don't already surface that attribute in the API it would be
> a bit more noise too... So I think that argues for keeping timestamp
> as the id, but thought I'd mention it and see if others have thoughts
> on it anyway.

The uuid is another option for restore. However, I think the timestamp gives users more of a
hint about which database instance to restore; it carries more meaning.
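
To make the timestamp-versus-uuid trade-off concrete, here is a minimal in-memory sketch in Python. It is not CouchDB's implementation: the class and method names (`SoftDeleteRegistry`, `create`, `delete`, `deleted_dbs_info`, `restore`) and the timestamp format are assumptions for illustration. Only the `timestamp` field name, the per-instance `uuid`, and the idea of restoring a deleted instance by its timestamp come from this thread.

```python
import uuid
from datetime import datetime, timezone


class SoftDeleteRegistry:
    """Toy model: live dbs keyed by name, deleted instances by (name, timestamp)."""

    def __init__(self):
        self.live = {}     # name -> {"uuid": str, "docs": dict}
        self.deleted = {}  # (name, timestamp) -> db instance

    def create(self, name):
        if name in self.live:
            raise KeyError(f"{name} already exists")
        # Every instance gets a fresh uuid, even when a name is reused.
        self.live[name] = {"uuid": uuid.uuid4().hex, "docs": {}}
        return self.live[name]

    def delete(self, name, now=None):
        """Soft-delete: move the instance aside and return its deletion timestamp."""
        db = self.live.pop(name)
        now = now or datetime.now(timezone.utc)
        timestamp = now.strftime("%Y-%m-%dT%H:%M:%SZ")  # assumed format
        self.deleted[(name, timestamp)] = db
        return timestamp

    def deleted_dbs_info(self, name):
        """List deleted instances of `name`, oldest first."""
        return [
            {"timestamp": ts, "uuid": db["uuid"]}
            for (n, ts), db in sorted(self.deleted.items(), key=lambda kv: kv[0])
            if n == name
        ]

    def restore(self, name, timestamp):
        """Bring the instance deleted at `timestamp` back under its original name."""
        if name in self.live:
            raise KeyError(f"{name} already exists; cannot restore over it")
        self.live[name] = self.deleted.pop((name, timestamp))
        return self.live[name]
```

In this sketch a user who deleted the same database name twice can tell the instances apart by when each was deleted, which is the hint the timestamp provides; the uuid is still listed, so either could serve as the identifier. One limitation of this toy model is that two deletions of the same name within the same second would collide on the key.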

> Concerning the backup implementation. I think that's still an option!
> In other words the soft deletion API can still be the same, and
> eventually, once we get backup implemented, soft-deleted instances
> could immediately (or transparently in the background) become backups.
> Users might just see an extra metadata row in `_deleted_dbs_info`,
> something like "backed up to blobstore foo as of 1 day ago ...", so they know
> restoring it won't be a single transaction but might take a while.
> FDB backup does have a local `file://</absolute/path/to/base_dir>`
> option for URLs [1] so that might be useful in embedded scenarios. And
> someone has probably created some sort of local filesystem S3 shim (S0
> ;-) ) we could adapt perhaps....
> Cheers,
> -Nick
> [1]
> On Wed, Mar 18, 2020 at 5:06 PM Paul Davis <> wrote:
>> Alex,
>> The first con I see for that approach is that it's not soft-deletion.
>> It's actual deletion with an API for restoration. Which, fair enough,
>> is probably a feature we should consider supporting for CouchDB
>> installations that are based on FoundationDB.
>> The second major con is that it relies on CouchDB being based on
>> FoundationDB. Part of CouchDB's design philosophy is that the internet
>> may or may not exist, and if it does exist that it may or may not be
>> reliable. There are lots of deployments of CouchDB that are part of a
>> desktop application or POS installation that may see internet only
>> periodically if at all so an S3 backup solution is out. There also may
>> come a time that there's a flavor of CouchDB that uses LevelDB or
>> SQLite or FDBLite (I just made that up, any idea how hard it'd be?)
>> for these sorts of embedded deployments where fdbrestore/fdbbackup
>> wouldn't be feasible.
>> Then the last major con I see is the time-to-restore disparity. With
>> soft-deletion restoration is a few milliseconds. Streaming from S3
>> will obviously depend on the size of the database and obviously be
>> orders of magnitude longer.
>> On the pro side for the soft-delete on FoundationDB is that the first
>> draft of the RFC is 108 lines [1]. We obviously can't say for sure how
>> big or involved the fdbrestore approach would be but I think we'd all
>> agree it'd be bigger.
>> Paul
>> [1]
>> On Wed, Mar 18, 2020 at 2:31 PM Alex Miller
>> <> wrote:
>>> Let me perhaps paint an alternative RFC:
>>> 1) `DELETE /{db}`
>>> If soft-deletion is enabled, delete the database subspace, and also record into ?DELETED_DBS the timestamp of the commit and the database subspace prefix
>>> 2) `GET /{db}/_deleted_dbs_info`
>>> Return the timestamp (and whatever other info one should record) of deleted databases.
>>> 3) `PUT /{db}/_restore/{deletedTS}`
>>> Invoke `fdbrestore -k` to do a key range restricted restore into the current cluster of the deleted subspace prefix at versionstamp-1.  Wait for it to complete, and return 200 when completed.
>>> And this would all rely on having a continuous backup configured and running that would hold a minimum of 48 hours of changes.
>>> Now, I don’t actually deal with backups often so my memory on current caveats is a bit fuzzy.  I think there might be a couple complications here that I’ve missed, like…
>>> * There not being key range restricted locking of the database
>>> * A key range restore is currently suboptimal in that it doesn’t do obvious filtering that it could to cut down on the amount of data it reads
>>> But, neither of these seem heavily blocking, as they could be tackled quickly, particularly if you leverage some upstream relationships ;).  Backup and restore has been the general answer to accidental data deletion (or corruption) on FDB, and I could paint some attractive looking pros of this approach: backup files are more disk space efficient, soft deleted data could be offloaded to an S3-compatible store, it would be free if FDB is already configured to take backups.  I was just curious to hear a bit more detail on your/Peng’s side of the reasons for preferring to build soft deletion on top of FDB (and thus have also intentionally withheld more of the cons of this approach, or the pros of yours).
>>>> On Mar 18, 2020, at 11:59, Paul Davis <> wrote:
>>>> Alex,
>>>> All joking aside, soft-deletion's target use case is accidental
>>>> deletions. This isn't a replacement for backup/restore which will
>>>> still happen for all the usual reasons.
>>>> Paul
>>>> On Wed, Mar 18, 2020 at 1:42 PM Paul Davis <paul.joseph.davis@gmail.com> wrote:
>>>>> On Wed, Mar 18, 2020 at 1:29 PM Alex Miller
>>>>> <> wrote:
>>>>>>> On Mar 18, 2020, at 05:04, jiangph <> wrote:
>>>>>>> Instead of automatically and immediately removing a database's data and indexes after a delete operation, soft-deletion allows the deleted data to be restored to its original state after a “fat finger” or otherwise undesired delete operation, within a defined period such as 48 hours.
>>>>>>> In CouchDB 3.0, soft-deletion of a database is implemented in [1]. When soft-deletion is enabled, the .couch file is renamed to a .<timestamp>.deleted.couch file on deletion, and that file can be renamed back to .couch to restore the database. If no restore is needed and the specified period has passed, the .<timestamp>.deleted.couch file can be deleted to remove the database permanently.
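
The file-rename scheme described for CouchDB 3.0 can be illustrated with a short Python sketch. This is not CouchDB's code: the function names (`soft_delete`, `restore`, `purge_expired`) and the exact timestamp format are assumptions, and the sketch assumes database file names contain no dots other than the ones shown. Only the `.couch` to `.<timestamp>.deleted.couch` rename, the rename back for restore, and the permanent removal after a retention period come from the description above.

```python
from datetime import datetime, timedelta, timezone
from pathlib import Path

STAMP_FORMAT = "%Y%m%d%H%M%S"  # assumed; the thread only says "<timestamp>"
DELETED_SUFFIX = ".deleted.couch"


def soft_delete(db_file: Path, now: datetime) -> Path:
    """Rename foo.couch to foo.<timestamp>.deleted.couch and return the new path."""
    stamp = now.strftime(STAMP_FORMAT)
    target = db_file.with_name(f"{db_file.stem}.{stamp}{DELETED_SUFFIX}")
    db_file.rename(target)
    return target


def restore(deleted_file: Path) -> Path:
    """Rename foo.<timestamp>.deleted.couch back to foo.couch."""
    if not deleted_file.name.endswith(DELETED_SUFFIX):
        raise ValueError(f"not a soft-deleted file: {deleted_file}")
    base = deleted_file.name[: -len(DELETED_SUFFIX)]
    name, _, _stamp = base.rpartition(".")
    target = deleted_file.with_name(f"{name}.couch")
    if target.exists():
        raise FileExistsError(f"{target} already exists")
    deleted_file.rename(target)
    return target


def purge_expired(directory: Path, now: datetime, retention: timedelta) -> list:
    """Permanently remove soft-deleted files older than the retention period."""
    removed = []
    for path in sorted(directory.glob(f"*{DELETED_SUFFIX}")):
        stamp = path.name[: -len(DELETED_SUFFIX)].rpartition(".")[2]
        deleted_at = datetime.strptime(stamp, STAMP_FORMAT).replace(tzinfo=timezone.utc)
        if now - deleted_at > retention:
            path.unlink()
            removed.append(path)
    return removed
```

With a 48-hour retention, `purge_expired(data_dir, datetime.now(timezone.utc), timedelta(hours=48))` would leave recently deleted files in place so they can still be restored by a rename.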
>>>>>>> In CouchDB 4.0, with the introduction of FoundationDB, the data model and storage have changed. In order to support soft-deletion, we propose the solution below and will then implement it.
>>>>>> I’ve sort of hand waved some answers to this in my head, but would you mind expanding a bit on the advantages of keeping soft-deleted data in FoundationDB as opposed to actually deleting it and relying on FoundationDB’s backup and restore to recover it if needed?
>>>>> From: Panicked User
>>>>> To: Customer Support
>>>>> Dear,
>>>>> I have accidentally deleted my Very Important Database and need to
>>>>> have it restored ASAP! Without this mission critical database my
>>>>> company is completely offline which is costing $1B an hour!!!!!
>>>>> Please respond ASAP!
>>>>> Sincerely,
>>>>> Panicky McPanics
