couchdb-dev mailing list archives

From jiangph <jiangpeng...@hotmail.com>
Subject Re: [DISCUSS] soft-deletion
Date Thu, 19 Mar 2020 10:06:29 GMT
Thanks a lot to Nick for the further suggestions. Please see my inline responses to the API
nits below.


> On Mar 19, 2020, at 6:01 AM, Nick Vatamaniuc <vatamane@gmail.com> wrote:
> I like the API shape, and the implementation looks pretty small and
> easy so far. Bonus points for using the HCA to hopefully get some
> performance improvement from smaller keys overall. That was Paul's
> original idea all along I believe.
> 
> Was wondering about a few minor API nits:
> 
> * Maybe use `timestamp` instead of `deleted_when` since we used
> `timestamp` in the rest of the API description?

Yes, I struggled with this field name and changed it from `deleted_ts` to `deleted_when`.
For consistency, `timestamp` is better than `deleted_when`. I like this.
> 
> * Our db instances have a unique `uuid` (instance id) attribute
> internally, we just don't surface it in the API. So when we re-create
> a db with the same name it gets a new `uuid`. I could see using that
> to identify individual deleted db instances when we restore them, as
> opposed to using timestamps:  `/{db}/_restore/{DbUuid}`. However,
> because we don't already surface that attribute in the API it would be
> a bit more noise too... So I think that argues for keeping timestamp
> as the id, but thought I'd mention it and see if others have thoughts
> on it anyway.

The uuid is another option for restore. However, I think a timestamp gives users more hints
about which database to restore; it carries more meaning.
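
As a rough illustration of why a timestamp id is easy for users to work with, here is a
minimal Python sketch of the flow discussed in this thread: soft-delete, list the deleted
instances, restore by timestamp. It is a toy in-memory model, not CouchDB or FoundationDB
code. The class and method names are made up for illustration; only the `_deleted_dbs_info`
listing and the restore-by-timestamp idea come from the proposal.

```python
from datetime import datetime, timezone


class SoftDeleteStore:
    """Toy in-memory model of soft-deletion. All names here are hypothetical."""

    def __init__(self):
        self.live = {}      # db name -> contents
        self.deleted = {}   # (db name, timestamp string) -> contents

    def delete(self, name, now=None):
        """Soft-delete: move the db aside under a (name, timestamp) key."""
        now = now or datetime.now(timezone.utc)
        ts = now.strftime("%Y-%m-%dT%H:%M:%SZ")
        self.deleted[(name, ts)] = self.live.pop(name)
        return ts

    def deleted_dbs_info(self, name):
        """Timestamps of the soft-deleted instances of `name`, oldest first."""
        return sorted(ts for (n, ts) in self.deleted if n == name)

    def restore(self, name, ts):
        """Restore the instance deleted at `ts`; refuse to clobber a live db."""
        if name in self.live:
            raise ValueError(f"{name} already exists")
        self.live[name] = self.deleted.pop((name, ts))
```

With this model, a db that was deleted, re-created and deleted again shows up as two
timestamps, and the user can pick the one from before the accident.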

> 
> Concerning the backup implementation. I think that's still an option!
> In other words the soft deletion API can still be the same, and
> eventually, once we get backup implemented, soft-deleted instances
> could immediately (or transparently in the background) become backups.
> Users might just see an extra metadata row in `_deleted_dbs_info`,
> something like "backed up to blobstore foo as of 1 day ago ...". So they know
> restoring it won't be a single transaction but might take a while.
> 
> FDB backup does have a local `file://</absolute/path/to/base_dir>`
> option for URLs [1] so that might be useful in embedded scenarios. And
> someone has probably created some sort of local filesystem S3 shim (S0
> ;-) ) we could adapt perhaps....
> 
> Cheers,
> -Nick
> 
> [1] https://apple.github.io/foundationdb/backups.html#backup-urls
> 
> On Wed, Mar 18, 2020 at 5:06 PM Paul Davis <paul.joseph.davis@gmail.com> wrote:
>> 
>> Alex,
>> 
>> The first con I see for that approach is that it's not soft-deletion.
>> It's actual deletion with an API for restoration. Which, fair enough,
>> is probably a feature we should consider supporting for CouchDB
>> installations that are based on FoundationDB.
>> 
>> The second major con is that it relies on CouchDB being based on
>> FoundationDB. Part of CouchDB's design philosophy is that the internet
>> may or may not exist, and if it does exist that it may or may not be
>> reliable. There are lots of deployments of CouchDB that are part of a
>> desktop application or POS installation that may see internet only
>> periodically if at all so an S3 backup solution is out. There also may
>> come a time that there's a flavor of CouchDB that uses LevelDB or
>> SQLite or FDBLite (I just made that up, any idea how hard it'd be?)
>> for these sorts of embedded deployments where fdbrestore/fdbbackup
>> wouldn't be feasible.
>> 
>> Then the last major con I see is the time-to-restore disparity. With
>> soft-deletion restoration is a few milliseconds. Streaming from S3
>> will obviously depend on the size of the database and obviously be
>> orders of magnitude longer.
>> 
>> On the pro side for the soft-delete on FoundationDB is that the first
>> draft of the RFC is 108 lines [1]. We obviously can't say for sure how
>> big or involved the fdbrestore approach would be but I think we'd all
>> agree it'd be bigger.
>> 
>> Paul
>> 
>> [1] https://github.com/apache/couchdb/pull/2666
>> 
>> 
>> On Wed, Mar 18, 2020 at 2:31 PM Alex Miller
>> <alexmiller@apple.com.invalid> wrote:
>>> 
>>> Let me perhaps paint an alternative RFC:
>>> 
>>> 1) `DELETE /{db}`
>>> 
>>> If soft-deletion is enabled, delete the database subspace, and also
>>> record into ?DELETED_DBS the timestamp of the commit and the database
>>> subspace prefix
>>> 
>>> 2) `GET /{db}/_deleted_dbs_info`
>>> 
>>> Return the timestamp (and whatever other info one should record) of
>>> deleted databases.
>>> 
>>> 3) `PUT /{db}/_restore/{deletedTS}`
>>> 
>>> Invoke `fdbrestore -k` to do a key range restricted restore into the
>>> current cluster of the deleted subspace prefix at versionstamp-1.  Wait
>>> for it to complete, and return 200 when completed.
>>> 
>>> And this would all rely on having a continuous backup configured and
>>> running that would hold a minimum of 48 hours of changes.
>>> 
>>> 
>>> Now, I don’t actually deal with backups often so my memory on current
>>> caveats is a bit fuzzy.  I think there might be a couple complications
>>> here that I’ve missed, like…
>>> * There not being key range restricted locking of the database
>>> * A key range restore is currently suboptimal in that it doesn’t do
>>>   obvious filtering that it could to cut down on the amount of data it
>>>   reads
>>> 
>>> But, neither of these seem heavily blocking, as they could be tackled
>>> quickly, particularly if you leverage some upstream relationships ;).
>>> Backup and restore has been the general answer to accidental data
>>> deletion (or corruption) on FDB, and I could paint some attractive
>>> looking pros of this approach: backup files are more disk space
>>> efficient, soft deleted data could be offloaded to an S3-compatible
>>> store, it would be free if FDB is already configured to take backups.
>>> I was just curious to hear a bit more detail on your/Peng’s side of the
>>> reasons for preferring to build soft deletion on top of FDB (and thus
>>> have also intentionally withheld more of the cons of this approach, or
>>> the pros of yours).
>>> 
>>>> On Mar 18, 2020, at 11:59, Paul Davis <paul.joseph.davis@gmail.com> wrote:
>>>> 
>>>> Alex,
>>>> 
>>>> All joking aside, soft-deletion's target use case is accidental
>>>> deletions. This isn't a replacement for backup/restore which will
>>>> still happen for all the usual reasons.
>>>> 
>>>> Paul
>>>> 
>>>> On Wed, Mar 18, 2020 at 1:42 PM Paul Davis <paul.joseph.davis@gmail.com> wrote:
>>>>> 
>>>>> On Wed, Mar 18, 2020 at 1:29 PM Alex Miller
>>>>> <alexmiller@apple.com.invalid> wrote:
>>>>>> 
>>>>>> 
>>>>>>> On Mar 18, 2020, at 05:04, jiangph <jiangpenghui@hotmail.com> wrote:
>>>>>>> 
>>>>>>> Instead of automatically and immediately removing a database's data
>>>>>>> and indexes after a delete operation, soft-deletion allows the
>>>>>>> deleted data to be restored to its original state after a “fat
>>>>>>> finger” or otherwise undesired delete operation, within a defined
>>>>>>> period such as 48 hours.
>>>>>>> 
>>>>>>> In CouchDB 3.0, soft-deletion of databases is implemented in [1].
>>>>>>> When soft-deletion is enabled, a deleted database's .couch file is
>>>>>>> renamed to .<timestamp>.deleted.couch, and it can be renamed back to
>>>>>>> .couch to restore it. If no restore is needed and the specified
>>>>>>> period has passed, the .<timestamp>.deleted.couch file can be
>>>>>>> removed to delete the database permanently.
>>>>>>> 
>>>>>>> In CouchDB 4.0, with the introduction of FoundationDB, the data
>>>>>>> model and storage have changed. To support soft-deletion, we propose
>>>>>>> the solution below and then implement it.
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> I’ve sort of hand waved some answers to this in my head, but would
>>>>>> you mind expanding a bit on the advantages of keeping soft-deleted
>>>>>> data in FoundationDB as opposed to actually deleting it and relying
>>>>>> on FoundationDB’s backup and restore to recover it if needed?
>>>>> 
>>>>> From: Panicked User
>>>>> To: Customer Support
>>>>> Subject: URGENT! EMERGENCY DATABASE RESTORE!
>>>>> 
>>>>> Dear,
>>>>> 
>>>>> I have accidentally deleted my Very Important Database and need to
>>>>> have it restored ASAP! Without this mission critical database my
>>>>> company is completely offline which is costing $1B an hour!!!!!
>>>>> 
>>>>> Please respond ASAP!
>>>>> 
>>>>> Sincerely,
>>>>> Panicky McPanics
>>> 

