couchdb-dev mailing list archives

From Nick Vatamaniuc <vatam...@gmail.com>
Subject Re: [DISCUSS] soft-deletion
Date Wed, 18 Mar 2020 22:01:21 GMT
I think it looks good, Peng Hui. Nice work!

I like the API shape, and the implementation looks pretty small and
easy so far. Bonus points for using the HCA to hopefully get some
performance improvement from smaller keys overall. That was Paul's
original idea all along I believe.

Was wondering about a few minor API nits:

 * Maybe use `timestamp` instead of `deleted_when` since we used
`timestamp` in the rest of the API description?

 * Our db instances have a unique `uuid` (instance id) attribute
internally, we just don't surface it in the API. So when we re-create
a db with the same name it gets a new `uuid`. I could see using that
to identify individual deleted db instances when we restore them, as
opposed to using timestamps:  `/{db}/_restore/{DbUuid}`. However,
because we don't already surface that attribute in the API it would be
a bit more noise too... So I think that argues for keeping timestamp
as the id, but thought I'd mention it and see if others have thoughts
on it anyway.
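
To make the two identifier options concrete, here is a minimal sketch in
Python. It is not from the RFC: the `DeletedDb` record, the helper names,
and the sample timestamps and uuids are all hypothetical, and only the
`/{db}/_restore/{...}` path shape comes from the discussion above.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class DeletedDb:
    """One soft-deleted database instance (hypothetical shape)."""
    name: str
    timestamp: str  # deletion time; the id used in the current proposal
    uuid: str       # internal instance id, not surfaced in the API today


def restore_path(db: DeletedDb, by: str = "timestamp") -> str:
    """Build the restore path using either identifier."""
    if by not in ("timestamp", "uuid"):
        raise ValueError(f"unknown identifier kind: {by}")
    ident = db.timestamp if by == "timestamp" else db.uuid
    return f"/{db.name}/_restore/{ident}"


def find_deleted(instances: Iterable[DeletedDb], name: str,
                 ident: str) -> Optional[DeletedDb]:
    """Return the deleted instance of `name` matching a timestamp or uuid."""
    for db in instances:
        if db.name == name and ident in (db.timestamp, db.uuid):
            return db
    return None


# Two deleted instances of the same name: each re-creation got a new uuid.
deleted = [
    DeletedDb("orders", "2020-03-17T10:00:00Z", "a1b2c3"),
    DeletedDb("orders", "2020-03-18T09:30:00Z", "d4e5f6"),
]
```

Either identifier picks out exactly one deleted instance; the difference is
only in what the user has to see and type in the URL.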

Concerning the backup implementation. I think that's still an option!
In other words the soft deletion API can still be the same, and
eventually, once we get backup implemented, soft-deleted instances
could immediately (or transparently in the background) become backups.
Users might just see extra metadata rows in `_deleted_dbs_info`,
something like "backed up to blobstore foo as of 1 day ago ...", so they
know restoring it won't be a single transaction but might take a while.

FDB backup does have a local `file://</absolute/path/to/base_dir>`
option for URLs [1] so that might be useful in embedded scenarios. And
someone has probably created some sort of local filesystem S3 shim (S0
;-) ) we could adapt perhaps....
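
As a small illustration of the local option, the sketch below builds a
`file://` backup URL from a base directory, following the
`file://</absolute/path/to/base_dir>` form in [1]. The helper name and the
example directory are made up for illustration.

```python
from pathlib import PurePosixPath


def local_backup_url(base_dir: str) -> str:
    """Return a file:// backup URL for an absolute POSIX directory path."""
    path = PurePosixPath(base_dir)
    if not path.is_absolute():
        # The documented form requires an absolute base directory.
        raise ValueError(f"backup base_dir must be absolute: {base_dir!r}")
    return f"file://{path}"
```

For example, `local_backup_url("/var/backups/couchdb")` gives
`file:///var/backups/couchdb`.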

Cheers,
-Nick

[1] https://apple.github.io/foundationdb/backups.html#backup-urls

On Wed, Mar 18, 2020 at 5:06 PM Paul Davis <paul.joseph.davis@gmail.com> wrote:
>
> Alex,
>
> The first con I see for that approach is that it's not soft-deletion.
> It's actual deletion with an API for restoration. Which, fair enough,
> is probably a feature we should consider supporting for CouchDB
> installations that are based on FoundationDB.
>
> The second major con is that it relies on CouchDB being based on
> FoundationDB. Part of CouchDB's design philosophy is that the internet
> may or may not exist, and if it does exist that it may or may not be
> reliable. There are lots of deployments of CouchDB that are part of a
> desktop application or POS installation that may see internet only
> periodically if at all so an S3 backup solution is out. There also may
> come a time that there's a flavor of CouchDB that uses LevelDB or
> SQLite or FDBLite (I just made that up, any idea how hard it'd be?)
> for these sorts of embedded deployments where fdbrestore/fdbbackup
> wouldn't be feasible.
>
> Then the last major con I see is the time-to-restore disparity. With
> soft-deletion restoration is a few milliseconds. Streaming from S3
> will obviously depend on the size of the database and obviously be
> orders of magnitude longer.
>
> On the pro side for the soft-delete on FoundationDB is that the first
> draft of the RFC is 108 lines [1]. We obviously can't say for sure how
> big or involved the fdbrestore approach would be but I think we'd all
> agree it'd be bigger.
>
> Paul
>
> [1] https://github.com/apache/couchdb/pull/2666
>
>
> On Wed, Mar 18, 2020 at 2:31 PM Alex Miller
> <alexmiller@apple.com.invalid> wrote:
> >
> > Let me perhaps paint an alternative RFC:
> >
> > 1) `DELETE /{db}`
> >
> > If soft-deletion is enabled, delete the database subspace, and also
> > record into ?DELETED_DBS the timestamp of the commit and the database
> > subspace prefix
> >
> > 2) `GET /{db}/_deleted_dbs_info`
> >
> > Return the timestamp (and whatever other info one should record) of deleted databases.
> >
> > 3) `PUT /{db}/_restore/{deletedTS}`
> >
> > Invoke `fdbrestore -k` to do a key range restricted restore into the
> > current cluster of the deleted subspace prefix at versionstamp-1.  Wait
> > for it to complete, and return 200 when completed.
> >
> > And this would all rely on having a continuous backup configured and
> > running that would hold a minimum of 48 hours of changes.
> >
> >
> > Now, I don’t actually deal with backups often so my memory on current
> > caveats is a bit fuzzy.  I think there might be a couple complications
> > here that I’ve missed, like…
> > * There not being key range restricted locking of the database
> > * A key range restore is currently suboptimal in that it doesn’t do
> >   obvious filtering that it could to cut down on the amount of data it
> >   reads
> >
> > But, neither of these seem heavily blocking, as they could be tackled
> > quickly, particularly if you leverage some upstream relationships ;).
> > Backup and restore has been the general answer to accidental data
> > deletion (or corruption) on FDB, and I could paint some attractive
> > looking pros of this approach: backup files are more disk space
> > efficient, soft deleted data could be offloaded to an S3-compatible
> > store, it would be free if FDB is already configured to take backups.
> > I was just curious to hear a bit more detail on your/Peng’s side of the
> > reasons for preferring to build soft deletion on top of FDB (and thus
> > have also intentionally withheld more of the cons of this approach, or
> > the pros of yours).
> >
> > > On Mar 18, 2020, at 11:59, Paul Davis <paul.joseph.davis@gmail.com> wrote:
> > >
> > > Alex,
> > >
> > > All joking aside, soft-deletion's target use case is accidental
> > > deletions. This isn't a replacement for backup/restore which will
> > > still happen for all the usual reasons.
> > >
> > > Paul
> > >
> > > On Wed, Mar 18, 2020 at 1:42 PM Paul Davis <paul.joseph.davis@gmail.com> wrote:
> > >>
> > >> On Wed, Mar 18, 2020 at 1:29 PM Alex Miller
> > >> <alexmiller@apple.com.invalid> wrote:
> > >>>
> > >>>
> > >>>> On Mar 18, 2020, at 05:04, jiangph <jiangpenghui@hotmail.com> wrote:
> > >>>>
> > >>>> Instead of automatically and immediately removing the data and
> > >>>> indexes in a database after a delete operation, soft-deletion
> > >>>> allows restoring the deleted data to its original state after a
> > >>>> “fat finger” or otherwise undesired delete operation, within a
> > >>>> defined period, such as 48 hours.
> > >>>>
> > >>>> In CouchDB 3.0, soft-deletion of a database is implemented in [1].
> > >>>> When soft-deletion is enabled, the .couch file is renamed to
> > >>>> .<timestamp>.deleted.couch, and that file can be renamed back to
> > >>>> .couch to restore the database. If a restore is not needed and the
> > >>>> specified period has passed, the .<timestamp>.deleted.couch file
> > >>>> can be deleted to remove the database permanently.
> > >>>>
> > >>>> In CouchDB 4.0, with the introduction of FoundationDB, the data
> > >>>> model and storage have changed. In order to support soft-deletion,
> > >>>> we propose the solution below and will then implement it.
> > >>>
> > >>>
> > >>>
> > >>> I’ve sort of hand waved some answers to this in my head, but would
> > >>> you mind expanding a bit on the advantages of keeping soft-deleted
> > >>> data in FoundationDB as opposed to actually deleting it and relying
> > >>> on FoundationDB’s backup and restore to recover it if needed?
> > >>
> > >> From: Panicked User
> > >> To: Customer Support
> > >> Subject: URGENT! EMERGENCY DATABASE RESTORE!
> > >>
> > >> Dear,
> > >>
> > >> I have accidentally deleted my Very Important Database and need to
> > >> have it restored ASAP! Without this mission critical database my
> > >> company is completely offline which is costing $1B an hour!!!!!
> > >>
> > >> Please respond ASAP!
> > >>
> > >> Sincerely,
> > >> Panicky McPanics
> >
