couchdb-user mailing list archives

From Daniel Bryan <danbr...@gmail.com>
Subject Re: Database size seems off even after compaction runs.
Date Sun, 25 Dec 2011 00:10:23 GMT
I understand if this is necessary for eventual consistency, but shouldn't
this be better documented? I generally expected that if I delete sensitive
or unwanted data, or if a user requests that their personal or private
data be deleted, it'll be deleted in a way that's more solid than basically
hiding it. Sure, CouchDB won't let you get at that document, but it's
certainly still there on the disk, and presumably detectable if you
inspected the data structure that holds individual documents. Not a very
good situation vis-à-vis security. I know that normal Unix "deletion"
leaves files technically on disk, but there are ways to allow for that and
prevent it from being an issue.

Even setting data security aside, I've been using CouchDB as a kind of
staging environment for large amounts of data which should ultimately be
elsewhere (different flavours of relational databases, databases belonging to
different organisations, etc.) because it's really easy to implement as an
interface and let people just throw whatever they want into it with a POST.
It's really the perfect tool for that, but pretty soon there'll be tens of
gigabytes a day of data flowing through the system, and most of it just
needs to be indexed for a while before our scheduled scripts pull it all
out, shove it elsewhere and delete it. In this use case, if I'm
understanding this correctly, we'll get crazy storage blowouts unless we
implement a bunch of hacks to switch to new databases after performing
deletions (as well as scripts that make our HTTP reverse proxy
transparently and intelligently route data to the new database - absolutely
not a trivial task in any complex system with many moving parts).

But you know, this all comes with the territory. If the devs say there's a
good reason for documents to stick around after deletion, I believe them,
but I think that's a pretty huge point and I don't know how I've missed it.

What's the way to delete a document if I really want the data itself gone?
Changing it to a blank document before deleting, and then compacting?
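
For concreteness, here is a rough sketch of the requests I have in mind,
assuming CouchDB's HTTP API as I understand it: DELETE on the document, POST
to /db/_compact, and POST to /db/_purge as the more drastic option. The server
address, database name, document id and revision are made up for
illustration. The code only builds the requests and does not send them, and I
haven't verified that either route scrubs the old bytes from disk.

```python
import json
from urllib.parse import quote
from urllib.request import Request

BASE = "http://localhost:5984"  # hypothetical server address
JSON_HEADERS = {"Content-Type": "application/json"}


def delete_request(db, doc_id, rev):
    """DELETE /db/doc?rev=... ; this leaves a tombstone revision behind."""
    url = "{}/{}/{}?rev={}".format(
        BASE, quote(db, safe=""), quote(doc_id, safe=""), quote(rev, safe="")
    )
    return Request(url, method="DELETE")


def compact_request(db):
    """POST /db/_compact ; asks the server to rewrite the database file."""
    url = "{}/{}/_compact".format(BASE, quote(db, safe=""))
    return Request(url, data=b"", headers=JSON_HEADERS, method="POST")


def purge_request(db, doc_id, revs):
    """POST /db/_purge with a {doc_id: [revs]} body.

    Purging is not replicated, so this is an assumption about what is
    acceptable for a single-node staging database.
    """
    url = "{}/{}/_purge".format(BASE, quote(db, safe=""))
    body = json.dumps({doc_id: list(revs)}).encode("utf-8")
    return Request(url, data=body, headers=JSON_HEADERS, method="POST")
```

Each Request could be passed to urllib.request.urlopen() against a real
server: delete, then compact, or purge then compact if the tombstone itself
has to go.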

On Sat, Dec 24, 2011 at 2:37 PM, Jens Alfke <jens@couchbase.com> wrote:

>
> On Dec 23, 2011, at 4:09 PM, Mark Hahn wrote:
>
> > 1) How exactly could you make this switch without interrupting service?
>
> Replicate database to new db, then atomically switch your proxy or
> whatever to the new db from the old one.
> Depending on how long the replication takes, there’s a race condition here
> where changes made to the old db during the replication won’t be propagated
> to the new one; you could either repeat the process incrementally until
> this doesn’t happen, or else put the db into read-only mode while you’re
> doing the copy.
>
> This might also be helpful: http://tinyurl.com/89lr3fl
>
> > 2) Wouldn't this procedure create the exact same eventual consistency
> > problems that deleting documents in a db would?
>
> No; what’s necessary is the revision tree, and the replication will
> preserve that. You’re just losing the contents of the deleted revisions
> that accidentally got left behind because of the weird way the documents
> were deleted.
>
> —Jens
>
>
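
The replicate-then-switch procedure Jens describes above could be sketched
roughly like this, assuming the standard POST /_replicate endpoint with a
source/target JSON body. The run_once callable and the max_rounds limit are
hypothetical names of mine: run_once stands for "send one replication request
and return how many documents it copied", which you would have to read out of
the server's response yourself.

```python
import json
from urllib.request import Request


def replicate_request(base, source, target):
    """Build POST /_replicate copying `source` into `target`."""
    body = json.dumps(
        {"source": source, "target": target, "create_target": True}
    ).encode("utf-8")
    return Request(
        base + "/_replicate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def replicate_until_quiet(run_once, max_rounds=10):
    """Repeat replication passes until one copies nothing.

    run_once is a hypothetical callable that performs one pass and returns
    the number of documents copied. Returns the number of passes made.
    """
    for round_no in range(1, max_rounds + 1):
        if run_once() == 0:
            return round_no
    raise RuntimeError("source is still changing; consider read-only mode")
```

A quiet pass only narrows the race window before the proxy is switched; it
does not close it, which is why read-only mode during the copy is the safer
of the two options mentioned.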
