couchdb-user mailing list archives

From Jens Alfke <>
Subject Re: Database size seems off even after compaction runs.
Date Sun, 25 Dec 2011 00:40:33 GMT
No. If you delete a document properly (using DELETE, not just setting a _deleted property)
you won't have this problem. The old revision with the data will be gone after compaction,
leaving only an empty "tombstone".

--Jens     [via iPhone]
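
A minimal sketch of the two requests described above: a proper DELETE of a specific revision, then a compaction of the database. The server address, database name, document ID and revision shown in the usage note are hypothetical placeholders, and the helper names are made up for illustration. The helpers only build the requests; sending them (for example with urllib.request.urlopen) is left out so the sketch runs without a server.

```python
import urllib.parse
import urllib.request


def build_delete_request(base_url, db, doc_id, rev):
    """Build (but do not send) DELETE /{db}/{doc_id}?rev={rev}."""
    # Percent-encode each path segment so unusual IDs stay in one segment.
    path = "/".join(urllib.parse.quote(part, safe="") for part in (db, doc_id))
    query = urllib.parse.urlencode({"rev": rev})
    url = f"{base_url.rstrip('/')}/{path}?{query}"
    return urllib.request.Request(url, method="DELETE")


def build_compact_request(base_url, db):
    """Build (but do not send) POST /{db}/_compact."""
    url = f"{base_url.rstrip('/')}/{urllib.parse.quote(db, safe='')}/_compact"
    # CouchDB expects a JSON content type on this POST; the body is empty.
    return urllib.request.Request(
        url,
        data=b"",
        method="POST",
        headers={"Content-Type": "application/json"},
    )
```

Usage would look something like build_delete_request("http://localhost:5984", "staging", "some-doc", "2-abc"), passing the document's current revision, followed by build_compact_request("http://localhost:5984", "staging"). Compaction normally needs admin credentials, which this sketch does not handle.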

On Dec 24, 2011, at 4:10 PM, "Daniel Bryan" <> wrote:

> I understand if this is necessary for eventual consistency, but shouldn't
> this be better documented? I generally expect that if I delete sensitive
> or unwanted data, or if a user requests that their personal or private
> data be deleted, it'll be deleted in a way that's more solid than basically
> hiding it. Sure, CouchDB won't let you get at that document, but it's
> certainly still there on the disk, and presumably detectable if you
> inspected the data structure that holds individual documents. Not a very
> good situation vis a vis security. I know that normal unix "deletion"
> leaves files technically on disk, but there are ways to allow for that and
> prevent it from being an issue.
> Even setting data security aside, I've been using CouchDB as a kind of
> staging environment for large amounts of data which should ultimately be
> elsewhere (different flavours of relational databases, databases belonging to
> different organisations, etc.) because it's really easy to implement as an
> interface and let people just throw whatever they want into it with a POST.
> It's really the perfect tool for that, but pretty soon there'll be tens of
> gigabytes a day of data flowing through the system, and most of it just
> needs to be indexed for a while before our scheduled scripts pull it all
> out, shove it elsewhere and delete it. In this use case, if I'm
> understanding this correctly, we'll get crazy storage blowouts unless we
> implement a bunch of hacks to switch to new databases after performing
> deletions (as well as scripts that make our HTTP reverse proxy
> transparently and intelligently route data to the new database - absolutely
> not a trivial task in any complex system with many moving parts).
> But you know, this all comes with the territory. If the devs say there's a
> good reason for documents to stick around after deletion, I believe them,
> but I think that's a pretty huge point and I don't know how I've missed it.
> What's the way to delete a document if I actually want to really delete the
> data? Changing it to a blank document before deleting, and then compacting?
> On Sat, Dec 24, 2011 at 2:37 PM, Jens Alfke <> wrote:
>> On Dec 23, 2011, at 4:09 PM, Mark Hahn wrote:
>>> 1) How exactly could you make this switch without interrupting service?
>> Replicate database to new db, then atomically switch your proxy or
>> whatever to the new db from the old one.
>> Depending on how long the replication takes, there’s a race condition here
>> where changes made to the old db during the replication won’t be propagated
>> to the new one; you could either repeat the process incrementally until
>> this doesn’t happen, or else put the db into read-only mode while you’re
>> doing the copy.
>> This might also be helpful:
>>> 2) Wouldn't this procedure create the exact same eventual consistency
>>> problems that deleting documents in a db would?
>> No; what’s necessary is the revision tree, and the replication will
>> preserve that. You’re just losing the contents of the deleted revisions
>> that accidentally got left behind because of the weird way the documents
>> were deleted.
>> —Jens
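
The "replicate database to new db" step quoted above can be sketched as a single POST to the server's _replicate endpoint. This is an illustration only: the server address and database names in the usage note are hypothetical, the helper name is made up, and the request is built but not sent. Switching the proxy over, and guarding against writes that arrive during the copy, are not covered here.

```python
import json
import urllib.request


def build_replicate_request(base_url, source_db, target_db):
    """Build (but do not send) POST /_replicate from source_db to target_db."""
    body = json.dumps(
        {
            "source": source_db,
            "target": target_db,
            # Ask the server to create the target database if it is missing.
            "create_target": True,
        }
    ).encode("utf-8")
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/_replicate",
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )
```

For example, build_replicate_request("http://localhost:5984", "staging_old", "staging_new"). Because replication is incremental, sending the same request again should only transfer changes made since the previous run, which is what makes the "repeat the process incrementally" approach practical.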