incubator-couchdb-user mailing list archives

From Robert Newson <rnew...@apache.org>
Subject Re: Database size seems off even after compaction runs.
Date Sun, 25 Dec 2011 10:20:49 GMT
Mark,

Using the DELETE method simply updates the document to

  {"_id":"foo","_rev":"newrev","_deleted":true}

If you did the same via PUT or POST, you'd get exactly the same effect
as DELETE.
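
To make the equivalence concrete, here is a minimal sketch of the two requests. It only builds them and does not send anything. The database URL, the document id and the helper names are assumptions for illustration; `rev` is the document's current revision, and CouchDB assigns the new one when it stores the tombstone.

```python
import json
from urllib.parse import quote
from urllib.request import Request


def delete_request(db_url, doc_id, rev):
    """Build DELETE /db/doc?rev=<current rev> (hypothetical helper)."""
    url = f"{db_url}/{quote(doc_id, safe='')}?rev={quote(rev, safe='')}"
    return Request(url, method="DELETE")


def tombstone_put_request(db_url, doc_id, rev):
    """Build a PUT whose body contains only _id, _rev and _deleted."""
    body = json.dumps({"_id": doc_id, "_rev": rev, "_deleted": True})
    return Request(
        f"{db_url}/{quote(doc_id, safe='')}",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# Assumed local database and revision, for illustration only.
d = delete_request("http://localhost:5984/mydb", "foo", "1-abc")
p = tombstone_put_request("http://localhost:5984/mydb", "foo", "1-abc")
print(d.get_method(), d.full_url)
print(p.get_method(), p.full_url, p.data.decode("utf-8"))
```

Passing either request to urllib.request.urlopen against a real server should leave the same kind of tombstone behind.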

Daniel,

You have a valid point: this should be better documented. It is
unknown how many phantom documents are out there, that is, documents
that were deleted by adding _deleted:true on the assumption that this
cleans out the document. In fact, when I first noticed this effect I created a
JIRA ticket and applied a patch to fix it, before Damien pointed out
that this behavior is intentional (indeed, necessary).

To answer your final question, CouchDB preserves what you ask it to,
it does not alter the contents of documents itself. So, if you save
{"_id":"foo","_rev":"newrev","_deleted":true, "password to my bank
account":"foobar"}, it will do so. Use either the DELETE HTTP method
or POST/PUT only the document you wish to be stored (minimum is, as
noted above, _id, _rev and _deleted).
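
If you go the POST/PUT route, one way to avoid saving leftover fields is to reduce the document to the minimum before sending it. A minimal sketch, where to_tombstone is a hypothetical helper and not part of CouchDB or any client library:

```python
def to_tombstone(doc):
    """Return a new dict holding only the fields a deletion needs.

    Every other field (for example a password) is left out, so it is
    not written into the deleted revision. The input is not modified.
    """
    return {"_id": doc["_id"], "_rev": doc["_rev"], "_deleted": True}


# Illustrative document; the _rev value is made up.
doc = {
    "_id": "foo",
    "_rev": "2-def",
    "password to my bank account": "foobar",
}
print(to_tombstone(doc))
```

The result is what you would then PUT or POST in place of the full document.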

B.


On 25 December 2011 00:40, Jens Alfke <jens@couchbase.com> wrote:
> No. If you delete a document properly (using DELETE, not just setting a _deleted property)
> you won't have this problem. The old revision with the data will be gone after compaction,
> leaving only an empty "tombstone".
>
> --Jens     [via iPhone]
>
> On Dec 24, 2011, at 4:10 PM, "Daniel Bryan" <danbryan@gmail.com> wrote:
>
>> I understand if this is necessary for eventual consistency, but shouldn't
>> this be better-documented? I generally expected that if I delete sensitive
>> or unwanted data, or that a user requests that their personal or private
>> data be deleted, it'll be deleted in a way that's more solid than basically
>> hiding it. Sure, CouchDB won't let you get at that document, but it's
>> certainly still there on the disk, and presumably detectable if you
>> inspected the data structure that holds individual documents. Not a very
>> good situation vis a vis security. I know that normal unix "deletion"
>> leaves files technically on disk, but there are ways to allow for that and
>> prevent it from being an issue.
>>
>> Even setting data security aside, I've been using CouchDB as a kind of
>> staging environment for large amounts of data which should ultimately be
>> elsewhere (different flavours of relational databases, databases belonging to
>> different organisations, etc.) because it's really easy to implement as an
>> interface and let people just throw whatever they want into it with a POST.
>> It's really the perfect tool for that, but pretty soon there'll be tens of
>> gigabytes a day of data flowing through the system, and most of it just
>> needs to be indexed for a while before our scheduled scripts pull it all
>> out, shove it elsewhere and delete it. In this use case, if I'm
>> understanding this correctly, we'll get crazy storage blowouts unless we
>> implement a bunch of hacks to switch to new databases after performing
>> deletions (as well as scripts that make our HTTP reverse proxy
>> transparently and intelligently route data to the new database - absolutely
>> not a trivial task in any complex system with many moving parts).
>>
>> But you know, this all comes with the territory. If the devs say there's a
>> good reason for documents to stick around after deletion, I believe them,
>> but I think that's a pretty huge point and I don't know how I've missed it.
>>
>> What's the way to delete a document if I actually want to really delete the
>> data? Changing it to a blank document before deleting, and then compacting?
>>
>> On Sat, Dec 24, 2011 at 2:37 PM, Jens Alfke <jens@couchbase.com> wrote:
>>
>>>
>>> On Dec 23, 2011, at 4:09 PM, Mark Hahn wrote:
>>>
>>>> 1) How exactly could you make this switch without interrupting service?
>>>
>>> Replicate database to new db, then atomically switch your proxy or
>>> whatever to the new db from the old one.
>>> Depending on how long the replication takes, there’s a race condition here
>>> where changes made to the old db during the replication won’t be propagated
>>> to the new one; you could either repeat the process incrementally until
>>> this doesn’t happen, or else put the db into read-only mode while you’re
>>> doing the copy.
>>>
>>> This might also be helpful: http://tinyurl.com/89lr3fl
>>>
>>>> 2) Wouldn't this procedure create the exact same eventual consistency
>>>> problems that deleting documents in a db would?
>>>
>>> No; what’s necessary is the revision tree, and the replication will
>>> preserve that. You’re just losing the contents of the deleted revisions
>>> that accidentally got left behind because of the weird way the documents
>>> were deleted.
>>>
>>> —Jens
>>>
>>>
