couchdb-user mailing list archives

From Robert Newson <robert.new...@gmail.com>
Subject Re: Database size seems off even after compaction runs.
Date Sun, 25 Dec 2011 12:34:03 GMT
Yes, you can create a new doc where the deleted doc was.
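
A minimal sketch of the delete-then-recreate cycle discussed in this thread, in Python. The database URL, the `status` field, and the helper names `tombstone_body` and `recreate_request` are hypothetical and chosen only for illustration. The sketch assumes that CouchDB accepts a PUT without `_rev` when the current revision of that `_id` is a deletion. It only builds the request and does not contact a server.

```python
import json
from urllib.parse import quote
from urllib.request import Request

# Hypothetical database URL, used only for illustration.
BASE_URL = "http://localhost:5984/houses"


def tombstone_body(doc_id, rev):
    """Minimal deletion body: only _id, _rev and _deleted, no leftover fields."""
    return {"_id": doc_id, "_rev": rev, "_deleted": True}


def recreate_request(base_url, doc_id, fields):
    """Build (but do not send) a PUT that re-creates a previously deleted _id.

    Any _rev in `fields` is dropped, on the assumption that the document's
    current revision is a deletion and no revision needs to be supplied.
    """
    payload = {k: v for k, v in fields.items() if k != "_rev"}
    return Request(
        f"{base_url}/{quote(doc_id, safe='')}",
        data=json.dumps(payload).encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


req = recreate_request(BASE_URL, "block_1_house_1", {"status": "rebuilt"})
print(req.get_method(), req.full_url)
print(tombstone_body("block_1_house_1", "1-abc"))
```

To actually send the request, pass `req` to `urllib.request.urlopen` against a running CouchDB instance.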

Sent from my iPhone

On 25 Dec 2011, at 12:10, George Burt <imtrueshot@gmail.com> wrote:

> So, can I re-use the deleted document?  My _id is part of the data and has
> meaning.  If I delete the old _id, am I not allowed to have that same
> meaning again by reclaiming the _id?  _id="block_1_house_1"  then a
> hurricane and so we delete it.  Then we rebuild it (maybe) and so I need
> _id="block_1_house_1" again.
>
> George
>
>
> On Sun, Dec 25, 2011 at 5:20 AM, Robert Newson <rnewson@apache.org> wrote:
>
>> Mark,
>>
>> Using the DELETE method simply updates the document to
>>
>> {"_id":"foo","_rev":"newrev","_deleted":true}
>>
>> If you did the same via PUT or POST, you'd get exactly the same effect
>> as DELETE.
>>
>> Daniel,
>>
>> You have a valid point, that this should be better documented. It is
>> unknown how many phantom documents are out there, those that were
>> deleted by adding _deleted:true on the assumption that this cleans out
>> the document. In fact, when I first noticed this effect I created a
>> JIRA ticket and applied a patch to fix it, before Damien pointed out
>> that this behavior is intentional (indeed, necessary).
>>
>> To answer your final question, CouchDB preserves what you ask it to,
>> it does not alter the contents of documents itself. So, if you save
>> {"_id":"foo","_rev":"newrev","_deleted":true, "password to my bank
>> account":"foobar"}, it will do so. Use either the DELETE HTTP method
>> or POST/PUT only the document you wish to be stored (minimum is, as
>> noted above, _id, _rev and _deleted).
>>
>> B.
>>
>>
>> On 25 December 2011 00:40, Jens Alfke <jens@couchbase.com> wrote:
>>> No. If you delete a document properly (using DELETE, not just setting a
>>> _deleted property) you won't have this problem. The old revision with the
>>> data will be gone after compaction, leaving only an empty "tombstone".
>>>
>>> --Jens     [via iPhone]
>>>
>>> On Dec 24, 2011, at 4:10 PM, "Daniel Bryan" <danbryan@gmail.com> wrote:
>>>
>>>> I understand if this is necessary for eventual consistency, but shouldn't
>>>> this be better-documented? I generally expected that if I delete sensitive
>>>> or unwanted data, or a user requests that their personal or private
>>>> data be deleted, it'll be deleted in a way that's more solid than basically
>>>> hiding it. Sure, CouchDB won't let you get at that document, but it's
>>>> certainly still there on the disk, and presumably detectable if you
>>>> inspected the data structure that holds individual documents. Not a very
>>>> good situation vis a vis security. I know that normal unix "deletion"
>>>> leaves files technically on disk, but there are ways to allow for that and
>>>> prevent it from being an issue.
>>>>
>>>> Even setting data security aside, I've been using CouchDB as a kind of
>>>> staging environment for large amounts of data which should ultimately be
>>>> elsewhere (different flavours of relational databases, databases belonging
>>>> to different organisations, etc.) because it's really easy to implement as
>>>> an interface and let people just throw whatever they want into it with a
>>>> POST. It's really the perfect tool for that, but pretty soon there'll be
>>>> tens of gigabytes a day of data flowing through the system, and most of it
>>>> just needs to be indexed for a while before our scheduled scripts pull it
>>>> all out, shove it elsewhere and delete it. In this use case, if I'm
>>>> understanding this correctly, we'll get crazy storage blowouts unless we
>>>> implement a bunch of hacks to switch to new databases after performing
>>>> deletions (as well as scripts that make our HTTP reverse proxy
>>>> transparently and intelligently route data to the new database -
>>>> absolutely not a trivial task in any complex system with many moving parts).
>>>>
>>>> But you know, this all comes with the territory. If the devs say there's a
>>>> good reason for documents to stick around after deletion, I believe them,
>>>> but I think that's a pretty huge point and I don't know how I've missed it.
>>>>
>>>> What's the way to delete a document if I actually want to really delete the
>>>> data? Changing it to a blank document before deleting, and then compacting?
>>>>
>>>> On Sat, Dec 24, 2011 at 2:37 PM, Jens Alfke <jens@couchbase.com> wrote:
>>>>
>>>>>
>>>>> On Dec 23, 2011, at 4:09 PM, Mark Hahn wrote:
>>>>>
>>>>>> 1) How exactly could you make this switch without interrupting service?
>>>>>
>>>>> Replicate database to new db, then atomically switch your proxy or
>>>>> whatever to the new db from the old one.
>>>>> Depending on how long the replication takes, there’s a race condition here
>>>>> where changes made to the old db during the replication won’t be propagated
>>>>> to the new one; you could either repeat the process incrementally until
>>>>> this doesn’t happen, or else put the db into read-only mode while you’re
>>>>> doing the copy.
>>>>>
>>>>> This might also be helpful: http://tinyurl.com/89lr3fl
>>>>>
>>>>>> 2) Wouldn't this procedure create the exact same eventual consistency
>>>>>> problems that deleting documents in a db would?
>>>>>
>>>>> No; what’s necessary is the revision tree, and the replication will
>>>>> preserve that. You’re just losing the contents of the deleted revisions
>>>>> that accidentally got left behind because of the weird way the documents
>>>>> were deleted.
>>>>>
>>>>> —Jens
>>>>>
>>>>>
>>
>
>
>
> --
> George Burt
> President
> TrueShot Enterprises, LLC.
> (386) 208-1309
> Fax (213) 477-2195
> www.TrueShot.com
> 12756 92nd Ter
> Live Oak, FL 32060
