couchdb-user mailing list archives

From George Burt <imtrues...@gmail.com>
Subject Re: Database size seems off even after compaction runs.
Date Sun, 25 Dec 2011 12:10:01 GMT
So, can I re-use the _id of a deleted document?  My _id is part of the data and
has meaning.  If I delete the document, am I not allowed to restore that same
meaning later by reclaiming the _id?  Say _id="block_1_house_1", then a
hurricane hits, so we delete it.  Then we (maybe) rebuild it, and I need
_id="block_1_house_1" again.

George


On Sun, Dec 25, 2011 at 5:20 AM, Robert Newson <rnewson@apache.org> wrote:

> Mark,
>
> Using the DELETE method simply updates the document to
>
>  {"_id":"foo","_rev":"newrev","_deleted":true}
>
> If you did the same via PUT or POST, you'd get exactly the same effect
> as DELETE.
>
> Daniel,
>
> You have a valid point that this should be better documented. There is
> no telling how many phantom documents are out there: documents that were
> "deleted" by adding _deleted:true on the assumption that this cleans out
> the document. In fact, when I first noticed this effect I created a
> JIRA ticket and applied a patch to fix it, before Damien pointed out
> that this behavior is intentional (indeed, necessary).
>
> To answer your final question, CouchDB preserves what you ask it to;
> it does not alter the contents of documents itself. So, if you save
> {"_id":"foo","_rev":"newrev","_deleted":true,"password to my bank
> account":"foobar"}, it will store exactly that. Use either the HTTP DELETE
> method, or POST/PUT only the fields you wish to be stored (the minimum is,
> as noted above, _id, _rev and _deleted).
>
> B.
>
>
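A minimal Python sketch of the two request shapes described above, built with the standard library but not sent. The server URL http://localhost:5984, the database name mydb, and all helper names are hypothetical. The leaky_delete_body helper reproduces the pitfall: marking the full body as _deleted keeps every other field in the stored revision.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical server and database names, for illustration only.
COUCH = "http://localhost:5984"
DB = "mydb"


def doc_url(doc_id: str) -> str:
    # Percent-encode reserved characters in the document ID for the URL path.
    return f"{COUCH}/{DB}/{urllib.parse.quote(doc_id, safe='')}"


def delete_request(doc_id: str, rev: str) -> urllib.request.Request:
    """DELETE /db/docid?rev=...; the server stores a bare tombstone revision."""
    query = urllib.parse.urlencode({"rev": rev})
    return urllib.request.Request(f"{doc_url(doc_id)}?{query}", method="DELETE")


def tombstone_put_request(doc_id: str, rev: str) -> urllib.request.Request:
    """Equivalent PUT: the body holds ONLY _id, _rev and _deleted."""
    body = {"_id": doc_id, "_rev": rev, "_deleted": True}
    return urllib.request.Request(
        doc_url(doc_id),
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def leaky_delete_body(doc: dict) -> dict:
    """The pitfall: _deleted added to the full body, so every other field is kept."""
    return {**doc, "_deleted": True}
```

Either request can be sent with urllib.request.urlopen; both leave only _id, _rev and _deleted in the new revision, whereas a body built like leaky_delete_body stores the old fields alongside the deletion marker.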
> On 25 December 2011 00:40, Jens Alfke <jens@couchbase.com> wrote:
> > No. If you delete a document properly (using DELETE, not just setting a
> > _deleted property) you won't have this problem. The old revision with the
> > data will be gone after compaction, leaving only an empty "tombstone".
> >
> > --Jens     [via iPhone]
> >
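The delete-then-compact sequence described above can be sketched as two HTTP requests, again built with Python's standard library but not sent. The server URL, the database name mydb, and the helper names are hypothetical; on a real server the compaction request may also need admin credentials.

```python
import urllib.parse
import urllib.request

# Hypothetical server and database, for illustration only.
COUCH = "http://localhost:5984"
DB = "mydb"


def delete_request(doc_id: str, rev: str) -> urllib.request.Request:
    """Step 1: DELETE the document so its newest revision is a bare tombstone."""
    path = urllib.parse.quote(doc_id, safe="")
    query = urllib.parse.urlencode({"rev": rev})
    return urllib.request.Request(f"{COUCH}/{DB}/{path}?{query}", method="DELETE")


def compact_request() -> urllib.request.Request:
    """Step 2: POST /{db}/_compact with a JSON content type."""
    return urllib.request.Request(
        f"{COUCH}/{DB}/_compact",
        data=b"",
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def really_delete(doc_id: str, rev: str) -> list:
    """Return both requests in order; old revision bodies disappear only once compaction finishes."""
    return [delete_request(doc_id, rev), compact_request()]
```

Compaction runs in the background, so the disk space is reclaimed some time after the _compact call returns, not at the moment it is issued.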
> > On Dec 24, 2011, at 4:10 PM, "Daniel Bryan" <danbryan@gmail.com> wrote:
> >
> >> I understand if this is necessary for eventual consistency, but shouldn't
> >> this be better documented? I generally expected that if I delete sensitive
> >> or unwanted data, or if a user requests that their personal or private
> >> data be deleted, it would be deleted in a way that's more solid than
> >> basically hiding it. Sure, CouchDB won't let you get at that document, but
> >> it's certainly still there on the disk, and presumably detectable if you
> >> inspected the data structure that holds individual documents. Not a very
> >> good situation vis-à-vis security. I know that normal Unix "deletion"
> >> leaves files technically on disk, but there are ways to allow for that and
> >> prevent it from being an issue.
> >>
> >> Even setting data security aside, I've been using CouchDB as a kind of
> >> staging environment for large amounts of data which should ultimately be
> >> elsewhere (different flavours of relational databases, databases belonging
> >> to different organisations, etc.) because it's really easy to implement as
> >> an interface and let people just throw whatever they want into it with a
> >> POST. It's really the perfect tool for that, but pretty soon there'll be
> >> tens of gigabytes a day of data flowing through the system, and most of it
> >> just needs to be indexed for a while before our scheduled scripts pull it
> >> all out, shove it elsewhere and delete it. In this use case, if I'm
> >> understanding this correctly, we'll get crazy storage blowouts unless we
> >> implement a bunch of hacks to switch to new databases after performing
> >> deletions (as well as scripts that make our HTTP reverse proxy
> >> transparently and intelligently route data to the new database, which is
> >> absolutely not a trivial task in any complex system with many moving parts).
> >>
> >> But you know, this all comes with the territory. If the devs say there's a
> >> good reason for documents to stick around after deletion, I believe them,
> >> but I think that's a pretty huge point and I don't know how I've missed it.
> >>
> >> What's the right way to delete a document if I really want the data gone?
> >> Changing it to a blank document before deleting it, and then compacting?
> >>
> >> On Sat, Dec 24, 2011 at 2:37 PM, Jens Alfke <jens@couchbase.com> wrote:
> >>
> >>>
> >>> On Dec 23, 2011, at 4:09 PM, Mark Hahn wrote:
> >>>
> >>>> 1) How exactly could you make this switch without interrupting service?
> >>>
> >>> Replicate database to new db, then atomically switch your proxy or
> >>> whatever to the new db from the old one.
> >>> Depending on how long the replication takes, there’s a race condition
> >>> here where changes made to the old db during the replication won’t be
> >>> propagated to the new one; you could either repeat the process
> >>> incrementally until this doesn’t happen, or else put the db into
> >>> read-only mode while you’re doing the copy.
> >>>
> >>> This might also be helpful: http://tinyurl.com/89lr3fl
> >>>
> >>>> 2) Wouldn't this procedure create the exact same eventual consistency
> >>>> problems that deleting documents in a db would?
> >>>
> >>> No; what’s necessary is the revision tree, and the replication will
> >>> preserve that. You’re just losing the contents of the deleted revisions
> >>> that accidentally got left behind because of the weird way the documents
> >>> were deleted.
> >>>
> >>> —Jens
> >>>
> >>>
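The replicate-then-switch step described above could be started through CouchDB's /_replicate endpoint. This is a minimal Python sketch, assuming a hypothetical server at http://localhost:5984 and hypothetical database names; the proxy switch itself is not shown.

```python
import json
import urllib.request

COUCH = "http://localhost:5984"  # hypothetical server, for illustration only


def replicate_request(source: str, target: str) -> urllib.request.Request:
    """POST /_replicate copying source into target, creating target if it is missing."""
    body = {"source": source, "target": target, "create_target": True}
    return urllib.request.Request(
        f"{COUCH}/_replicate",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_replication(source: str, target: str) -> dict:
    """Send the request and return the server's JSON reply."""
    with urllib.request.urlopen(replicate_request(source, target)) as resp:
        return json.load(resp)
```

Because CouchDB records replication checkpoints, calling run_replication again with the same source and target should only copy changes made since the previous run, which fits the "repeat the process incrementally" approach before switching the proxy over.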
>



-- 
George Burt
President
TrueShot Enterprises, LLC.
(386) 208-1309
Fax (213) 477-2195
www.TrueShot.com
12756 92nd Ter
Live Oak, FL 32060
