From: "Wong, Kai"
To: user@couchdb.apache.org
Date: Mon, 5 Dec 2011 13:36:46 -0600
Subject: Re: Having purge problems

Thanks for the replies.

I think allowing compaction to clean up deletions may break BigCouch in
the following scenario:

- One replica node is down.
- Meanwhile, a document is deleted from the remaining (> a quorum)
  replicas.
- Compaction is performed on each of these replicas. All record of the
  document is now wiped from them.
- The down replica node then comes back. The supposedly deleted document
  is still alive there, and BigCouch replicates it back to the remaining
  replicas.

So I don't think automatic cleanup upon compaction is a good idea.

However, as in the original thread request, I also foresee that deleted
documents will pile up and take a lot of storage space in my use case. I
plan to perform periodic purges to free up space: delete a document, wait
long enough to be sure the deletion has reached every replica, and only
then purge it on each replica asynchronously.
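Something like the following is what I have in mind (an untested sketch
against a single CouchDB 1.x node; the server URL, database name, and the
per-replica wait step are placeholders for my setup):

    # Untested sketch of the delete-then-purge workflow, using Python
    # and the "requests" library against CouchDB 1.x's documented
    # /_purge endpoint. COUCH and DB are placeholders.
    import json
    import requests

    COUCH = "http://localhost:5984"
    DB = "mydb"

    def delete_then_purge(doc_id, rev):
        # 1. Delete the document; the response carries the new
        #    "tombstone" revision that marks it as deleted.
        r = requests.delete("%s/%s/%s" % (COUCH, DB, doc_id),
                            params={"rev": rev})
        r.raise_for_status()
        tombstone_rev = r.json()["rev"]

        # 2. (Not shown) Wait until every replica reports the document
        #    as deleted, e.g. poll each node until a GET on the doc
        #    returns 404 with reason "deleted".

        # 3. Purge the tombstone. _purge takes a map of doc id to the
        #    list of leaf revisions to erase from the database.
        r = requests.post("%s/%s/_purge" % (COUCH, DB),
                          data=json.dumps({doc_id: [tombstone_rev]}),
                          headers={"Content-Type": "application/json"})
        r.raise_for_status()
        return r.json()  # {"purge_seq": ..., "purged": {doc_id: [...]}}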
With this approach, even if a down node comes back and the deleted
document gets replicated back (a very unlikely scenario), it is only a
deleted document, not a live document as in the compaction case above.

So, when will BigCouch 0.4 be out with the fix so that individual purges
against each shard replica work? Right now, once that badarity error
occurs, I can't purge from any shard or any node -- all return the same
error! And there seems to be no way to clear it short of deleting all the
shard replica files.

Thanks!

Kai

On 12/4/11 4:35 AM, "Robert Dionne" wrote:

>The semantics of delete and purge are a little confusing.
>
>What purge does is delete, and delete doesn't delete at all; it merely
>marks a revision as deleted. Sure, this stems from the nature of MVCC,
>but I'm wondering if it wouldn't be better for compaction to clean up
>deleted revisions.
>
>Telling the user that the best way to really delete some bad data is to
>change it and then compact isn't quite right.
>
>Moreover, you can delete a doc, replicate the db to another, create the
>doc anew on the target, and then "open_revs=all" will show revision 1-
>as the new one and revision 2- as deleted. Of course these rev ids are
>not supposed to be interpreted as ordered, but it looks odd.
>
>I would imagine allowing compact to clean up deletions would break
>something else, but it seems like a clearer way to do things from the
>user's perspective.
>
>Cheers,
>
>Bob
>
>
>On Dec 3, 2011, at 10:41 PM, Jason Smith wrote:
>
>> On Sat, Dec 3, 2011 at 11:02 PM, Robert Newson wrote:
>>> I can't mention _purge without reminding everyone that it exists only
>>> for removal of data that should not have been stored in the first
>>> place (like sensitive passwords, etc). It is not a mechanism to use
>>> lightly as it breaks eventual consistency, is only lightly tested,
>>> and will often cause full view rebuilds.
>>
>> Hi, Bob. Since this is the user list, may I pull this thread into a
>> tangent?
>>
>> tl;dr = There is one exception, purging very old deleted docs; link to
>> an awesome write-up at the end.
>>
>> Where I agree:
>>
>> * Purge is not a mechanism to use lightly
>> * Purge breaks eventual consistency
>> * Purge is only lightly tested
>>
>> Where I sort-of agree:
>>
>> * Purge will "often" cause full view rebuilds. This is true in the
>> most general case; however, it can be virtually eliminated in
>> production by writing careful code (or using a carefully written
>> library).
>>
>> Where I disagree:
>>
>> * Purge exists only for removal of data which should not have been
>> stored in the first place (like sensitive passwords)
>>
>> Let's break this down. The easy part is what to do with sensitive
>> passwords. Purge is not delete; purge removes a document from ever
>> having existed in the first place. Like Marty McFly in "The Dating
>> Game": http://www.youtube.com/watch?v=CC73uxAVfVY
>>
>> Since purged documents cannot be replicated away, the best thing to do
>> about a sensitive password is to *change it*, so the change can
>> propagate. (Subsequent compaction on the Couch(es) will remove the
>> password from the disk--or at least from the filesystem.)
>>
>> And speaking of compaction, your "never purge" advice is good, except
>> for deleted documents. Deleted documents never, ever, leave the .couch
>> file. CouchDB is relaxed. I should be able to create and delete
>> documents and expect reasonable post-compaction disk usage.
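>> You can watch that happen in the database info document. For example
>> (an illustrative Python snippet; the database name is made up):
>>
>>     import requests
>>     info = requests.get("http://localhost:5984/mydb").json()
>>     # doc_del_count only ever grows: every deleted doc leaves a
>>     # tombstone that survives compaction and stays in the b-tree.
>>     print(info["doc_count"], info["doc_del_count"], info["disk_size"])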
>> If purge is off-limits, certain usage styles of CouchDB produce
>> ever-growing .couch files, and ever-slowing _all_docs and _changes
>> queries.
>>
>> One might say, "just make a new database, with filtered replication."
>> Well, that is basically exactly what a purge does:
>>
>> * It is to be done only rarely, and with care
>> * Some documents "never existed at all"
>> * All views are rebuilt
>>
>> For this reason, I have written a procedure for purging very old
>> deleted documents in a production setting. I hope to actually
>> implement it one day; but you are right that it is needed rarely, if
>> at all, in the real world.
>>
>> https://github.com/iriscouch/cqs#purging
>>
>> --
>> Iris Couch