couchdb-user mailing list archives

From Steven Barlow <stemai...@gmail.com>
Subject Re: Purging documents and view invalidation
Date Mon, 08 Jul 2013 13:58:26 GMT
I would certainly say, "yes". This thread has certainly given me food
for thought about my design, given the current behaviour of CouchDB
replication!
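
The behaviour discussed in the quoted thread can be illustrated with a toy model. This is only a sketch under loud assumptions: `ToyDB`, `replicate` and `make_source` are made-up names, revisions are ignored, and the checkpoint is a plain integer rather than the _local document CouchDB actually uses. It is not CouchDB code; it only shows why a stored checkpoint makes "purge after replicating", "filter at the source" and "reject at the target" end up in the same state.

```python
class ToyDB:
    """Toy stand-in for a database: documents plus a changes feed."""

    def __init__(self):
        self.docs = {}       # doc id -> body
        self.changes = {}    # doc id -> sequence number of its latest update
        self.update_seq = 0

    def put(self, doc_id, body):
        self.update_seq += 1
        self.docs[doc_id] = body
        self.changes[doc_id] = self.update_seq

    def purge(self, doc_id):
        # No tombstone is left behind: the document simply disappears.
        self.docs.pop(doc_id, None)
        self.changes.pop(doc_id, None)


def replicate(source, target, checkpoint, accept=lambda doc_id: True):
    """Copy source changes newer than `checkpoint`; return the new checkpoint.

    `accept` stands in for either a source filter or the target's
    validate_doc_update: rejected documents are skipped, but the
    checkpoint still advances past them (an assumption of this model).
    """
    for doc_id, seq in sorted(source.changes.items(), key=lambda kv: kv[1]):
        if seq > checkpoint and accept(doc_id):
            target.put(doc_id, source.docs[doc_id])
    return source.update_seq


def make_source():
    db = ToyDB()
    db.put("foo", {"n": 1})
    db.put("bar", {"n": 2})
    return db


# Procedure 1: replicate everything, purge "foo" on the target, replicate again.
source, purged = make_source(), ToyDB()
cp_purged = replicate(source, purged, 0)
purged.purge("foo")
cp_purged = replicate(source, purged, cp_purged)

# Procedures 2 and 3: "foo" is blocked by a filter or by validation.
filtered = ToyDB()
cp_filtered = replicate(source, filtered, 0, accept=lambda d: d != "foo")
# Relaxing the rule later does not bring "foo" back: the checkpoint is past it.
cp_filtered = replicate(source, filtered, cp_filtered)

print(sorted(purged.docs), sorted(filtered.docs))

# Restarting from checkpoint 0 (for example after the replication ID changes)
# is what would copy "foo" again in this model.
replicate(source, purged, 0)
print(sorted(purged.docs))
```

In this model "foo" only comes back when the checkpoint is discarded, which matches the concern raised below about relying on the stored sequence number.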

On 08/07/2013, at 11:53 PM, Paul Hirst <paul.hirst@sophos.com> wrote:

> Actually the third option is the most compelling argument.
>
> * Replicate with target validate_doc_update that rejects "foo";
>
> If you changed the validate_doc_update function so it accepts "foo", would you expect "foo" to now arrive when you replicate again?
>
> I say no, because it feels more useful that way (and unless I'm really confused that's the current behaviour), but I can imagine other people might say yes.
>
> -----Original Message-----
> From: Jason Smith [mailto:jason.h.smith@gmail.com]
> Sent: 08 July 2013 08:46
> To: user@couchdb.apache.org
> Subject: Re: Purging documents and view invalidation
>
> I think the "official" behavior of purging and replicating is undefined.
>
> It is still "as if the document had never existed" because these three procedures are equivalent (modulo my own misunderstanding of the question or CouchDB's code):
>
> * Replicate all docs; purge doc "foo"; replicate again
> * Replicate with a source filter that blocks doc "foo"; replicate again
> * Replicate with target validate_doc_update that rejects "foo"; replicate again
>
> Paul, when I think about it this way, it is not as fragile as I first thought. Replicating with a filter is pretty well-understood, and a post-replication purge should behave the same way.
>
>
>
> On Mon, Jul 8, 2013 at 2:38 PM, Steven Barlow <stemail23@gmail.com> wrote:
>
>> OK, the thing is, I have been having some issues when I want to
>> re-replicate documents that have been previously purged. Thinking
>> about this some more, and reading some of the below thread, I suspect
>> that the replication probably is failing to always replicate documents
>> that have been purged, due to some stored sequence number. My
>> suspicion is, therefore, that at least as far as replication is
>> concerned, purging a document is not "as if the document had never
>> existed".
>>
>> I'm tempted to suggest that this is a bug with purging and replication?
>>
>>
>> On 08/07/2013, at 5:18 PM, Jason Smith <jason.h.smith@gmail.com> wrote:
>>
>>> Paul, I believe you are correct on both counts: It would not
>>> re-replicate but IMO it is a fragile thing to depend on.
>>>
>>> The database has a purge_seq value which tracks the number of
>>> purges. I do not recall if that is factored into the replication ID.
>>> It should be. If the target database has undergone a purge since you
>>> last met, you have no idea what its state is. Note, the database name
>>> is relevant to the replication ID, so simply copying foo.couch to
>>> bar.couch would trigger re-replicating the purged documents.
>>>
>>> To me, purging is as if a document had never existed. A replication
>>> should recreate it (unless you change your filter policy or
>>> validate_doc_update). This seems to be what Steven is doing so I think
>>> he is using it correctly. CouchDB purge is like Git rebase: sure it is
>>> dangerous, but that's because it is powerful; and sometimes power users
>>> need power tools.
>>>
>>>
>>> On Mon, Jul 8, 2013 at 2:04 PM, Paul Hirst <paul.hirst@sophos.com> wrote:
>>>
>>>> Wouldn't the _local document which tracks replication prevent that?
>>>> Provided the source and destination databases don't change URL, it
>>>> should pick up where it left off every time, and therefore never
>>>> consider the documents due for consideration unless they change. Are
>>>> you suggesting that it's rather fragile to rely on that?
>>>>
>>>> -----Original Message-----
>>>> From: Jason Smith [mailto:jason.h.smith@gmail.com]
>>>> Sent: 05 July 2013 16:23
>>>> To: user@couchdb.apache.org
>>>> Subject: Re: Purging documents and view invalidation
>>>>
>>>> If you do that, and you re-run replication (or potentially if you
>>>> use continuous replication) then those documents will be
>>>> re-replicated back to the remote site. Purging is as if the document
>>>> was never created at all. So when replication runs, the couches will
>>>> want to copy it from the "master" source.
>>>>
>>>>
>>>> On Fri, Jul 5, 2013 at 8:12 PM, Steven Barlow <stemail23@gmail.com> wrote:
>>>>
>>>>> Purged at the remote site. The master always contains the complete
>>>>> data set, the remote sites replicate partial data sets for their
>>>>> immediate needs, and then clean themselves up once the tasks are
>>>>> complete.
>>>>>
>>>>> On 05/07/2013, at 9:57 PM, Jason Smith <jason.h.smith@gmail.com> wrote:
>>>>>
>>>>>> On which database will you perform the purging?
>>>>>>
>>>>>>
>>>>>> On Fri, Jul 5, 2013 at 6:52 PM, Steven Barlow <stemail23@gmail.com> wrote:
>>>>>>
>>>>>>> Sorry if this is a tangent, but I wanted to pick up on the
>>>>>>> "rarely used in the wild" thread: I personally intend to use
>>>>>>> purge, because I have temporary partial (filtered) replications
>>>>>>> of a "master" database at remote sites. When the data has been
>>>>>>> consumed by the remote site, I figured I could purge it (to save
>>>>>>> space). Is this not a valid, or common, use case for purging?
>>>>>>>
>>>>>>> On 05/07/2013, at 7:21 PM, Jason Smith <jason.h.smith@gmail.com> wrote:
>>>>>>>
>>>>>>>> I slightly disagree with Bob, but he is right that all purge
>>>>>>>> buys you (vs. filtered replication and then swapping DBs) is a
>>>>>>>> little bit of uptime. Purge is not "untested" but it is rarely
>>>>>>>> used in the wild, so the cost/benefit for your uptime is
>>>>>>>> something between "risky" and "unknown."
>>>>>>>>
>>>>>>>> (For me, personally, I would purge.)
>>>>>>>>
>>>>>>>>
>>>>>>>> On Fri, Jul 5, 2013 at 3:31 PM, Robert Newson <rnewson@apache.org> wrote:
>>>>>>>>
>>>>>>>>> Paul,
>>>>>>>>>
>>>>>>>>> If you replicate this database to another database and use a
>>>>>>>>> filter that blocks deleted documents, the target will not
>>>>>>>>> contain a trace of your 100 million deletes (that is, you can
>>>>>>>>> build a new database without cruft without messing with your
>>>>>>>>> existing database). During the replication, you can query the
>>>>>>>>> view on the target to build it incrementally, or wait till the
>>>>>>>>> end, query it once and wait for completion. At the end, flip
>>>>>>>>> your app to look at the new database instead.
>>>>>>>>>
>>>>>>>>> The _purge feature is really only for the case where you
>>>>>>>>> accidentally write your root password down in a document id or
>>>>>>>>> something (since compaction will sweep away old document
>>>>>>>>> contents). I advise against using it for any other reason.
>>>>>>>>>
>>>>>>>>> B.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 5 July 2013 09:17, Jason Smith <jhs@apache.org> wrote:
>>>>>>>>>> Hi, Paul. I wrote up some thoughts on purging here:
>>>>>>>>>> https://github.com/iriscouch/cqs#purging-couchdb
>>>>>>>>>>
>>>>>>>>>> Note, that procedure is untested. It works as a thought
>>>>>>>>>> experiment only.
>>>>>>>>>>
>>>>>>>>>> The procedure looks complicated, but all you will need is the
>>>>>>>>>> core purge, view, purge, view, etc. cadence as described in
>>>>>>>>>> Damien's email I linked to. As long as you never purge twice
>>>>>>>>>> before hitting the view, you are fine. Again, to my knowledge,
>>>>>>>>>> the purge code is less well tested than other parts of CouchDB,
>>>>>>>>>> so perhaps copy your .couch file and try with that until you
>>>>>>>>>> are confident.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Fri, Jul 5, 2013 at 2:37 PM, Paul Hirst <paul.hirst@sophos.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> I would like to purge a few (~100 million) documents from my
>>>>>>>>>>> database. I've been going through deleting them all, and
>>>>>>>>>>> that'll be complete in the next few days but I would like to
>>>>>>>>>>> free up some extra space by purging them also.
>>>>>>>>>>>
>>>>>>>>>>> My concern is around a comment on the wiki page here
>>>>>>>>>>> http://wiki.apache.org/couchdb/Purge_Documents
>>>>>>>>>>>
>>>>>>>>>>> 'If you have purged more than one document between querying
>>>>>>>>>>> your views, you will find that they will rebuild from scratch.'
>>>>>>>>>>>
>>>>>>>>>>> Since I have already deleted the documents I know they aren't
>>>>>>>>>>> showing up in the view any longer. Is there any way I can
>>>>>>>>>>> avoid this view invalidation? (My views take about 10 days to
>>>>>>>>>>> build from scratch so I can't afford the hit).
>>>>>>>>>>>
>>>>>>>>>>> I have a replica of the database. I could do the purge on the
>>>>>>>>>>> replica, wait for the view to rebuild, switch over, purge on
>>>>>>>>>>> the original db, wait for the view, switch back, unless there
>>>>>>>>>>> are any obvious problems with this approach?
>>>>>>>>>>>
>>>>>>>>>>> Cheers,
>>>>>>>>>>> Paul
>>>>>>>>>>>
>>>>>>>>>>> ________________________________
>>>>>>>>>>>
>>>>>>>>>>> Sophos Limited, The Pentagon, Abingdon Science Park, Abingdon,
>>>>>>>>>>> OX14 3YP, United Kingdom.
>>>>>>>>>>> Company Reg No 2096520. VAT Reg No GB 991 2418 08.
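
The "purge, view, purge, view" cadence mentioned in the oldest messages of this thread can be sketched as follows. This is a dry run under stated assumptions, not a tested procedure: `purge_body`, `purge_in_steps`, `send_purge` and `touch_view` are hypothetical names, the document ids and revisions are made up, and nothing here talks to a server. The only CouchDB detail relied on is that the body of a POST to /{db}/_purge maps each document id to a list of revisions. Whether one request naming several documents counts as a single purge for view-invalidation purposes is an assumption here; use one document per batch to follow the wiki's wording literally.

```python
import json


def purge_body(revs_by_id):
    """JSON body for POST /{db}/_purge: document id -> list of revisions."""
    return json.dumps({doc_id: list(revs) for doc_id, revs in revs_by_id.items()})


def purge_in_steps(batches, send_purge, touch_view):
    """Alternate one purge request with one view query.

    `send_purge` and `touch_view` are hypothetical callables supplied by the
    caller (for example thin wrappers around an HTTP client).
    """
    steps = 0
    for batch in batches:
        send_purge(purge_body(batch))
        touch_view()  # let the view index catch up before the next purge
        steps += 1
    return steps


# Dry run with stand-ins that only record the order of calls.
log = []
batches = [{"doc-a": ["1-abc"]}, {"doc-b": ["2-def"]}]  # made-up ids and revs
count = purge_in_steps(
    batches,
    send_purge=lambda body: log.append(("purge", body)),
    touch_view=lambda: log.append(("view", None)),
)
print([kind for kind, _ in log])
```

As suggested in the thread, trying such a loop against a copy of the .couch file first seems prudent.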
