incubator-couchdb-user mailing list archives

From Paul Hirst <paul.hi...@sophos.com>
Subject RE: Purging documents and view invalidation
Date Mon, 08 Jul 2013 13:52:54 GMT
Actually the third option is the most compelling argument.

* Replicate with target validate_doc_update that rejects "foo";

If you changed the validate_doc_update function so it accepts "foo", would you expect "foo"
to now arrive when you replicate again?

I say no, because it feels more useful that way (and unless I'm really confused that's the
current behaviour), but I can imagine other people might say yes.

-----Original Message-----
From: Jason Smith [mailto:jason.h.smith@gmail.com]
Sent: 08 July 2013 08:46
To: user@couchdb.apache.org
Subject: Re: Purging documents and view invalidation

I think the "official" behavior of purging and replicating is undefined.

It is still "as if the document had never existed" because these three procedures are equivalent
(modulo my own misunderstanding of the question or CouchDB's code)

* Replicate all docs; purge doc "foo"; replicate again
* Replicate with a source filter that blocks doc "foo"; replicate again
* Replicate with target validate_doc_update that rejects "foo"; replicate again

Paul, when I think about it this way, it is not as fragile as I first thought. Replicating
with a filter is pretty well-understood, and a post-replication purge should behave the same
way.
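
A toy model may make the equivalence concrete. The sketch below is not CouchDB code: Source, Target, replicate, run_procedure and the integer checkpoint are hypothetical names that only mimic a changes feed and a stored replication checkpoint, under the assumption that a purge on the target leaves that checkpoint untouched.

```python
from typing import Callable, Iterable, Optional, Set, Tuple

Predicate = Callable[[str], bool]


class Source:
    """Toy source database: an append-only changes feed of (seq, doc_id)."""

    def __init__(self, doc_ids: Iterable[str]) -> None:
        self.changes = list(enumerate(doc_ids, start=1))


class Target:
    """Toy target database with a stored replication checkpoint."""

    def __init__(self) -> None:
        self.docs: Set[str] = set()
        self.checkpoint = 0  # last source sequence already processed

    def purge(self, doc_id: str) -> None:
        # Assumption: purging removes the document but not the checkpoint.
        self.docs.discard(doc_id)


def replicate(source: Source, target: Target,
              source_filter: Optional[Predicate] = None,
              validate: Optional[Predicate] = None) -> None:
    """Copy changes newer than the checkpoint, honouring filter and validation."""
    for seq, doc_id in source.changes:
        if seq <= target.checkpoint:
            continue  # already considered in an earlier run
        passes_filter = source_filter is None or source_filter(doc_id)
        passes_validation = validate is None or validate(doc_id)
        if passes_filter and passes_validation:
            target.docs.add(doc_id)
        target.checkpoint = seq  # the change counts as handled either way


def not_foo(doc_id: str) -> bool:
    return doc_id != "foo"


def run_procedure(name: str) -> Tuple[Source, Target]:
    """Run one of the three procedures from the list above."""
    source, target = Source(["foo", "bar", "baz"]), Target()
    if name == "purge":
        replicate(source, target)
        target.purge("foo")
        replicate(source, target)
    elif name == "filter":
        replicate(source, target, source_filter=not_foo)
        replicate(source, target, source_filter=not_foo)
    elif name == "validate":
        replicate(source, target, validate=not_foo)
        replicate(source, target, validate=not_foo)
    else:
        raise ValueError(name)
    return source, target


if __name__ == "__main__":
    for name in ("purge", "filter", "validate"):
        _, target = run_procedure(name)
        print(name, sorted(target.docs))
```

In this model all three procedures leave the target without "foo", and a later run that accepts "foo" still skips it because the checkpoint is already past that change. Only resetting the checkpoint brings it back, which is the toy equivalent of the replication ID changing.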



On Mon, Jul 8, 2013 at 2:38 PM, Steven Barlow <stemail23@gmail.com> wrote:

> OK, the thing is, I have been having some issues when I want to
> re-replicate documents that have been previously purged. Thinking
> about this some more, and reading some of the below thread, I suspect
> that the replication probably is failing to always replicate documents
> that have been purged, due to some stored sequence number. My
> suspicion is, therefore, that at least as far as replication is
> concerned, purging a document is not "as if the document had never
> existed".
>
> I'm tempted to suggest that this is a bug with purging and replication?
>
>
> On 08/07/2013, at 5:18 PM, Jason Smith <jason.h.smith@gmail.com> wrote:
>
> > Paul, I believe you are correct on both counts: It would not
> > re-replicate but IMO it is a fragile thing to depend on.
> >
> > The database has a purge_seq value which tracks the number of
> > purges. I do not recall if that is factored into the replication
> > ID. It should be. If the target database has undergone a purge
> > since you last met you have no idea what its state is. Note, the
> > database name is relevant to the replication id, so simply copying
> > foo.couch to bar.couch would trigger re-replicating the purged
> > documents.
> >
> > To me, purging is as if a document had never existed. A replication
> > should recreate it (unless you change your filter policy or
> > validate_doc_update). This seems to be what Steven is doing so I
> > think he is using it correctly. CouchDB purge is like Git rebase:
> > sure it is dangerous, but that's because it is powerful; and
> > sometimes power users need power tools.
> >
> >
> > On Mon, Jul 8, 2013 at 2:04 PM, Paul Hirst <paul.hirst@sophos.com> wrote:
> >
> >> Wouldn't the _local document which tracks replication prevent that?
> >> Provided the source and destination databases don't change URL it
> >> should pick up where it left off every time, and therefore never
> >> consider the documents due for consideration unless they change. Are
> >> you suggesting that it's rather fragile to rely on that?
> >>
> >> -----Original Message-----
> >> From: Jason Smith [mailto:jason.h.smith@gmail.com]
> >> Sent: 05 July 2013 16:23
> >> To: user@couchdb.apache.org
> >> Subject: Re: Purging documents and view invalidation
> >>
> >> If you do that, and you re-run replication (or potentially if you
> >> use continuous replication) then those documents will be
> >> re-replicated back to the remote site. Purging is as if the document
> >> was never created at all. So when replication runs, the couches will
> >> want to copy it from the "master" source.
> >>
> >>
> >> On Fri, Jul 5, 2013 at 8:12 PM, Steven Barlow <stemail23@gmail.com> wrote:
> >>
> >>> Purged at the remote site. The master always contains the complete
> >>> data set, the remote sites replicate partial data sets for their
> >>> immediate needs, and then clean themselves up once the tasks are
> >>> complete.
> >>>
> >>> On 05/07/2013, at 9:57 PM, Jason Smith <jason.h.smith@gmail.com> wrote:
> >>>
> >>>> On which database will you perform the purging?
> >>>>
> >>>>
> >>>> On Fri, Jul 5, 2013 at 6:52 PM, Steven Barlow <stemail23@gmail.com> wrote:
> >>>>
> >>>>> Sorry if this is a tangent, but I wanted to pick up on the
> >>>>> "rarely used in the wild" thread: I personally intend to use
> >>>>> purge, because I have temporary partial (filtered) replications
> >>>>> of a "master" database at remote sites. When the data has been
> >>>>> consumed by the remote site, I figured I could purge it (to save
> >>>>> space). Is this not a valid, or common use case for purging?
> >>>>>
> >>>>> On 05/07/2013, at 7:21 PM, Jason Smith <jason.h.smith@gmail.com> wrote:
> >>>>>
> >>>>>> I slightly disagree with Bob, but he is right that all purge
> >>>>>> buys you (vs. filtered replication and then swapping DBs) is a
> >>>>>> little bit of uptime. Purge is not "untested" but it is rarely
> >>>>>> used in the wild, so the cost/benefit for your uptime is
> >>>>>> something between "risky" and "unknown."
> >>>>>>
> >>>>>> (For me, personally, I would purge.)
> >>>>>>
> >>>>>>
> >>>>>> On Fri, Jul 5, 2013 at 3:31 PM, Robert Newson <rnewson@apache.org> wrote:
> >>>>>>
> >>>>>>> Paul,
> >>>>>>>
> >>>>>>> If you replicate this database to another database and use a
> >>>>>>> filter that blocks deleted documents, the target will not
> >>>>>>> contain a trace of your 100 million deletes (that is, you can
> >>>>>>> build a new database without cruft without messing with your
> >>>>>>> existing database). During the replication, you can query the
> >>>>>>> view on the target to build it incrementally, or wait till the
> >>>>>>> end, query it once and wait for completion. At the end, flip
> >>>>>>> your app to look at the new database instead.
> >>>>>>>
> >>>>>>> The _purge feature is really only for the case where you
> >>>>>>> accidentally write your root password down in a document id or
> >>>>>>> something (since compaction will sweep away old document
> >>>>>>> contents). I advise against using it for any other reason.
> >>>>>>>
> >>>>>>> B.
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> On 5 July 2013 09:17, Jason Smith <jhs@apache.org> wrote:
> >>>>>>>> Hi, Paul. I wrote up some thoughts on purging here:
> >>>>>>>> https://github.com/iriscouch/cqs#purging-couchdb
> >>>>>>>>
> >>>>>>>> Note, that procedure is untested. It works as a thought
> >>>>>>>> experiment only.
> >>>>>>>>
> >>>>>>>> The procedure looks complicated, but all you will need is the
> >>>>>>>> core purge, view, purge, view, etc. cadence as described in
> >>>>>>>> Damien's email I linked to. As long as you never purge twice
> >>>>>>>> before hitting the view, you are fine. Again, to my knowledge,
> >>>>>>>> the purge code is less well tested than other parts of CouchDB,
> >>>>>>>> so perhaps copy your .couch file and try with that until you
> >>>>>>>> are confident.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On Fri, Jul 5, 2013 at 2:37 PM, Paul Hirst <paul.hirst@sophos.com> wrote:
> >>>>>>>>
> >>>>>>>>> I would like to purge a few (~100 million) documents from my
> >>>>>>>>> database. I've been going through deleting them all, and
> >>>>>>>>> that'll be complete in the next few days but I would like to
> >>>>>>>>> free up some extra space by purging them also.
> >>>>>>>>>
> >>>>>>>>> My concern is around a comment on the wiki page here
> >>>>>>>>> http://wiki.apache.org/couchdb/Purge_Documents
> >>>>>>>>>
> >>>>>>>>> 'If you have purged more than one document between querying
> >>>>>>>>> your views, you will find that they will rebuild from scratch.'
> >>>>>>>>>
> >>>>>>>>> Since I have already deleted the documents I know they aren't
> >>>>>>>>> showing up in the view any longer. Is there any way I can avoid
> >>>>>>>>> this view invalidation? (My views take about 10 days to build
> >>>>>>>>> from scratch so I can't afford the hit).
> >>>>>>>>>
> >>>>>>>>> I have a replica of the database. I could do the purge on the
> >>>>>>>>> replica, wait for the view to rebuild, switch over, purge on
> >>>>>>>>> the original db, wait for the view, switch back, unless there
> >>>>>>>>> are any obvious problems with this approach?
> >>>>>>>>>
> >>>>>>>>> Cheers,
> >>>>>>>>> Paul
> >>>>>>>>>
> >>>>>>>>> ________________________________
> >>>>>>>>>
> >>>>>>>>> Sophos Limited, The Pentagon, Abingdon Science Park,
> >>>>>>>>> Abingdon, OX14 3YP, United Kingdom.
> >>>>>>>>> Company Reg No 2096520. VAT Reg No GB 991 2418 08.
>
