couchdb-user mailing list archives

From Riyad Kalla <rka...@gmail.com>
Subject Re: Anyway to alter doc when replicating?
Date Thu, 06 Feb 2014 19:04:23 GMT
Dan, I wonder if you would be better served by creating a view in your
original DB that does all the needed manipulation of the docs, and then
coding up some form of manual replication that takes all the results from
that view and copies them into your target data source?

You wouldn't be able to use the built-in CouchDB replication, but at least
you would have total control over the data leaving your master source (it
sounds like in your case masking PII/sensitive data before it leaves is
important, so this step might be handy).
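
A minimal sketch of that manual copy step, assuming Python on the copying
side. The field names (`name`, `ssn`, `email`) and the helpers `redact` and
`to_bulk_docs` are hypothetical. Only two shapes are taken from CouchDB's
HTTP API: the `rows[].doc` layout of a view response fetched with
`include_docs=true`, and the `{"docs": [...]}` body of a `_bulk_docs`
request. The HTTP calls themselves are left out.

```python
import copy
import json

# Hypothetical list of fields treated as sensitive; adjust to your schema.
SENSITIVE_FIELDS = ("ssn", "email")


def redact(doc):
    """Return a masked copy of a document, safe to write to the target."""
    clean = copy.deepcopy(doc)
    # Drop the source revision: the contents no longer match the original,
    # so the target has to assign its own revision.
    clean.pop("_rev", None)
    if "name" in clean:
        clean["name"] = "John Smith"
    for field in SENSITIVE_FIELDS:
        clean.pop(field, None)
    return clean


def to_bulk_docs(view_response):
    """Build a _bulk_docs request body from a view response
    that was fetched with include_docs=true."""
    docs = [redact(row["doc"]) for row in view_response["rows"] if row.get("doc")]
    return {"docs": docs}


# Example view response shaped like CouchDB's output (sample data is made up).
sample = {
    "rows": [
        {"id": "a1", "key": "a1", "value": None,
         "doc": {"_id": "a1", "_rev": "1-abc", "name": "Jane Doe",
                 "ssn": "000", "total": 5}},
    ]
}
payload = to_bulk_docs(sample)
print(json.dumps(payload))
```

The resulting payload would be POSTed to the target database's `_bulk_docs`
endpoint. Re-running the copy later would also need the target's current
`_rev` for each doc, since an update without a matching revision is rejected
as a conflict.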


On Thu, Feb 6, 2014 at 11:06 AM, Jens Alfke <jens@couchbase.com> wrote:

>
> On Feb 6, 2014, at 9:38 AM, Dan Santner <dansantner@me.com> wrote:
>
> > I have the replication filtering down now, but I'm wondering: is there
> any way for me to change the doc before it copies to the source?
>
> Well, to take your question literally, you can of course change the
> documents on the original database before starting the replication. Only
> the latest revisions (with the redacted names) will be transferred.
>
> But I think you're asking for some kind of filter that would alter
> documents while they're being replicated? I don't think that's feasible.
> The document's revision ID is tied to its contents (it's based on a SHA-1
> digest of the JSON) and you can't change the contents while leaving the
> revision ID the same. But changing the rev ID in the middle of replication
> would be really problematic because the replicator is transferring specific
> revisions by their revIDs, and it would confuse it if it got a different
> revID than the one it asked for.
>
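
A toy illustration of the point above, in Python. This is not CouchDB's
actual revision algorithm (the digest input and hash function differ between
implementations); the `toy_rev` helper is hypothetical and only shows that a
content-derived ID changes whenever the body changes.

```python
import hashlib
import json


def toy_rev(generation, body):
    """Hypothetical content-derived revision ID: '<generation>-<hex digest>'."""
    # Canonical JSON so that key order does not affect the digest.
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha1(canonical.encode("utf-8")).hexdigest()
    return f"{generation}-{digest}"


original = {"name": "Jane Doe", "total": 5}
redacted = {"name": "John Smith", "total": 5}

rev_a = toy_rev(1, original)
rev_b = toy_rev(1, redacted)

# The redacted body no longer matches the ID computed for the original.
print(rev_a != rev_b)
```
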
> > The use case is I have production documents that I want to migrate
> somewhere else but change all the names to 'John Smith' before they land in
> the new destination.  Also need to remove a couple other things that might
> be considered sensitive.
>
> The only good option I can think of is to keep the sensitive parts of the
> data in separate documents. (The main doc would have a property that
> contains the doc ID of the sensitive data.) Then you can run a filtered
> replication that sends the regular documents but not the sensitive ones.
>
> --Jens
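
The split-document approach above could be sketched as follows, again in
Python. The field names, the `:pii` ID suffix, the `type` marker, the
`sensitive_id` property, the design document name, and the database names are
all hypothetical. The parts taken from CouchDB are the `filters` section of a
design document (filter functions are written in JavaScript) and the
`source` / `target` / `filter` keys of a replication request.

```python
import json

# Hypothetical field names; adjust to your schema.
SENSITIVE_FIELDS = ("name", "ssn")


def split_doc(doc):
    """Split one document into (main, sensitive) documents."""
    sensitive_id = doc["_id"] + ":pii"  # hypothetical ID scheme
    sensitive = {"_id": sensitive_id, "type": "sensitive"}
    main = {}
    for key, value in doc.items():
        if key in SENSITIVE_FIELDS:
            sensitive[key] = value
        else:
            main[key] = value
    # The main doc points at the doc that holds the sensitive data.
    main["sensitive_id"] = sensitive_id
    return main, sensitive


# Design document holding the replication filter.
design_doc = {
    "_id": "_design/replication",
    "filters": {
        "no_sensitive": "function(doc, req) { return doc.type !== 'sensitive'; }"
    },
}

# Body for a filtered replication request; database names are placeholders.
replication = {
    "source": "production",
    "target": "scrubbed_copy",
    "filter": "replication/no_sensitive",
}

main, sensitive = split_doc(
    {"_id": "order-1", "name": "Jane Doe", "ssn": "000", "total": 5}
)
print(json.dumps(main))
print(json.dumps(sensitive))
```

With this layout the built-in replicator can still be used, because no
document is modified in flight: the sensitive documents are simply never sent.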
