From: Frank Wunderlich
Subject: Selective Replication
Date: Wed, 26 Sep 2012 17:34:50 +0200
To: user@couchdb.apache.org

Hi *,

I am currently trying to figure out how one could realize something like "selective replication" in CouchDB.

In our scenario we have around 10 physically distributed CouchDB instances running. There will probably be more than 1 million documents in our "master" instance. Only a subset of those documents shall be replicated to each of the "slave" instances, and users shall be able to explicitly control which documents are synchronized to which destination.

So far I have stumbled over the following two concepts:

1. Filtered Replication
2. Named Document Replication

At first glance, replication filters seemed to be the way to go.
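Just to make this concrete, a filter along the following lines is what we had in mind. The "targets" field as well as the database and instance names are made up for this sketch:

    // Filter function, stored under "filters.by_target" in a design
    // document (e.g. _design/replication) on the master.
    // "doc.targets" is a hypothetical field listing the slave instances
    // a document should go to; "req.query.target" is filled from the
    // query_params of the replication request below.
    function(doc, req) {
      if (!doc.targets) {
        return false; // no routing info -> do not replicate
      }
      return doc.targets.indexOf(req.query.target) !== -1;
    }

Replication would then be triggered once per slave with something like:

    POST /_replicate
    {
      "source": "assets",
      "target": "http://slave-1.example.com:5984/assets",
      "filter": "replication/by_target",
      "query_params": { "target": "slave-1" },
      "continuous": true
    }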
But unfortunately we have a rather "relational" document model: one logical "asset" consists of several CouchDB documents that reference each other, while a filter function can only access data that is part of the single document passed in as a parameter. Because of this limitation, each partial document would have to contain all the information necessary to determine whether it should be replicated or not. This leads to redundancy and to potential inconsistencies if a "transaction" fails: inconsistent asset aggregates might get "partially" transferred to other CouchDB instances, and in my eyes it will be hard to recognize and track down the cause of such inconsistencies. Furthermore, our content documents get "polluted" by purely technical attributes.

That's why we took a look at the second option: Named Document Replication. It seemed to be a good idea to separate the two concerns of persistence and synchronization:

First we would persist each "logical asset" in our local CouchDB. Once we know that this step succeeded and all partial documents are stored in the database, we would "register" the "logical asset" for synchronization. This step would happen in the application layer that is built on top of our CouchDB. The registration process would look up all partial documents that make up the "logical asset". Then any running replication job would get canceled (assuming we are using continuous replication). Finally, we would restart those replication jobs, adding the identified document ids to the JSON that gets POSTed to the _replicate URL (see the P.S. below for a sketch of the payloads).

The first attempts seemed promising. But when experimenting with larger sets of documents, we noticed a significant performance degradation during replication:

With 100,000 documents to be replicated, the "Named Document Replication" was 4 times slower than the complete and unconditional replication of the whole database.
With 200,000 documents, the selective approach was even 7 times slower.
With 1,000,000 documents, the factor was greater than 20.

So this approach does not scale well...

What are your thoughts about this? Is there anyone who has faced similar architectural questions? Any hint will be appreciated.

Best regards,
Frank

--
kreuzwerker GmbH - we touch running systems
fon +49 177 8780280 | fax +49 30 6098388-99
Ritterstraße 12-14, 10969 Berlin | frank.wunderlich@kreuzwerker.de
HR B 129427 | Amtsgericht Charlottenburg | Geschäftsführer: Tilmann Eing
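P.S. For completeness, here is roughly what we POST to the _replicate URL in the "Named Document Replication" approach (database and document ids are made up). First the running continuous job gets canceled by reposting its definition with "cancel" set:

    POST /_replicate
    {
      "cancel": true,
      "source": "assets",
      "target": "http://slave-1.example.com:5984/assets",
      "continuous": true
    }

Then it gets restarted with the identified document ids added:

    POST /_replicate
    {
      "source": "assets",
      "target": "http://slave-1.example.com:5984/assets",
      "doc_ids": ["asset-4711", "asset-4711-rights", "asset-4711-metadata"]
    }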