couchdb-dev mailing list archives

From Filipe David Manana <fdman...@gmail.com>
Subject Re: _replicator DB
Date Tue, 25 May 2010 09:50:04 GMT
Hi all,

I've reworked some implementation details. Namely, the replication
gen_servers now have an ID that is no longer based on the replication
document ID but instead on the MD5 of the replication properties (source,
target, etc., as is currently done when we POST to _replicate). This
avoids having identical replications running at the same time, at the
expense of slightly more complex code.
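
To illustrate the idea of deriving the ID from the replication properties
rather than from the document ID, here is a minimal Python sketch. It is only
an illustration under assumptions: the real code is Erlang, and the exact set
of fields and the serialization it hashes may differ. The function name
replication_id and the ID_FIELDS tuple are hypothetical.

```python
import hashlib
import json

# Hypothetical set of properties that define a replication; the real
# implementation may hash a different set of fields.
ID_FIELDS = ("source", "target", "continuous", "create_target", "filter", "doc_ids")

def replication_id(rep_doc):
    """Return an MD5 hex digest derived only from the replication properties."""
    props = {k: rep_doc[k] for k in ID_FIELDS if k in rep_doc}
    # sort_keys gives a canonical serialization, so key order does not matter.
    canonical = json.dumps(props, sort_keys=True, separators=(",", ":"))
    return hashlib.md5(canonical.encode("utf-8")).hexdigest()

doc_a = {"_id": "rep_1", "source": "http://hosta:5984/foo", "target": "bar"}
doc_b = {"_id": "rep_2", "target": "bar", "source": "http://hosta:5984/foo"}

# Two documents with different _id values but identical properties map to
# the same replication ID, so only one replication would need to run.
same = replication_id(doc_a) == replication_id(doc_b)
```

Because the document's _id is excluded from the hash, renaming a replication
document does not change which replication it describes.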

From the user's point of view, everything is pretty much the same as I
announced before. The only differences are:

- when a replication is started by adding a replication document to the
_replicator DB, the replicator, besides adding the field "state" with value
"triggered" to the replication document, also adds the field
"replication_id" (with this field's value, we can access the replication
log/checkpoint documents, as Adam suggested before).

- if the user adds a second document that in fact describes a replication
already triggered by a previous document (same source, target, etc.), this
second document will not get a "state" field added to it. However, the
replicator still adds the "replication_id" field to it. This is nice IMO,
since we can add a view whose keys are the "replication_id" values
and see which replication documents are duplicates.

- deleting a duplicate replication document (a document that didn't
trigger a replication, since an earlier one already triggered that
replication) doesn't stop the replication. To stop it, we have to delete the
document that triggered the replication - we can find it by searching for a
document with the same "replication_id" and "state" set to "triggered".
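
The duplicate detection and the lookup of the triggering document described
in the points above can be sketched in Python over plain dictionaries. This
is a hedged illustration, not the CouchDB view API: the helper names
group_by_replication_id and find_triggering_doc, as well as the sample
documents and the ID value "abc123", are made up for the example.

```python
from collections import defaultdict

def group_by_replication_id(docs):
    """Emulate a view keyed by "replication_id": map each ID to its document IDs."""
    groups = defaultdict(list)
    for doc in docs:
        if "replication_id" in doc:
            groups[doc["replication_id"]].append(doc["_id"])
    return dict(groups)

def find_triggering_doc(docs, rep_id):
    """Return the _id of the document that started the replication, or None."""
    for doc in docs:
        if doc.get("replication_id") == rep_id and doc.get("state") == "triggered":
            return doc["_id"]
    return None

# Hypothetical contents of the _replicator DB after adding two equivalent documents.
docs = [
    {"_id": "rep_1", "source": "foo", "target": "bar",
     "state": "triggered", "replication_id": "abc123"},
    {"_id": "rep_2", "source": "foo", "target": "bar",
     "replication_id": "abc123"},  # duplicate: no "state" field
]

# Replication IDs shared by more than one document.
duplicates = {rid: ids for rid, ids in group_by_replication_id(docs).items()
              if len(ids) > 1}

# The document to delete in order to actually stop the replication.
to_delete = find_triggering_doc(docs, "abc123")
```

Here deleting "rep_2" would leave the replication running, while deleting
"rep_1" is what would stop it.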

For more details, check the JavaScript test suite:
http://github.com/fdmanana/couchdb/blob/new_replicator_db/share/www/script/test/replicator_db.js
It may be easier to understand the _replicator DB by looking at the tests.
It's very simple from a user's point of view.

The whole patch can be found in a new branch at:
http://github.com/fdmanana/couchdb/compare/new_replicator_db

Later on I'll add a patch to a Jira ticket.

cheers



On Wed, May 19, 2010 at 10:31 AM, Filipe David Manana <fdmanana@gmail.com> wrote:

> Dear all,
>
> I've been working on the _replicator DB along with Chris. Some of you have
> already heard about this DB in the mailing list, IRC, or whatever. Its
> purpose:
>
> - replications can be started by adding a replication document to the
> replicator DB _replicator (its name can be configured in the .ini files)
>
> - replication documents are basically the same JSON structures that we
> currently use when POSTing to _replicate/  (and we can give them an
> arbitrary id)
>
> - to cancel a replication, we simply delete the replication document
>
> - after the replication is started, the replicator adds the field "state"
> to the replication document with value "triggered"
>
> - when the replication finishes (for non-continuous replications), the
> replicator sets the doc's "state" field to "completed"
>
> - if an error occurs during a replication, the corresponding replication
> document will have the "state" field set to "error"
>
> - after an error is detected, the replication is restarted after some
> time (10s for now, but maybe it should be configurable)
>
> - after a server restart/crash, CouchDB will remember replications and will
> restart them (this is especially useful for continuous replications)
>
> - in the replication document we can define a "user_ctx" property, which
> defines the user name and/or role(s) under which the replication will
> execute
>
>
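
The "state" values listed above can be read as a small state machine. The
Python sketch below is an interpretation, not the actual implementation: the
event names ("started", "finished", "failed", "restarted") are hypothetical,
and the assumption that a restarted replication goes back to "triggered" is
not stated in the message.

```python
RETRY_DELAY_SECONDS = 10  # delay mentioned above; might become configurable

# (current state, event) -> new state. Event names are made up for this sketch.
TRANSITIONS = {
    (None, "started"): "triggered",
    ("triggered", "finished"): "completed",  # non-continuous replications only
    ("triggered", "failed"): "error",
    ("error", "restarted"): "triggered",     # assumption: retry re-triggers
}

def apply_event(doc, event):
    """Return a copy of the replication document with its "state" updated."""
    key = (doc.get("state"), event)
    if key not in TRANSITIONS:
        raise ValueError("unexpected event %r in state %r" % (event, doc.get("state")))
    updated = dict(doc)
    updated["state"] = TRANSITIONS[key]
    return updated

doc = {"_id": "rep_1", "source": "foo", "target": "bar"}
doc = apply_event(doc, "started")    # state: "triggered"
doc = apply_event(doc, "failed")     # state: "error"
doc = apply_event(doc, "restarted")  # state: "triggered" (after the retry delay)
doc = apply_event(doc, "finished")   # state: "completed"
```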
>
> Some restrictions regarding the _replicator DB:
>
> - only server admins can add and delete replication documents
>
> - only the replicator itself can update replication documents - this is to
> avoid having race conditions between the replicator and server admins trying
> to update replication documents
>
> - the above point implies that to change a replication you have to add a
> new replication document
>
> All these restrictions are in the replicator DB design doc -
> http://github.com/fdmanana/couchdb/blob/replicator_db/src/couchdb/couch_def_js_funs.hrl<http://github.com/fdmanana/couchdb/blob/_replicator_db/src/couchdb/couch_def_js_funs.hrl>
>
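
The restrictions above could be expressed as a single permission check. The
real rules live in the design doc as JavaScript; the Python sketch below only
mirrors the logic described in this message. The role name "_replicator" used
to identify the replicator itself and the function name is_change_allowed are
assumptions made for illustration.

```python
ADMIN_ROLE = "_admin"
REPLICATOR_ROLE = "_replicator"  # hypothetical marker for the replicator itself

def is_change_allowed(new_doc, old_doc, roles):
    """Decide whether a write to the _replicator DB is permitted.

    old_doc is None when the document is being created.
    """
    is_create = old_doc is None
    is_delete = bool(new_doc.get("_deleted", False))
    if is_create or is_delete:
        # Only server admins can add and delete replication documents.
        return ADMIN_ROLE in roles
    # Any other write is an update: only the replicator may do that.
    return REPLICATOR_ROLE in roles

old = {"_id": "rep_1", "source": "foo", "target": "bar"}
new = dict(old, state="triggered")

admin_can_create = is_change_allowed(old, None, [ADMIN_ROLE])
admin_can_update = is_change_allowed(new, old, [ADMIN_ROLE])
replicator_can_update = is_change_allowed(new, old, [REPLICATOR_ROLE])
```

Under these rules an admin who wants to change a replication has to add a new
document, since an in-place update is rejected.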
>
> The code is fully working and is located at:
> http://github.com/fdmanana/couchdb/tree/replicator_db
>
> It includes a comprehensive JavaScript test case.
>
> Feel free to try it and give your feedback. There are still some TODOs as
> comments in the code, so it's still subject to changes.
>
>
> For people more involved with CouchDB internals and development:
>
> That branch breaks the stats.js test and, occasionally, the
> delayed_commits.js tests.
>
> It breaks stats.js because:
>
> - internally CouchDB uses the _changes API to be aware of the
> addition/update/deletion of replication documents to/from the _replicator
> DB. The _changes implementation constantly opens and closes the DB (opens
> are triggered by a gen_event). This affects the stats open_databases and
> open_os_files.
>
> It breaks delayed_commits.js occasionally because:
>
> - by listening to _replicator DB changes, an extra file descriptor is used,
> which affects the max_open_dbs config parameter. This parameter is related
> to the max number of user opened DBs. This causes the error {error,
> all_dbs_active} (from couch_server.erl) during the execution of
> delayed_commits.js (as well as stats.js).
>
> I also have another branch that fixes these issues in a "dirty" way:
> http://github.com/fdmanana/couchdb/tree/_replicator_db  (has a big comment
> in couch_server.erl explaining the hack)
>
> Basically, it doesn't increment stats for the _replicator DB, bypasses
> max_open_dbs when opening the _replicator DB, and doesn't allow it to
> be closed in favour of a user requested DB (as if it assigned a +infinite
> LRU time to this DB).
>
> Sometimes (although very rarely) I also get the all_dbs_active error when
> the authentication handlers are executing (because they open the _users DB).
> This is not caused by my _replicator DB code at all, since I get it with
> trunk as well.
>
> I would also like to collect feedback about what to do regarding these 2
> issues, especially max_open_dbs. Somehow I feel that no matter how many user
> DBs are open, it should always be possible to open the _replicator DB
> internally (and the _users DB).
>
>
> cheers
>
>
> --
> Filipe David Manana,
> fdmanana@gmail.com
>
> "Reasonable men adapt themselves to the world.
> Unreasonable men adapt the world to themselves.
> That's why all progress depends on unreasonable men."
>
>


-- 
Filipe David Manana,
fdmanana@gmail.com

"Reasonable men adapt themselves to the world.
Unreasonable men adapt the world to themselves.
That's why all progress depends on unreasonable men."
