couchdb-user mailing list archives

From Wout Mertens <>
Subject Re: How to keep from sending more than one email from multiple replicated couchdb instances
Date Mon, 15 Nov 2010 06:50:17 GMT
I think you need to decouple the database from the replication. Replication management is not
a first-class citizen in CouchDB (yet?) and the problems you present show that.

Basically what you're looking at is a message board service, where clients post requests ("send
this email") and servers take requests and execute them. If you add a board monitor to the
mix, that one can be responsible for putting taken requests back on the board if the server
that took it isn't responding.
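
As a rough illustration of the board described above, here is a minimal in-memory sketch. The names (`Board`, `Request`, `take`, `requeue_unresponsive`) and the `is_alive` callback are hypothetical, and it assumes a single process; in a real multi-master CouchDB setup, two servers taking the same request would show up as a conflict that still has to be resolved.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Request:
    id: str
    payload: str
    state: str = "open"            # open -> taken -> done
    taken_by: Optional[str] = None


class Board:
    """In-memory stand-in for the message board database (assumption)."""

    def __init__(self) -> None:
        self.requests: Dict[str, Request] = {}

    def post(self, req_id: str, payload: str) -> None:
        # A client posts a request, e.g. "send this email".
        self.requests[req_id] = Request(req_id, payload)

    def take(self, server: str) -> Optional[Request]:
        # A server takes the first open request, if there is one.
        for req in self.requests.values():
            if req.state == "open":
                req.state, req.taken_by = "taken", server
                return req
        return None

    def complete(self, req_id: str) -> None:
        self.requests[req_id].state = "done"

    def requeue_unresponsive(self, is_alive: Callable[[str], bool]) -> List[str]:
        # The board monitor puts taken requests back on the board
        # when the server that took them is not responding.
        requeued = []
        for req in self.requests.values():
            if req.state == "taken" and not is_alive(req.taken_by):
                req.state, req.taken_by = "open", None
                requeued.append(req.id)
        return requeued
```

How `is_alive` decides that a server is not responding is left open here; a heartbeat check is one option.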

The CouchDB servers would host this message board database and a replication monitor makes
sure that all servers are up to date.

The monitors can be made resilient by running several of them that communicate with heartbeats.
Only one monitor is the master that does the rescheduling, warnings, etc.; the others stand by
until it stops responding.
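
One possible way to decide which monitor is the master, sketched under assumptions: every monitor records the time of the last heartbeat it saw from each peer, and the alive monitor with the lowest name acts as master. The function name `pick_master`, the timeout and the tie-break rule are all hypothetical choices, and the sketch does not address monitors that disagree about who is alive.

```python
from typing import Dict, Optional


def pick_master(last_heartbeat: Dict[str, float], now: float,
                timeout: float) -> Optional[str]:
    """Return the monitor that should act as master, or None.

    last_heartbeat maps monitor name -> time of its most recent heartbeat.
    A monitor counts as alive if its heartbeat is at most `timeout` old.
    """
    alive = sorted(name for name, seen in last_heartbeat.items()
                   if now - seen <= timeout)
    # Assumption: the lowest name among the alive monitors is the master;
    # everyone else stands by.
    return alive[0] if alive else None
```

With this rule a standby takes over only once the current master's heartbeat is older than the timeout, and the original master takes over again if it comes back.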

How does this model sound?

Note that the requests put on the board should be "transactional", in that they have to be
retry-able if their server fails. If need be, the request can probably be split up into smaller
parts but then you need an extra monitor that follows a recipe and posts these parts in execution


On 15 Nov 2010, at 02:01, Mike Fedyk <> wrote:

> node.js + CouchDB == Crazy Delicious by Mikeal Rogers
> I was watching this a couple days ago and I've been thinking about how
> to deal with instance and service (think of sending emails as a
> "service") failures.  Because it's easy to make sure that only one
> email is sent if you only have one server sending emails, but if that
> machine fails, then no emails get sent out.
> You compose an email while offline and save it to your local couch
> instance.  Then later it gets replicated to one of the couchdb
> instances in your cloud.  And then:
> 1. You have the date when it was saved on the phone, etc.  If you had
> a timestamp when that replication happened, you'd be able to have a
> chain of couchdb instances try to send the email, but only if it is
> older than X time after it was replicated to your cloud of couchdb
> instances.  instance_a would try immediately, instance_b tries if it
> hasn't been taken in X minutes, and so on for instance_c.  see [A].
> 2. When instance_a wants to send the email, it updates the state to
> "taking" and then waits for instance_b and instance_c to ack the
> taking by adding fields to the current document.  oops, instance_b and
> instance_c will race more often than not and you'll get a conflict so
> it needs to be separate temporary state tracking documents.  You still
> need [A] or if there are no other instances you'll wait forever for
> acks that won't happen.
> 3. You have one instance that sends emails and you deal with the
> downtime if that instance fails or some other failure happens that
> prevents email from being sent.
> 4. You send periodic test emails to make sure they are being sent, and
> if they are not then take over the function on instance_$self.  see
> [B]
> A) And this only works assuming that all of your cloud couchdb
> instances are replicating to each other correctly at the moment.  Now
> you have N > 1 emails sent out.  (and imagine if what's happening is
> something where it's more important than receiving an email or
> receiving more than one email)  To keep this from happening you need a
> couchdb instance heartbeat (maybe have an app update a document that
> describes that instance's "registration" in the system with the current
> time stamp every 60 seconds) and a STONITH system to kill any
> instances of couchdb that stop updating their document.
> B) Do you still need [A]?  maybe it's good enough that the email
> didn't get back to you, but maybe it is sending emails to other
> places.  so it seems [A] is still needed.  Now you also need a service
> registration system (make sure this and other services like it are
> only running on one instance).
> So these are some of the ideas that I'm coming up with on this issue.
> I'm looking for more input.  What would you do?
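
For reference, the staggered scheme in option 1 of the quoted message could be sketched roughly as follows. The function name `may_send`, the numeric rank per instance (0 for instance_a, 1 for instance_b, and so on) and the `already_taken` flag are hypothetical. As point [A] in the quoted message notes, this only avoids duplicates while the instances are replicating to each other correctly.

```python
def may_send(instance_rank: int, replicated_at: float, now: float,
             delay: float, already_taken: bool) -> bool:
    """Decide whether this instance may try to send the email now.

    Rank 0 tries immediately, rank 1 after `delay` seconds, rank 2 after
    2 * delay, and so on, always counted from the time the document was
    replicated into the cloud of instances.
    """
    if already_taken:
        # Another instance has already taken the request.
        return False
    return now - replicated_at >= instance_rank * delay
```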
