Subject: Re: How to keep from sending more than one email from multiple replicated couchdb instances
From: Wout Mertens
Date: Mon, 15 Nov 2010 07:50:17 +0100
To: user@couchdb.apache.org

I think you need to decouple the database from the replication. Replication management is not a first-class citizen in CouchDB (yet?), and the problems you present show that.

Basically, what you're looking at is a message board service, where clients post requests ("send this email") and servers take requests and execute them. If you add a board monitor to the mix, it can be responsible for putting taken requests back on the board when the server that took them stops responding.

The CouchDB servers would host this message board database, and a replication monitor makes sure that all servers are up to date.

The monitors can be made resilient by running several of them that communicate through heartbeats. Only one monitor, the master, does the rescheduling, warning, etc.; the others stand by until it stops responding.
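A minimal sketch of the message board model, assuming a single in-memory board standing in for the CouchDB database. All names here (Request, Board, take, finish, requeue_stale) are hypothetical. The rev counter mimics a document revision: a claim made against a stale revision is refused, roughly as a single CouchDB node rejects an update that carries an outdated _rev. The monitor puts a taken request back on the board when its server's last heartbeat is older than a timeout.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Request:
    """One posted request, e.g. "send this email"."""
    id: str
    payload: str
    state: str = "open"            # "open" -> "taken" -> "done"
    taken_by: Optional[str] = None
    rev: int = 0                   # bumped on every change, like a document revision


class Board:
    """Hypothetical in-memory stand-in for the message board database."""

    def __init__(self) -> None:
        self.requests: Dict[str, Request] = {}
        self.last_seen: Dict[str, float] = {}   # server name -> last heartbeat time

    def post(self, req_id: str, payload: str) -> None:
        self.requests[req_id] = Request(req_id, payload)

    def heartbeat(self, server: str, now: float) -> None:
        self.last_seen[server] = now

    def take(self, req_id: str, server: str, expected_rev: int) -> bool:
        """Claim an open request; refused if it changed since expected_rev."""
        req = self.requests[req_id]
        if req.state != "open" or req.rev != expected_rev:
            return False
        req.state, req.taken_by, req.rev = "taken", server, req.rev + 1
        return True

    def finish(self, req_id: str, server: str) -> bool:
        """Mark a request done, but only by the server that currently holds it."""
        req = self.requests[req_id]
        if req.state != "taken" or req.taken_by != server:
            return False
        req.state, req.rev = "done", req.rev + 1
        return True


def requeue_stale(board: Board, now: float, timeout: float) -> List[str]:
    """Board monitor: put taken requests back if their server stopped responding."""
    requeued = []
    for req in board.requests.values():
        if req.state != "taken":
            continue
        seen = board.last_seen.get(req.taken_by, float("-inf"))
        if now - seen > timeout:
            req.state, req.taken_by, req.rev = "open", None, req.rev + 1
            requeued.append(req.id)
    return requeued
```

For example, if server "a" takes a request and then stops sending heartbeats, requeue_stale returns that request's id and another server can take it at the new revision. A server that was merely slow may still have sent the email before its request was requeued, which is why the requests need to be safe to retry.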
How does this model sound?

Note that the requests put on the board should be "transactional", in that they have to be retry-able if their server fails. If need be, a request can probably be split up into smaller parts, but then you need an extra monitor that follows a recipe and posts these parts in execution order.

Wout.

On 15 Nov 2010, at 02:01, Mike Fedyk wrote:

> node.js + CouchDB == Crazy Delicious by Mikeal Rogers
> http://jsconf.eu/2010/speaker/nodejs_couchdb_crazy_delicious.html
>
> I was watching this a couple days ago and I've been thinking about how
> to deal with instance and service (think of sending emails as a
> "service") failures. Because it's easy to make sure that only one
> email is sent if you only have one server sending emails, but if that
> machine fails, then no emails get sent out.
>
> You compose an email while offline and save it to your local couch
> instance. Then later it gets replicated to one of the couchdb
> instances in your cloud. And then:
>
> 1. You have the date when it was saved on the phone, etc. If you had
> a timestamp when that replication happened, you'd be able to have a
> chain of couchdb instances try to send the email, but only if it is
> older than X time after it was replicated to your cloud of couchdb
> instances. instance_a would try immediately, instance_b tries if it
> hasn't been taken in X minutes, and so on for instance_c. see [A].
>
> 2. When instance_a wants to send the email, it updates the state to
> "taking" and then waits for instance_b and instance_c to ack the
> taking by adding fields to the current document. oops, instance_b and
> instance_c will race more often than not and you'll get a conflict so
> it needs to be separate temporary state tracking documents. You still
> need [A] or if there are no other instances you'll wait forever for
> acks that won't happen.
>
> 3. You have one instance that sends emails and you deal with the
> downtime if that instance fails or some other failure happens that
> prevents email from being sent.
>
> 4. You send periodic test emails to make sure they are being sent, and
> if they are not then take over the function on instance_$self. see
> [B]
>
> A) And this only works assuming that all of your cloud couchdb
> instances are replicating to each other correctly at the moment. Now
> you have N > 1 emails sent out. (and imagine if what's happening is
> something where it's more important than receiving an email or
> receiving more than one email) To keep this from happening you need a
> couchdb instance heartbeat (maybe have an app update a document that
> describes that instance's "registration" in the system with the current
> time stamp every 60 seconds) and a STONITH system to kill any
> instances of couchdb that stop updating their document.
>
> B) Do you still need [A]? maybe it's good enough that the email
> didn't get back to you, but maybe it is sending emails to other
> places. so it seems [A] is still needed. Now you also need a service
> registration system (make sure this and other services like it are
> only running on one instance).
>
> So these are some of the ideas that I'm coming up with on this issue.
> I'm looking for more input. What would you do?
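The heartbeat-plus-registration idea in [A] and [B], combined with the "one master, others on standby" rule for the monitors, could be sketched roughly as follows. It assumes every node reads the same registry mapping instance name to last heartbeat time. The 60-second interval comes from [A]; the three-interval staleness threshold, the lowest-name-wins rule, and all function names are hypothetical choices made for illustration.

```python
from typing import Dict, List, Optional

HEARTBEAT_INTERVAL = 60.0                # seconds, as suggested in [A]
STALE_AFTER = 3 * HEARTBEAT_INTERVAL     # assumption: tolerate two missed beats


def register(registry: Dict[str, float], instance: str, now: float) -> None:
    """Each instance stamps its own registration entry on every heartbeat."""
    registry[instance] = now


def stale_instances(registry: Dict[str, float], now: float) -> List[str]:
    """Instances whose stamp is too old: the candidates for STONITH fencing."""
    return sorted(name for name, ts in registry.items() if now - ts > STALE_AFTER)


def service_owner(registry: Dict[str, float], now: float) -> Optional[str]:
    """Pick the single live instance allowed to run the email service.

    Hypothetical rule: the lowest-named live instance wins, so every node that
    sees the same registry picks the same owner without exchanging messages.
    """
    live = sorted(name for name, ts in registry.items() if now - ts <= STALE_AFTER)
    return live[0] if live else None
```

This only picks an owner deterministically when all nodes see the same registry. It does not stop a partitioned instance_a that still believes it is live from sending mail, which is why [A] pairs the heartbeat with STONITH fencing of the instances that stale_instances reports.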