In-Reply-To: <05CA096A-497C-4234-AD55-6FAC2000C1AC@couchbase.com>
References: <05CA096A-497C-4234-AD55-6FAC2000C1AC@couchbase.com>
Date: Mon, 5 Mar 2012 21:32:09 +0000
Subject: Re: Strategy for reliable _changes feed workers
From: Robert Newson
To: user@couchdb.apache.org

I'd urge caution here. The _changes feed allows the replicator to
avoid reprocessing updates that the target has already seen but,
crucially, replication is not broken if the feed includes old updates.

In BigCouch, and hence a future version of CouchDB, the changes feed
can sometimes contain rows from before the since= value, in the case
of failover to a different replica of a shard. Clearly, in BigCouch,
you could not depend on the changes feed to ensure you process an item
exactly once, so I suggest it's bad practice to assume the same of
CouchDB.

Instead, I would create a view that includes unprocessed items. Once
an item is processed (whatever that entails), update the document to
indicate it has been processed. This will work everywhere.

B.

On 5 March 2012 21:13, Jens Alfke wrote:
>
> On Mar 5, 2012, at 8:23 AM, Zachary Zolton wrote:
>
> How are you using the _changes feed for reliable background processing?
>
> Well, the _changes feed is a key part of the CouchDB replicator, which
> uses it exactly as you've described.
>
>  * What is the last sequence number processed?
>  * Have we already attempted to process this update?
>  * How many times have we failed this update?
>
> The replicator stores a "checkpoint" value which is the latest sequence
> ID that it's completely processed. The logic of its full operation is
> pretty complex (though of course the source code is available.)
>
> --Jens
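
For illustration, the view-based approach suggested in the reply above (a view of unprocessed items, with each document updated once it has been handled) might look roughly like the sketch below. It is not code from this thread: the "processed" field, the design document name and the helper functions are assumptions, and a plain Python dict stands in for a real database so the example runs on its own.

```python
# Hypothetical sketch: a dict stands in for a CouchDB database.
# The "processed" field and every name below are assumptions for illustration.

# A design document with a view of unprocessed items. The map function is
# JavaScript, as CouchDB views usually are; it emits only unmarked documents.
DESIGN_DOC = {
    "_id": "_design/work",
    "views": {
        "unprocessed": {
            "map": "function (doc) { if (!doc.processed) { emit(doc._id, null); } }"
        }
    },
}


def unprocessed(db):
    """Python equivalent of the view: ids of documents not yet marked processed."""
    return sorted(doc_id for doc_id, doc in db.items() if not doc.get("processed"))


def run_worker(db, handle):
    """Handle each unprocessed document, then mark it so it drops out of the view."""
    done = []
    for doc_id in unprocessed(db):
        doc = db[doc_id]
        handle(doc)
        doc["processed"] = True  # against a real server this would be a document update
        done.append(doc_id)
    return done


db = {
    "a": {"_id": "a", "body": "first"},
    "b": {"_id": "b", "body": "second", "processed": True},
    "c": {"_id": "c", "body": "third"},
}
seen = []
first_pass = run_worker(db, lambda doc: seen.append(doc["_id"]))
second_pass = run_worker(db, lambda doc: seen.append(doc["_id"]))
```

Because the worker asks the view what is still outstanding rather than trusting a sequence number, running it a second time finds nothing left to do, regardless of what a changes feed might replay.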