From: Jason Smith
Date: Mon, 26 Dec 2011 09:10:52 +0000
Subject: Re: [jira] [Commented] (COUCHDB-1367) When settings revs_limit on db - the db increases its update_seq counter when viewing stats - but not when getting changes
To: dev@couchdb.apache.org

Hi, Randall.

Thanks for inviting me to argue a bit more. I hope you'll be persuaded that, if -1367 is not a bug, at least there is *some* bug.

tl;dr summary: This is a real bug: a paper cut with a workaround, but still a real bug.

1. Apps want a changes feed since 0, but they want to know when they've "caught up" (defined below).
2. These apps (and robust apps generally) probably start out by pinging the /db anyway. Bob N. and I independently did so.
3. update_seq looks deceptively like the sequence id of the latest change, and people assume it is. They define "caught up" as receiving a change at or above this value. They expect to "catch up" in finite time, even if the db receives no subsequent updates.
4. In fact, CouchDB does not disclose the sequence id of the latest change in the /db response. To learn that value:
   4a. If you want to process every change anyway, just get _changes and use last_seq.
   4b. If you just want the last sequence id, query _changes?descending=true&limit=1
      4b(1). If the response has a change, use its last_seq value.
      4b(2). If the response has no changes, ignore the last_seq value (it is really the update_seq) and use 0.

Step 3 is the major paper cut. That step 4 exists and is complicated is the minor paper cut.

On Mon, Dec 26, 2011 at 5:36 AM, Randall Leeds (Commented) (JIRA) wrote:
>
>    [ https://issues.apache.org/jira/browse/COUCHDB-1367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13175892#comment-13175892 ]
>
> Randall Leeds commented on COUCHDB-1367:
> ----------------------------------------
>
>> Wait a second. Robert, you are not fixing a bug in C-L, you are working around a deficiency in CouchDB.
>
> Can't both be true?

Only in the trivial sense.
This ticket reveals that app developers (Henrik and me, but also a committer) misunderstand update_seq, thinking it is last_seq. last_seq is not easy to learn.

> Nope. You can not ever know. You always know the latest sequence number at some arbitrarily recent point in time.

Sorry, I cut corners and was not clear. Of course, nobody ever really knows anything except events in the very recent past. But I mean in the context of a _changes query one-two punch: get the last_seq, then begin a continuous feed since that value.

The bug is that users cannot readily know the id of the most recent change. In fact, "the id of the most recent change" has no explicit label or name in the CouchDB interface. Neither update_seq nor last_seq means exactly that.

>> What if I want to see the most recent five changes? What if there are a hundred million documents? What if 99% of the time, update_seq equals last_seq and so developers assume it means something it doesn't?
>
> In order:
>  * /_changes?descending=true&limit=5

I stand corrected. I had forgotten about a descending changes query. That resolves the hundred-million-docs problem. (My erroneous point was that 100M docs makes it too expensive to learn last_seq.)

But that response looks bizarre.

GET /db/_changes?descending=true&limit=5
{"results":[
{"seq":22,"id":"after_3","changes":[{"rev":"1-0785e9eb543380151003dc452c3a001a"}]},
{"seq":21,"id":"after_2","changes":[{"rev":"1-0785e9eb543380151003dc452c3a001a"}]},
{"seq":20,"id":"after_1","changes":[{"rev":"1-0785e9eb543380151003dc452c3a001a"}]},
{"seq":19,"id":"conc","changes":[{"rev":"2-584a4a504a97009241d2587fee8b5eb8"}]},
{"seq":17,"id":"preload_create","changes":[{"rev":"1-28bf6cd8af83c40c6e3fb82b608ce98f"}]}
],
"last_seq":17}

Here last_seq is the seq of the *least recent* change in the response. If you query with &limit=1, then last_seq and the seq of the single returned change will be equal, and that is nice. *Except* if there were no changes yet.
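Steps 4b(1) and 4b(2) from the summary, including this empty-feed exception, can be sketched in Python. This is a minimal sketch that works on already-parsed JSON instead of making HTTP calls; the function names `latest_change_seq` and `caught_up` are hypothetical, and it assumes the integer sequence numbers shown in this thread. The non-empty example is an assumption about what the limit=1 query would return for the database above.

```python
from typing import Any, Dict


def latest_change_seq(desc_limit1: Dict[str, Any]) -> int:
    """Hypothetical helper: seq of the most recent change, per step 4b.

    `desc_limit1` is the parsed body of GET /db/_changes?descending=true&limit=1.
    """
    if desc_limit1.get("results"):
        # 4b(1): one change came back, so last_seq is that change's seq.
        return desc_limit1["last_seq"]
    # 4b(2): no changes at all. last_seq here is really the update_seq,
    # so ignore it and use 0.
    return 0


def caught_up(highest_seen_seq: int, target_seq: int) -> bool:
    """Hypothetical helper: "caught up" once a change at or above target is seen."""
    return highest_seen_seq >= target_seq


# Assumed limit=1 response for the database above (revs trimmed).
nonempty = {"results": [{"seq": 22, "id": "after_3"}], "last_seq": 22}
# Empty db after three _revs_limit PUTs: descending last_seq is 3.
empty_after_revs_limit = {"results": [], "last_seq": 3}

print(latest_change_seq(nonempty))
print(latest_change_seq(empty_after_revs_limit))

# Naive target from step 3 (update_seq, 3 here): no change will ever
# arrive, so a client that has seen nothing never catches up.
print(caught_up(0, 3))
# Step-4b target (0): the same client is caught up immediately.
print(caught_up(0, latest_change_seq(empty_after_revs_limit)))
```

The point of computing the target once, up front, is to avoid comparing against update_seq, which also counts non-document updates such as _revs_limit changes that never appear in the feed.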
$ curl -X PUT localhost:5984/x
{"ok":true}
$ curl -X PUT localhost:5984/x/_revs_limit -d $RANDOM
{"ok":true}
$ curl -X PUT localhost:5984/x/_revs_limit -d $RANDOM
{"ok":true}
$ curl -X PUT localhost:5984/x/_revs_limit -d $RANDOM
{"ok":true}
$ curl localhost:5984/x/_changes
{"results":[

],
"last_seq":0}

$ curl localhost:5984/x/_changes?descending=true
{"results":[

],
"last_seq":3}

Weird.

>  * Add additional information to the changes feed, perhaps with a query parameter (almost the reverse of include docs)
>  * Stop incrementing the update sequence on certain kinds of non-document changes
>  * Add more information to the db information response

A commonly needed and valuable piece of data like this seems most appropriately cached in the db header and served in the db information response.

-- 
Iris Couch