couchdb-dev mailing list archives

From Jason Smith <...@iriscouch.com>
Subject Re: [jira] [Commented] (COUCHDB-1367) When settings revs_limit on db - the db increases its update_seq counter when viewing stats - but not when getting changes
Date Mon, 26 Dec 2011 09:10:52 GMT
Hi, Randall. Thanks for inviting me to argue a bit more. I hope you'll
be persuaded that, if -1367 is not a bug, at least there is *some*
bug.

tl;dr summary:

This is a real bug--a paper cut with a workaround, but still a real bug.

1. Apps want a changes feed since 0, but they want to know when
they've "caught up" (defined below)
2. These apps (and robust apps generally) probably start out by
pinging the /db anyway. Bob N. and I independently did so.
3. update_seq looks deceptively like the sequence id of the latest
change, and people assume so. They define "caught up" as receiving a
change at or above this value. They expect to "catch up" in finite
time, and even if the db receives no subsequent updates.
4. In fact, CouchDB does not disclose the sequence id of the latest
change in the /db response. To know that value:
  4a. If you want to process every change anyway, just get _changes
and use last_seq
  4b. If you just want the last sequence id, query
_changes?descending=true&limit=1
    4b(1). If the response has a change, use its last_seq value
    4b(2). If the response has no changes, ignore the last_seq value
(it is really the update_seq) and use 0

Step 3 is the major paper cut. That step 4 exists and is complicated
is the minor paper cut.
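
Step 4b can be sketched in Python roughly as follows. This is only an illustration: the function name latest_change_seq is hypothetical, and it assumes the caller has already fetched and parsed the JSON body of _changes?descending=true&limit=1.

```python
from typing import Any, Mapping


def latest_change_seq(response: Mapping[str, Any]) -> int:
    """Return the sequence id of the most recent change, or 0 if there is none.

    `response` is assumed to be the parsed JSON body of
    GET /db/_changes?descending=true&limit=1
    """
    if response["results"]:
        # Step 4b(1): with limit=1, last_seq is the seq of the single change.
        return response["last_seq"]
    # Step 4b(2): no changes yet; last_seq is really update_seq, so ignore it.
    return 0


# Hypothetical sample bodies: one db with changes, one with none.
print(latest_change_seq({"results": [{"seq": 22, "id": "after_3"}], "last_seq": 22}))
print(latest_change_seq({"results": [], "last_seq": 3}))
```

The empty-results branch is the part that is easy to miss, which is why the whole procedure counts as a paper cut.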

On Mon, Dec 26, 2011 at 5:36 AM, Randall Leeds (Commented) (JIRA)
<jira@apache.org> wrote:
>
>    [ https://issues.apache.org/jira/browse/COUCHDB-1367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13175892#comment-13175892 ]
>
> Randall Leeds commented on COUCHDB-1367:
> ----------------------------------------
>
>> Wait a second. Robert, you are not fixing a bug in C-L, you are working around a
>> deficiency in CouchDB.
>
> Can't both be true?

Only in the trivial sense. This ticket reveals that app
developers--Henrik and me, but also a committer--misunderstand
update_seq, thinking it is last_seq. last_seq is not easy to learn.

> Nope. You can not ever know. You always know the latest sequence number at some arbitrarily
> recent point in time.

Sorry, I cut corners and was not clear. Of course, nobody ever really
knows anything except events in the very recent past. But I mean in
the context of a _changes query one-two punch: get the last_seq, then
begin a continuous feed since that value.
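
The second half of that one-two punch might look like the sketch below. The helper name continuous_feed_url and the example database URL are assumptions made for illustration; feed=continuous and since are the standard _changes query parameters.

```python
from urllib.parse import urlencode


def continuous_feed_url(db_url: str, last_seq: int) -> str:
    """Build the URL for a continuous _changes feed starting after last_seq."""
    query = urlencode({"feed": "continuous", "since": last_seq})
    return f"{db_url}/_changes?{query}"


# Assumed local database URL, with a last_seq learned from an earlier query.
print(continuous_feed_url("http://localhost:5984/x", 22))
```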

The bug is that users cannot readily know the id of the most recent
change. In fact, "the id of the most recent change" has no explicit
label or name in the CouchDB interface. Neither update_seq nor
last_seq means exactly that.

>> What if I want to see the most recent five changes? What if there are a hundred million
>> documents? What if 99% of the time, update_seq equals last_seq and so developers assume it
>> means something it doesn't?
>
> In order:
>  * /_changes?descending=true&limit=5

I stand corrected. I had forgotten about a descending changes query.
That resolves the hundred-million-docs problem. (My erroneous point
was, 100M docs makes it too expensive to learn last_seq.)

But that response looks bizarre.

GET /db/_changes?descending=true&limit=5
{"results":[
{"seq":22,"id":"after_3","changes":[{"rev":"1-0785e9eb543380151003dc452c3a001a"}]},
{"seq":21,"id":"after_2","changes":[{"rev":"1-0785e9eb543380151003dc452c3a001a"}]},
{"seq":20,"id":"after_1","changes":[{"rev":"1-0785e9eb543380151003dc452c3a001a"}]},
{"seq":19,"id":"conc","changes":[{"rev":"2-584a4a504a97009241d2587fee8b5eb8"}]},
{"seq":17,"id":"preload_create","changes":[{"rev":"1-28bf6cd8af83c40c6e3fb82b608ce98f"}]}
],
"last_seq":17}

last_seq is the *least recent* change. If you query with &limit=1 then
they will be equal, and that is nice. *Except* if there were no
changes yet.

    $ curl -X PUT localhost:5984/x
    {"ok":true}

    $ curl -X PUT localhost:5984/x/_revs_limit -d $RANDOM
    {"ok":true}
    $ curl -X PUT localhost:5984/x/_revs_limit -d $RANDOM
    {"ok":true}
    $ curl -X PUT localhost:5984/x/_revs_limit -d $RANDOM
    {"ok":true}

    $ curl localhost:5984/x/_changes
    {"results":[

    ],
    "last_seq":0}

    $ curl localhost:5984/x/_changes?descending=true
    {"results":[

    ],
    "last_seq":3}

Weird.
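
Both quirks can be checked against the two response bodies shown above. In this rough Python sketch, the helper name newest_seq is hypothetical and the dictionaries are trimmed copies of those responses (only the seq fields are kept).

```python
from typing import Any, Mapping, Optional


def newest_seq(descending_response: Mapping[str, Any]) -> Optional[int]:
    """Return the seq of the most recent change in a descending _changes
    response, or None when the response contains no changes."""
    results = descending_response["results"]
    # In a descending feed the first result is the newest change;
    # last_seq is the seq of the final (oldest) row returned.
    return results[0]["seq"] if results else None


five = {
    "results": [{"seq": 22}, {"seq": 21}, {"seq": 20}, {"seq": 19}, {"seq": 17}],
    "last_seq": 17,
}
empty = {"results": [], "last_seq": 3}

print(newest_seq(five), five["last_seq"])
print(newest_seq(empty), empty["last_seq"])
```

In the first case last_seq trails the newest change; in the second it reports a sequence number even though no change exists.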

>  * Add additional information to the changes feed, perhaps with a query parameter (almost
>    the reverse of include docs)
>  * Stop incrementing the update sequence on certain kinds of non-document changes
>  * Add more information to the db information response

A commonly-needed and valuable piece of data like this seems most
appropriate cached in the db header and served in the db information.

-- 
Iris Couch
