couchdb-dev mailing list archives

From Randall Leeds <randall.le...@gmail.com>
Subject Re: [jira] [Commented] (COUCHDB-1367) When settings revs_limit on db - the db increases its update_seq counter when viewing stats - but not when getting changes
Date Tue, 27 Dec 2011 05:04:27 GMT
On Mon, Dec 26, 2011 at 08:49, Jason Smith <jhs@iriscouch.com> wrote:
> Hi, Bob. Thanks for your feedback.
>
> On Mon, Dec 26, 2011 at 12:24 PM, Robert Dionne
> <dionne@dionne-associates.com> wrote:
>> Jason,
>>
>>  After looking into this a bit I do not think it's a bug, at most poor documentation.
>> update_seq != last_seq
>
> Nobody knows what update_seq means. Even a CouchDB committer got it wrong.
>
> Fine. It is "poor documentation."
>
> Adding last_seq into db_info is not helpful because last_seq also does
> not mean what we think it means. My last email demonstrates that
> last_seq is in fact incoherent.

<snip>

On Mon, Dec 26, 2011 at 23:03, Benoit Chesneau <bchesneau@gmail.com> wrote:
> Mmm, right, that's confusing (except maybe if you consider update_seq as a
> way to know the number of updates in the database, but in that case
> the wording is confusing). IMO the changes seq & committed_seq should be
> quite the same. At least a changes seq should only happen when there
> is a doc update, i.e. each time and only if a revision is created. Does
> that make sense?
>
> - benoît

Yes, it does. There is a mostly consistent relationship between the update
sequence (seq, update_seq, last_seq, committed_seq) and the by_seq
index. It is entirely too confusing that some things affect update_seq
but do not appear in the by_seq btree. That is either just plain wrong
or a massive confusion of vocabulary. Benoit, I believe you are right
to suggest that none of these sequence-related things should change
unless a revision is created.
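To make the mismatch concrete, here is a toy in-memory model. It is a sketch, not CouchDB's actual Erlang code, and the class and method names (ToyDb, update_doc, set_revs_limit, db_info, changes) are hypothetical. It assumes a document update bumps update_seq and writes a by_seq entry, while setting revs_limit bumps update_seq only, which is the behaviour reported in COUCHDB-1367.

```python
# Toy model of the mismatch behind COUCHDB-1367 (not CouchDB's real code).
# All names here are hypothetical.


class ToyDb:
    def __init__(self):
        self.update_seq = 0     # counter reported by db_info
        self.by_seq = {}        # seq -> doc id; the source of _changes
        self.revs_limit = 1000

    def update_doc(self, doc_id):
        # A document update creates a revision: bump the counter and index it,
        # dropping the doc's older by_seq entry so each doc appears once.
        self.update_seq += 1
        self.by_seq = {s: d for s, d in self.by_seq.items() if d != doc_id}
        self.by_seq[self.update_seq] = doc_id

    def set_revs_limit(self, limit):
        # Metadata change: bumps the counter but writes nothing to by_seq.
        self.revs_limit = limit
        self.update_seq += 1

    def db_info(self):
        return {"update_seq": self.update_seq}

    def changes(self):
        rows = [{"seq": s, "id": d} for s, d in sorted(self.by_seq.items())]
        return {"results": rows, "last_seq": rows[-1]["seq"] if rows else 0}


db = ToyDb()
db.update_doc("a")
db.set_revs_limit(10)
print(db.db_info()["update_seq"], db.changes()["last_seq"])  # 2 1
```

The counter reported by db_info has moved past anything a _changes consumer can ever see.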

Bear with me, for I believe there is a related discussion about
replicability for _security, _local docs, etc. It's clear that there
are clustering and operational motivations for making this information
replicable, thus making them proper documents with a place in the
by_seq index, in the _changes feed, and affecting update_seq. Either
these things have a proper place in the sequential history of a
database or they do not. That there are things which affect update_seq
but do not appear in the by_seq index and _changes feed feels like a
mistake. Placing additional metadata in the db header feels like
rubbing salt in this wound.

Right now only replicable documents surface in the _changes feed and
are added to the by_seq btree, but some other things affect
update_seq. I've just gone and checked, as described in my previous
email, that none of these appear to require a change to update_seq for
any technical reason, though Jason rightly points out that it is
perhaps useful for operational tasks such as knowing when to back up a
.couch file.

I see two reasonable ways forward.

1) Stop incrementing update_seq for anything but replicable document changes
2) Make things which already affect update_seq but do not appear in
_changes appear there, likely by turning them into proper MVCC
documents.

Regarding option 1:
This is easy. I already outlined how to do this. It requires removing
about 3 characters from our codebase. However, it spits at Jason's
operations concerns, which I think are quite valid, and misses an
opportunity for great improvement.
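Under the same toy assumptions (hypothetical names, not the real codebase), option 1 amounts to dropping the counter bump from the metadata setter, which restores the invariant that update_seq equals the highest seq in by_seq.

```python
# Option 1 in the same toy model: only document revisions move update_seq.
# Names are hypothetical, not CouchDB's real code.


class ToyDb:
    def __init__(self):
        self.update_seq = 0
        self.by_seq = {}        # seq -> doc id
        self.revs_limit = 1000

    def update_doc(self, doc_id):
        self.update_seq += 1
        self.by_seq = {s: d for s, d in self.by_seq.items() if d != doc_id}
        self.by_seq[self.update_seq] = doc_id

    def set_revs_limit(self, limit):
        # No counter bump: metadata no longer moves update_seq.
        self.revs_limit = limit

    def last_changes_seq(self):
        return max(self.by_seq, default=0)


db = ToyDb()
db.update_doc("a")
db.set_revs_limit(10)
# Invariant restored: update_seq matches the newest by_seq entry.
assert db.update_seq == db.last_changes_seq() == 1
```

The trade-off Jason raised shows up directly: a backup script that polls update_seq would no longer notice a revs_limit change.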

Regarding option 2:
There is a cluster-aware use case, an operations use case, and, I
think, a purity argument here. As for how to accomplish this feat
without terrible API breakage, we get a lot of help from our URL
structure. We have reserved paths which cannot conflict with documents
so it does not create ambiguity if '{"seq":20,"id":"_security", ...}'
appears in a changes feed. However, I think _security is a bad name
for this document because using it would break compatibility with
the existing /_security API.

One solution I like right now is to add a _meta (without loss of
generality -- insert your own preferred name) document, with the
normal MVCC document API, referenced by the by_seq index and appearing
in the _changes feed, which contains both _revs_limit and _security
while preserving the legacy, clobbering, MVCC-oblivious APIs. Voila!
No breaking changes. Keep a pointer to the latest revision of this document
in the database header for fast access (and perhaps cache it in
memory).
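A rough sketch of the _meta idea, again as a toy model. The _meta id, the default values, the integer revisions and the header["meta_seq"] pointer are all illustrative assumptions. The legacy setters keep their clobbering semantics by always writing on top of whatever the latest revision is, while direct writes to _meta go through the normal revision check.

```python
import copy

# Toy sketch of a replicable _meta document. META_ID, DEFAULT_META, the
# integer revisions and header["meta_seq"] are illustrative assumptions.
META_ID = "_meta"
DEFAULT_META = {"_revs_limit": 1000, "_security": {}}


class ToyDb:
    def __init__(self):
        self.update_seq = 0
        self.by_seq = {}                   # seq -> doc id
        self.docs = {}                     # doc id -> latest body with "_rev"
        self.header = {"meta_seq": None}   # fast pointer to latest _meta rev

    def put_doc(self, doc_id, body, rev=None):
        # Normal MVCC write: the caller must supply the current revision.
        current = self.docs.get(doc_id)
        current_rev = current["_rev"] if current else None
        if rev != current_rev:
            raise ValueError("conflict")
        new_rev = (current_rev or 0) + 1
        self.update_seq += 1
        self.by_seq = {s: d for s, d in self.by_seq.items() if d != doc_id}
        self.by_seq[self.update_seq] = doc_id
        self.docs[doc_id] = dict(body, _rev=new_rev)
        if doc_id == META_ID:
            self.header["meta_seq"] = self.update_seq
        return new_rev

    def _meta_body(self):
        # Current _meta contents and revision, or defaults if never written.
        meta = self.docs.get(META_ID)
        if meta is None:
            return copy.deepcopy(DEFAULT_META), None
        body = {k: v for k, v in meta.items() if k != "_rev"}
        return body, meta["_rev"]

    # Legacy, MVCC-oblivious setters: they clobber by always writing
    # on top of the latest revision, so callers never see a conflict.
    def set_revs_limit(self, limit):
        body, rev = self._meta_body()
        body["_revs_limit"] = limit
        self.put_doc(META_ID, body, rev=rev)

    def set_security(self, security):
        body, rev = self._meta_body()
        body["_security"] = security
        self.put_doc(META_ID, body, rev=rev)


db = ToyDb()
db.set_revs_limit(10)
db.set_security({"admins": {"names": ["bob"]}})
db.put_doc("a", {"x": 1})
print(db.by_seq)  # {2: '_meta', 3: 'a'}
```

Because every change to _revs_limit or _security becomes a revision of _meta, update_seq and by_seq stay in step without special cases.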

It would probably be acceptable to keep these out of a vanilla changes
request (after all, they require db admin credentials to modify, and
in the case of _security to view). Opening the door to additional
flags for _changes also allows us to provide a natural extension of
this idea to replicable _local docs for the clustering use case.
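A sketch of how a changes request might hide _meta (and, later, replicable _local docs) unless asked. The include_meta and include_local flags are hypothetical, and the function works on the same seq -> doc id mapping as the toy models. Design documents are left alone, since they already show up in _changes. Note that last_seq still advances past hidden rows, so a client resuming from it does not rescan them.

```python
# Sketch of a _changes-style scan that hides _meta and _local/ docs unless
# asked. META_ID, include_meta and include_local are hypothetical.
META_ID = "_meta"


def changes(by_seq, since=0, include_meta=False, include_local=False):
    """Return _changes-style rows from a seq -> doc id mapping."""
    rows = []
    last_seq = since
    for seq in sorted(by_seq):
        if seq <= since:
            continue
        # Advance past hidden rows too, so resuming from last_seq skips them.
        last_seq = seq
        doc_id = by_seq[seq]
        if doc_id == META_ID and not include_meta:
            continue
        if doc_id.startswith("_local/") and not include_local:
            continue
        rows.append({"seq": seq, "id": doc_id})
    return {"results": rows, "last_seq": last_seq}


by_seq = {1: "_design/app", 2: "_meta", 3: "doc1", 4: "_local/repl"}
print([r["id"] for r in changes(by_seq)["results"]])
# ['_design/app', 'doc1']
print([r["id"] for r in changes(by_seq, include_meta=True)["results"]])
# ['_design/app', '_meta', 'doc1']
```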

Thoughts, concerns, emotions and relevant, famous quotations encouraged.

-Randall
