couchdb-dev mailing list archives

From Jason Smith <...@iriscouch.com>
Subject Re: [jira] [Commented] (COUCHDB-1367) When settings revs_limit on db - the db increases its update_seq counter when viewing stats - but not when getting changes
Date Tue, 27 Dec 2011 10:22:17 GMT
On Tue, Dec 27, 2011 at 5:04 AM, Randall Leeds <randall.leeds@gmail.com> wrote:
> Yes it does. There is a mostly consistent relationship between update
> sequence (seq, update_seq, last_seq, committed_seq) and the by_seq
> index. It seems entirely too confusing that there are things which
> affect update_seq but do not appear in the by_seq btree. That is just
> plain wrong, else a massive confusion of vocabulary.

I think it is confused definitions.

If by_seq is defined as "the sequence ids of regular documents" then
it is implemented correctly.

> Bear with me for I believe there is a related discussion about
> replicability for _security, _local docs, etc. It's clear that there
> are clustering and operational motivations for making this information
> replicable, thus making them proper documents with a place in the
> by_seq index, in the _changes feed, and affecting update_seq.

It is common to track changes to data and changes to metadata
independently. On Unix, changing a file's permissions updates ctime but
not mtime; changing a file's contents updates both. In CouchDB,
update_seq plays the role of ctime (data or metadata updates), and
nobody's been cast for mtime (data-only updates).
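
The filesystem behaviour is easy to check. Here is a minimal sketch,
assuming a POSIX system (on Windows, st_ctime means creation time, so the
analogy does not hold there). The function name is made up for
illustration.

```python
import os
import tempfile
import time


def permission_change_effect(path):
    """Return (mtime_changed, ctime_changed) after a permissions-only change."""
    before = os.stat(path)
    time.sleep(0.05)       # let the clock move past the timestamp granularity
    os.chmod(path, 0o640)  # metadata-only change: file contents are untouched
    after = os.stat(path)
    return (after.st_mtime_ns != before.st_mtime_ns,
            after.st_ctime_ns != before.st_ctime_ns)


if __name__ == "__main__":
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        mtime_changed, ctime_changed = permission_change_effect(path)
        print("mtime changed:", mtime_changed)
        print("ctime changed:", ctime_changed)
    finally:
        os.remove(path)
```

On a typical Linux filesystem I would expect mtime to stay put and ctime
to move, which is the split between "data changed" and "anything
changed" described above.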

> Either
> these things have a proper place in the sequential history of a
> database or they do not. That there are things which affect update_seq
> but do not appear in the by_seq index and _changes feed feels like a
> mistake.

The first sentence is, well, a tautology actually, but it asks the
right question and the answer is they DO NOT belong. _changes shows
data, not metadata. By definition, _changes is anything worth
replicating.

But I hope my filesystem example above shows why it is okay to
increment update_seq but not change by_seq.

The bug with update_seq is not that it is too eager (it increments for
_security, _revs_limit), but that it is not eager enough (it should bump
for _local too).

2. As a frequent consumer of _changes, I would prefer *not* to see
_local documents, nor _security or other updates in there. They are
metadata, not data. Maybe I misunderstood, but nobody wants to
*replicate* _security objects or _local docs; they just want MVCC
semantics (Adam on _security, IIRC) and a simplified API (me, on
making all metadata a _local doc, and making _local docs full MVCC).


> Placing additional metadata in the db header feels like
> rubbing salt in this wound.

On the contrary, IMHO, we want

1. A new value: the sequence id of the most recent document update (pretty sure)
2. Available to the client alongside existing values like doc_count and
doc_del_count (somewhat sure)

> Right now only replicable documents surface in the _changes feed and
> are added to the by_seq btree but some other things affect the
> update_seq. I've just gone and checked, as described in my previous
> email, that none of these appear to require a change to update_seq for
> any technical reason, though Jason properly points out that it is
> perhaps useful for operational tasks such as knowing when to back up a
> .couch file.

Here is where we get into migrating to more _local docs. I am actually
not sure if that's good for this discussion. But anyway, my basic
feeling is

* All metadata that clients can change is _local docs, with MVCC, *not
in* the by_seq tree
* update_seq counts changes to data or metadata
* update_sikh (WLOG) counts changes to documents only (changes to the
by_seq tree)
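
To make the three bullets concrete, here is a toy in-memory model. It is
only a sketch of the proposal, not how CouchDB is implemented: the class
name, the doc_seq property (standing in for the second counter), and the
integer revisions used for the MVCC check are all invented for
illustration.

```python
class ToyDb:
    """Toy model of the proposal; not CouchDB's actual implementation."""

    def __init__(self):
        self.update_seq = 0   # bumps on any change, data or metadata
        self.by_seq = {}      # seq -> doc id, regular documents only
        self.doc_seqs = {}    # doc id -> its current seq in by_seq
        self.local_revs = {}  # _local doc id -> integer revision (toy MVCC)

    @property
    def doc_seq(self):
        """Hypothetical new value: seq of the most recent document update."""
        return max(self.by_seq, default=0)

    def put_doc(self, doc_id):
        """Regular document write: bumps update_seq and lands in by_seq."""
        self.update_seq += 1
        old = self.doc_seqs.get(doc_id)
        if old is not None:
            del self.by_seq[old]  # a doc appears once, at its latest seq
        self.by_seq[self.update_seq] = doc_id
        self.doc_seqs[doc_id] = self.update_seq

    def put_local(self, doc_id, rev=0):
        """Metadata write: MVCC-checked, bumps update_seq, skips by_seq."""
        current = self.local_revs.get(doc_id, 0)
        if rev != current:
            raise ValueError("conflict: stale revision")
        self.local_revs[doc_id] = current + 1
        self.update_seq += 1
        return current + 1

    def changes(self):
        """What a _changes consumer would see: documents only."""
        return sorted(self.by_seq.items())


if __name__ == "__main__":
    db = ToyDb()
    db.put_doc("a")
    db.put_local("_local/security")
    print(db.update_seq, db.doc_seq, db.changes())
```

In this model a metadata write moves update_seq ahead of doc_seq, so a
backup script can watch update_seq while a _changes consumer only ever
sees documents.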

Doable? It bears mentioning that I haven't any idea what I am talking about.

> Thoughts, concerns, emotions and relevant, famous quotations encouraged.

WELCOME TO THE PARTY, PAL!

-- 
Iris Couch
