couchdb-dev mailing list archives

From Randall Leeds <randall.le...@gmail.com>
Subject Re: Is it possible to bring back optional old all-or-nothing behaviour?
Date Fri, 23 Dec 2011 04:46:19 GMT
On Thu, Dec 22, 2011 at 20:18, Alexander Uvarov
<alexander.uvarov@gmail.com> wrote:
>
> On Dec 23, 2011, at 1:49 AM, Paul Davis wrote:
>
>> On Thu, Dec 22, 2011 at 11:31 AM, Robert Newson <rnewson@apache.org> wrote:
>>> In my opinion, and I believe the majority opinion of the group, the
>>> CouchDB API should be the same everywhere. This specifically includes
>>> not doing things on a single box that will not work in a
>>> clustered/sharded situation. It's why our transactions are scoped to a
>>> single document, for example.
>>>
>>> I will also note that all_or_nothing does not provide multi-document
>>> ACID transactions. The batches used in bulk_docs are not recorded, so
>>> those items will be replicated individually (and in parallel, so not
>>> even in a predictable order), which would break the C and I
>>> characteristics on the receiving server. The old semantic would abort
>>> the whole update if any one of the documents couldn't be updated but
>>> the new semantic simply introduces a conflict in that case.
>>>
>>
>> Slight nit pick, but new behavior just returns the error that the
>> update would *cause* the conflict. (Assuming default non-replicator
>> _bulk_docs calls.)
>>
>
> Am I missing something? Current bulk_docs implementation will introduce a conflict in
> case of conflict, not just reject and return the error.
>
>>> B.
>>>
>>> On 22 December 2011 16:48, Alexander Uvarov <alexander.uvarov@gmail.com> wrote:
>>>> And can become much easier with multi-document transactions as an option.
>>>>
>>>> On Thu, Dec 22, 2011 at 10:43 PM, Pepijn de Vos <pepijndevos@yahoo.com> wrote:
>>>>> But not everyone needs a cluster. I like CouchDB because it's easy, not
>>>>> because "it scales", and in some situations, all_or_nothing is easy.
>>>>>
>>
>> Robert mentions it in passing, but the biggest reason that we dropped
>> the original _bulk_docs behavior doesn't have anything to do with
>> clustering. It was because the semantics are violated as soon as you
>> try and replicate. Since there's no tracking of the group of docs
>> posted to _bulk_docs then as soon as your mobile client tried to move
>> data in or out you'd lose all three of ACI in ACID.
>
> Ain't every system with multi-master architecture will cause problems as soon as you
> try to replicate? Should this force people to design for replication even them don't need
> it? In my first message I mentioned that not every application need to be replicated. There
> are a thousands of such apps in the world. Even it's possible to design some app for replication,
> it can be very hard to do and developer and probably future users will spend a lot of time
> for superfluous.

It's possible, but expensive, to have multi-master architecture and
transaction isolation, but it involves distributed commit protocols.

The wiki documentation is maybe slightly misleading in that the
guarantees provided by the current Apache CouchDB around
all_or_nothing have nothing to do with database crashes. All
_bulk_docs requests are written as a single group commit with a single
database header write, so either all valid, non-conflicting writes are
durably stored or none are. all_or_nothing changes two things: a
validation function failure rejects the whole bulk rather than just
the failing write, and during the commit phase conflicting writes
create conflicts rather than returning an error.

Here's the key: if your documents are known to be valid (or you don't
have a validate_doc_update function in your database), then the
difference is only whether conflicting writes create conflicts or are
rejected, not whether all writes hit disk durably, as the wiki might
seem to suggest.

The replicator uses a query parameter to create conflicts rather than
rejecting them: ?new_edits=false. If you can tolerate conflicts please
feel free to create your own revision ids (bump the leading number,
create a random id, and slap them together with a dash) and use
?new_edits=false. You'll get the same semantics with respect to
conflicts as all_or_nothing. You lose little by generating your own
revision ids, since deterministic revision ids are only an
optimization for replication. Maybe that lets you move forward with
your use case.
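
A minimal sketch of the revision id recipe above (bump the leading
number, create a random id, join them with a dash), in Python. The
helper names next_rev, with_new_rev and bulk_body are hypothetical and
not part of CouchDB. Two things here are assumptions rather than
something stated in this message: the sketch puts new_edits in the
JSON body of the _bulk_docs request instead of the query string, and
it adds a _revisions field so the new revision is linked to its
parent. Check both against the documentation for your CouchDB version.

```python
import json
import secrets


def next_rev(prev_rev=None):
    """Return a self-generated revision id: bumped generation, dash, random hex."""
    # A revision id looks like "<generation>-<suffix>"; a new document starts at 0.
    generation = int(prev_rev.split("-", 1)[0]) if prev_rev else 0
    return f"{generation + 1}-{secrets.token_hex(16)}"


def with_new_rev(doc):
    """Return a copy of `doc` carrying a self-generated successor revision."""
    prev_rev = doc.get("_rev")
    new_rev = next_rev(prev_rev)
    generation, suffix = new_rev.split("-", 1)
    updated = dict(doc, _rev=new_rev)
    # Assumption: list the new suffix followed by the parent suffix so the
    # server can attach this revision to the existing history.
    ids = [suffix]
    if prev_rev:
        ids.append(prev_rev.split("-", 1)[1])
    updated["_revisions"] = {"start": int(generation), "ids": ids}
    return updated


def bulk_body(docs):
    """Build a JSON body for POST /<db>/_bulk_docs with new_edits disabled."""
    # Assumption: new_edits is sent in the request body, not the query string.
    return json.dumps({"new_edits": False, "docs": [with_new_rev(d) for d in docs]})
```

The resulting string would be sent as the body of a POST to the
database's _bulk_docs endpoint with Content-Type: application/json.
Because the suffix is random, retrying the same logical update
produces a different revision id each time, so a retry after a lost
response can itself create a conflict.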

More to the point though... I find replication is one of CouchDB's
killer features and that's why some devs (like me and Paul) would
rather see all_or_nothing vanish completely. If you need relational
consistency but not replication you might be better served elsewhere.
I won't tell you to go away (I love our users, and so I'm offering a
lesser-known workaround with ?new_edits) but I won't mislead you about
the goals of the project either.

-Randall
