couchdb-dev mailing list archives

From "Mutton, James" <>
Subject Re: Fabric worker timeouts and availability of replicas
Date Tue, 13 Oct 2015 00:12:36 GMT
This is something we’ve done with ours, making R and W failure reporting semi-consistent.
 I vaguely recall being part of that conversation and that we concluded on the list at the
time that a status code was not feasible.  Pretty sure there was a branch created in fabric
for it to decorate the document if requested.
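
A minimal sketch of what "decorating the document if requested" could look like, written in Python rather than fabric's Erlang. Everything named here is an assumption for illustration: the `Reply` type, the `decorate` flag, and the `_r_met` field are hypothetical and are not taken from the fabric branch mentioned above. Timed-out workers are modelled as `None`, the winner is picked by a simplified revision comparison, and "r met" is approximated as "at least r workers answered".

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Reply:
    """One replica's answer for a document read (hypothetical shape)."""
    doc: dict


def rev_key(rev: str) -> tuple:
    """Split a revision such as "3-abc" into (3, "abc") for ordering.

    Simplified stand-in for real winner selection.
    """
    pos, _, digest = rev.partition("-")
    return int(pos), digest


def read_with_r_info(replies: Sequence[Optional[Reply]], r: int,
                     decorate: bool = False) -> Optional[dict]:
    """Return the best available copy; None entries are timed-out workers."""
    answered = [reply for reply in replies if reply is not None]
    if not answered:
        return None  # no copy at all: the caller would report an error
    best = max(answered, key=lambda reply: rev_key(reply.doc["_rev"]))
    result = dict(best.doc)  # copy so the replica's reply is not mutated
    if decorate:
        # Hypothetical underscore-prefixed field, added only on request.
        result["_r_met"] = len(answered) >= r
    return result
```

With `decorate` left at its default the returned document is unchanged, which is one way to avoid the replication breakage with older nodes raised further down the thread.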


On Oct 12, 2015, at 16:10, Robert Samuel Newson <> wrote:

> The 203 (Non-Authoritative Information) status code indicates that
>   the request was successful but the enclosed payload has been modified
>   from that of the origin server's 200 (OK) response by a transforming
>   proxy (Section 5.7.2 of [RFC7230]).
> So I don’t think we can send a 203.
> We could maybe use a response header if we list all the quorum counts, but then we’ll
> hit header length issues for large bulk_docs posts, though it might be sufficient to indicate
> that at least one of the responses did not meet quorum?
> We should also stop using the word quorum, it implies properties we don’t have. Quorum
> should be reserved for systems exhibiting strong consistency properties.
> B.
>> On 12 Oct 2015, at 17:15, Paul Davis <> wrote:
>> I've had discussions about this in the past and there are a few
>> sticking points on it that aren't immediately obvious.
>> First, while the header approach is the most obvious, it misses APIs
>> like POST to _all_docs where we return multiple documents. Each
>> document returned could have a different read quorum which a header
>> most likely wouldn't be able to accurately reflect. The obvious next
>> approach is to add an underscore prefixed field to each document read
>> (which is actually a fairly simple patch) but that ends up breaking
>> replication with all old CouchDB nodes in odd ways (it'd only fail
>> documents that had an incomplete quorum read which is transient). It
>> suddenly occurs to me that maybe we could condition the inclusion of
>> the field on the CouchDB user agent though if we can coordinate with
>> PouchDB and any other replicator implementations.
>> Secondly, the different status codes aren't entirely correct. 201/202
>> are obviously wrong as they're about entity creation, not read. 203
>> Non-Authoritative is wrong as the definition says that it reflects
>> entity header information. 204 No-Content is obviously wrong. 205
>> Reset Content is wrong and stipulates that no body should be present.
>> 206 Partial Content is also wrong as that's for range requests. And
>> that's all of the 200 response codes...
>> My favorite is probably 203 for this as it's only a slight bending of
>> the definition though it does get us into the same mixed response
>> situation with _all_docs keys and so on.
>> On Mon, Oct 12, 2015 at 6:06 AM, Michael Rhodes
>> <> wrote:
>>> Agreed we should respond with the doc if we got at least one copy.
>>> I'd also be in favour of a response header which indicates whether we met the
>>> requested read quorum. This would mirror the approach to writes, where there
>>> is currently the separate 201/202 response code based on quorum success.
>>> This allows for a bit more flexibility client-side w.r.t. availability
>>> considerations.
>>> I'm not sure what information is best to supply in the proposed header: would a
>>> simple true/false suffice, or would more detail, such as the number of nodes that
>>> responded and the quorum, be useful?
>>> Mike.
>>> On 07/10/2015 21:35, Robert Newson wrote:
>>>> Yes, I think it should. We should return the best answer we can.
>>>>> On 7 Oct 2015, at 13:48, Robert Kowalski <> wrote:
>>>>> Hi,
>>>>> I am currently taking a look at fabric and rexi.
>>>>> Given I open a doc, a CouchDB cluster returns the document.
>>>>> It also returns a doc, given not all replicas (r) are available and the
>>>>> *cluster is aware of it*: if the co-ordinator knows that there are fewer
>>>>> than r replicas available, it returns the document with a 200.
>>>>> When a worker is not available *right now*, and the call to one of them
>>>>> just times out (so the cluster is not aware that one node is
>>>>> unavailable),
>>>>> the cluster will return a general timeout error instead of a result [1],
>>>>> even if just one of the workers fails.
>>>>> Should the cluster return a result instead in those cases?
>>>>> [1]
