couchdb-dev mailing list archives

From Robert Newson <rnew...@apache.org>
Subject Re: Could CouchDB 2.0 fix actual read quorum?
Date Wed, 25 Mar 2015 12:03:56 GMT
Also noting that there's no status code in the standard that would mean for a GET what we mean by 202 for a write.

Sent from my iPhone

> On 25 Mar 2015, at 04:49, Robert Newson <rnewson@apache.org> wrote:
> 
> 2.0 is explicitly an AP system; the behaviour you describe is not classified as a bug.
> 
> Anti-entropy is the main reason that you cannot get strong consistency from the system: it will transform "failed" writes (those that succeeded on one node but fewer than R nodes) into success (N copies) as long as the nodes have enough healthy uptime.
> 
> True of cloudant and 2.0. 
> 
> Sent from my iPhone
> 
>> On 24 Mar 2015, at 15:14, Mutton, James <jmutton@akamai.com> wrote:
>> 
>> Funny you should mention it. I drafted an email in early February to queue up the same discussion whenever I could get involved again (which I promptly forgot about). What happens currently in 2.0 appears unchanged from earlier versions. When R is not satisfied in fabric, fabric_doc_open:handle_message eventually responds with a {stop, …} but leaves the acc-state as the original r_not_met which triggers a read_repair from the response handler. read_repair results in an {ok, …} with the only doc available, because no other docs are in the list. The final doc returned to chttpd_db:couch_doc_open and thusly to chttpd_db:db_doc_req is simply {ok, Doc}, which has now lost the fact that the answer was not complete.
>> 
>> This seems straightforward to fix by a change in fabric_doc_open:handle_response and read_repair. handle_response knows whether it has R met and could pass that forward, or allow read_repair to pass it forward if read_repair is able to satisfy acc.r. I can’t speak for community interest in the behavior of sending a 202, but it’s something I’d definitely like for the same reasons you cite. Plus it just seems disconnected to do it on writes but not reads.
>> 
>> Cheers,
>> </JamesM>
>> 
>>> On Mar 24, 2015, at 14:06, Nathan Vander Wilt <nate-lists@calftrail.com> wrote:
>>> 
>>> Sorry, I have not been following the CouchDB 2.0 roadmap, but I was extending my fermata-couchdb plugin today and realized that perhaps the Apache release of BigCouch as CouchDB 2.0 might provide an opportunity to fix a serious issue I had using Cloudant's implementation.
>>> 
>>> See https://github.com/cloudant/bigcouch/issues/55#issuecomment-30186518 for some additional background/explanation, but my understanding is that Cloudant for all practical purposes ignores the read durability parameter. So you can write with ?w=N to attempt some level of quorum, and get a 202 back if that quorum is unmet. _However_ when you ?r=N it really doesn't matter if only <N nodes are available… if even just a single available node has some version of the requested document you will get a successful response (!).
>>> 
>>> So in practice, there's no way to actually use the quasi-Dynamo features to dynamically _choose_ between consistency and availability: when it comes time to read back a consistent result, BigCouch instead just always gives you availability* regardless of what a given request actually needs. (In my usage I ended up treating a 202 write as a 500, rather than proceeding with no way of ever knowing whether a write did NOT ACTUALLY conflict or just hadn't YET because $who_knows_how_many nodes were still down…)
>>> 
>>> IIRC, this was both confirmed and acknowledged as a serious bug by a Cloudant engineer (or support personnel at least) but could not be quickly fixed as it could introduce backwards-compatibility concerns. So…
>>> 
>>> Is CouchDB 2.0 already breaking backwards compatibility with BigCouch? If true, could this read durability issue now be fixed during the merge?
>>> 
>>> thanks,
>>> -natevw
>>> 
>>> 
>>> 
>>> 
>>> 
>>> * DISCLAIMER: this statement has not been endorsed by actual uptime of *any* Couch fork…
>> 
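
The read path James describes, and the change he suggests, can be modelled roughly as below. This is a Python sketch of the idea only, not the fabric code: the function names, the tuple shapes, and the rule of picking the most frequently returned revision are assumptions made for illustration. The 200/202 mapping is the hypothetical behaviour discussed in the thread, and as Robert notes, the standard does not define 202 with this meaning for GET.

```python
from collections import Counter
from typing import List, Tuple


def open_doc_current(replies: List[str], r: int) -> Tuple[str, str]:
    """Model of the behaviour described in the thread.

    `replies` holds the revision each reachable node returned. The result is
    a bare ("ok", rev) even when fewer than r nodes agreed, so the caller
    cannot tell that the read quorum was missed.
    """
    if not replies:
        raise LookupError("no node returned the document")
    rev, _agreeing = Counter(replies).most_common(1)[0]
    return ("ok", rev)


def open_doc_proposed(replies: List[str], r: int) -> Tuple[str, str, bool]:
    """Same model, but the quorum outcome is passed forward to the caller."""
    if not replies:
        raise LookupError("no node returned the document")
    rev, agreeing = Counter(replies).most_common(1)[0]
    return ("ok", rev, agreeing >= r)


def http_status(r_met: bool) -> int:
    """Hypothetical mapping: 200 when R was met, 202 otherwise."""
    return 200 if r_met else 202
```

With a single reply and r=2, the first function returns a plain success, while the second also reports that R was not met, which the HTTP layer could then surface to the client.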

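
Nathan's client-side workaround of treating a 202 on write as a failure could look something like the following minimal sketch. The function name, the `strict` flag, and the returned labels are hypothetical. The only behaviour taken from the thread is that a write whose quorum is unmet comes back as 202; treating 201 as the quorum-met response is an assumption about the usual document write response.

```python
def write_outcome(status_code: int, strict: bool = True) -> str:
    """Classify the HTTP status of a document write.

    201 is assumed to mean the write quorum was met. 202 means the write was
    accepted but the quorum was not met; in strict mode that is treated as a
    failure, which mirrors the "treat a 202 as a 500" approach in the thread.
    """
    if status_code == 201:
        return "committed"
    if status_code == 202:
        return "failed" if strict else "accepted"
    return "failed"
```

A caller that prefers availability over consistency would pass `strict=False` and carry on after a 202.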