cassandra-commits mailing list archives

From "Jonathan Ellis (JIRA)" <>
Subject [jira] Updated: (CASSANDRA-180) Blocking insert may have fewer responses than replication factor
Date Fri, 15 May 2009 15:18:45 GMT


Jonathan Ellis updated CASSANDRA-180:

    Attachment: 180-v2.patch

Reducing the responseCount is going to break things, since that's used for determining when
a successful quorum has been reached.

Say you have a 5 node cluster and a replication factor of 3.  But there is a network split
and the node a client is talking to can only see itself.  With your patch it would start up
a QRH with a RC of 1, get the ack, and report that the write was successful.  But we've just
silently violated our promise of quorum consistency (at least 2 nodes).
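
To make the arithmetic concrete, here is a rough sketch in Java. It is not code from either patch: the class and method names are hypothetical, and shrunkenThreshold() only models my reading of "size the response count to the nodes we can see".

```java
// Illustrative only: hypothetical names, not taken from the Cassandra source.
final class QuorumSketch {
    // A quorum is a majority of the replicas: 2 for a replication factor of 3.
    static int quorumFor(int replicationFactor) {
        return replicationFactor / 2 + 1;
    }

    // Assumed model of the problematic rule: wait for only as many
    // responses as there are reachable replicas.
    static int shrunkenThreshold(int replicationFactor, int reachableReplicas) {
        return Math.min(replicationFactor, reachableReplicas);
    }

    public static void main(String[] args) {
        int replicationFactor = 3;
        int reachableReplicas = 1; // network split: the node sees only itself
        int acks = 1;              // the node acks its own write

        boolean shrunkenSaysOk = acks >= shrunkenThreshold(replicationFactor, reachableReplicas);
        boolean quorumSaysOk = acks >= quorumFor(replicationFactor);

        System.out.println("shrunken threshold reports success: " + shrunkenSaysOk);
        System.out.println("quorum threshold reports success:   " + quorumSaysOk);
    }
}
```

With a replication factor of 3 and one reachable replica, the shrunken threshold accepts a single ack, while the quorum threshold still requires two.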

The existing code is optimal for when a write succeeds -- as soon as a quorum is reached
it returns, w/o waiting for any more responses that may or may not come.  The only problem
is that it will wait for timeout when it is impossible for a write to reach quorum b/c there
are not enough nodes.  I've attached a patch that addresses that problem.  What do you think?
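
The idea, roughly: keep the quorum threshold fixed, but refuse up front when too few replicas are live to ever reach it. The following is a minimal Java sketch of that idea only, not the attached patch; QuorumWriteSketch, onAck() and await() are hypothetical names, and the liveReplicas count is assumed to come from whatever failure detection is available.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Illustrative only: hypothetical names, not taken from the Cassandra source.
final class QuorumWriteSketch {
    private final CountDownLatch pendingAcks;

    QuorumWriteSketch(int replicationFactor, int liveReplicas) {
        int quorum = replicationFactor / 2 + 1;
        // Fail fast: if quorum can never be reached, do not wait for the timeout.
        if (liveReplicas < quorum) {
            throw new IllegalStateException(
                "need " + quorum + " live replicas for quorum, have " + liveReplicas);
        }
        // The threshold stays at quorum; it is never lowered to match live nodes.
        this.pendingAcks = new CountDownLatch(quorum);
    }

    // Called once per successful replica response (nodes only ack success).
    void onAck() {
        pendingAcks.countDown();
    }

    // Returns true as soon as a quorum of acks has arrived, false on timeout.
    boolean await(long timeoutMillis) {
        try {
            return pendingAcks.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            return false;
        }
    }
}
```

The success path is unchanged in this sketch: await() returns once the quorum-th ack arrives, without waiting for the remaining replicas.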

(Note that we don't need to try to solve the problem of "what if at the beginning of a write
there are enough nodes to reach quorum, but partway through we get a nack from a node making
it impossible" b/c nodes only ack success, they don't nack failure.  And making them do so
adds more complication than it is worth for such an uncommon case.)

> Blocking insert may have fewer responses than replication factor
> ----------------------------------------------------------------
>                 Key: CASSANDRA-180
>                 URL:
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 0.3
>            Reporter: Jun Rao
>            Assignee: Jun Rao
>            Priority: Minor
>         Attachments: 180-v2.patch, issue180.patchv1
> Currently, block_insert always assumes the number of responses equals the replication
> factor. However, for a small cluster (e.g., 1 node) and/or when a failure occurs, the number
> of responses could be fewer than the replication factor.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
