cassandra-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From DuyHai Doan <doanduy...@gmail.com>
Subject Re: Some love for multi-partition LWT?
Date Tue, 08 Sep 2015 09:54:51 GMT
"I could imagine that multiple paxos rounds could be played for different
partitions and these rounds would be dependent on each other"

Example of cluster of 10 nodes (N1 ... N10) and RF=3.

Suppose a LWT with 2 partitions and 2 mutations M1 & M2, coordinator C1. It
will imply 2 Paxos rounds with the same ballot B1, one round affecting N1,
N2, N3 for mutation M1 and the other round affecting N4, N5 and N6 for
mutation M2.

Suppose that the prepare/promise, read/results phases are successful for
both Paxos rounds (see here for different LWT phases
http://www.datastax.com/dev/blog/lightweight-transactions-in-cassandra-2-0)

The coordinator C1 then sends an propose() message to N1 ... N6 with
respective mutations M1 and M2. N1, N2 and N3 will reply accept() but
imagine another coordinator C2 has just proposed a higher ballot B2 (B2 >
B1) for nodes N4, N5 & N6 only. Those node will reply NoAck() (or won't
reply) to C1.

Then the whole multi-partition LWT will need to be aborted because we
cannot proceed with mutation M2. But N1, N2 and N3 already accepted the
mutation M1 and it could be committed by any subsequent Paxos round on N1,
N2 and N3

We certainly don't want that.

So how to abort safely mutation M1 on N1, N2 and N3 in this case ? Of
course the coordinator C1 could send itself another prepare() message with
ballot B3 higher than B1 to abort the accepted value in N1, N2 and N3, but
we do not have ANY GUARANTEE that in the meantime, there is another Paxos
round impacting N1, N2 and N3 with ballot higher than B1 that will commit
the undesirable mutation M1...

This simple example shows how hard it is to implement multi-partition Paxos
rounds. The fact that you have multiple Paxos rounds that are dependent on
each other break the safety guarantees of the original Paxos paper.

 The only way I can see that will ensure safety in this case is to forbid
any Paxos round on N1 ... N6 as long as the current rounds are not finished
yet, and this is exactly what a distributed lock does.









On Tue, Sep 8, 2015 at 8:15 AM, Marek Lewandowski <
marekmlewandowski@gmail.com> wrote:

> Are you absolutely sure that lock is required? I could imagine that
> multiple paxos rounds could be played for different partitions and these
> rounds would be dependent on each other.
>
> Performance aside, can you please elaborate where do you see such need for
> lock?
> On 8 Sep 2015 00:05, "DuyHai Doan" <doanduyhai@gmail.com> wrote:
>
>> Multi partitions LWT is not supported currently on purpose. To support
>> it, we would have to emulate a distributed lock which is pretty bad for
>> performance.
>>
>> On Mon, Sep 7, 2015 at 10:38 PM, Marek Lewandowski <
>> marekmlewandowski@gmail.com> wrote:
>>
>>> Hello there,
>>>
>>> would you be interested in having multi-partition support for
>>> lightweight transactions in order words to have ability to express
>>> something like:
>>>
>>> INSERT INTO … IF NOT EXISTS AND
>>> UPDATE … IF EXISTS AND
>>> UPDATE … IF colX = ‘xyz’
>>>
>>> where each statement refers to a row living potentially on different set
>>> of nodes.
>>> In yet another words: imagine batch with conditions, but which allows to
>>> specify multiple statements with conditions for rows in different
>>> partitions.
>>>
>>> Do you find it very useful, moderately useful or you don’t need that
>>> because you have some superb business logic handling of such cases for
>>> example?
>>>
>>> Let me know.
>>> Regards,
>>> Marek
>>
>>
>>

Mime
View raw message