cassandra-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From DuyHai Doan <doanduy...@gmail.com>
Subject Re: Some love for multi-partition LWT?
Date Tue, 08 Sep 2015 12:01:45 GMT
"Do you think it could work?"

At first glance, maybe, but it would involve a huge number of round trips
and a lot of contentions. You'll risk serious deadlocks.

Second, to prove that a solution works, you'll need to prove that it works
for ALL situations, not just a few. Proving something  wrong is easy,
you'll need just only one counter-example. Proving that something is true
in all cases is much much harder. The fact that I don't find any
counter-example right now to your proposal does not mean it's correct and
it will work. I may forget some corner cases.



On Tue, Sep 8, 2015 at 1:53 PM, Peter Lin <woolfel@gmail.com> wrote:

>
> I would caution using paxos for distributed transaction in an
> inappropriate way. The model has to be logically and mathematically
> correct, otherwise you end up with corrupt data. In the worst case, it
> could cause cascading failure that brings down the cluster. I've seen
> distributed systems come to a grinding halt due to disagreement between
> nodes.
>
> As far as I know based on existing literature and first hand experience,
> there's only 2 ways: a central transaction manager or distributed lock.
>
> distributed locks are expensive and can result in cluster wide deadlock.
> Look at the data grid space if you want to see where and when distributed
> locks cause major headaches. Just ask anyone that has tried to use a
> distributed cache as a distributed in-memory database and they'll tell you
> how poorly that scales.
>
> I recommend spending a couple of months reading up on the topic, there's a
> wealth of literature on this. These are very old problems and lots of smart
> people have spent time figuring out what works and doesn't work.
>
> On Tue, Sep 8, 2015 at 7:30 AM, Marek Lewandowski <
> marekmlewandowski@gmail.com> wrote:
>
>> "This simple example shows how hard it is to implement multi-partition
>> Paxos rounds. The fact that you have multiple Paxos rounds that are
>> dependent on each other break the safety guarantees of the original Paxos
>> paper. "
>> What if this dependency is explicitly specified in proposal.
>> Assume that each proposal consists of all mutations in this case: {M1,M2}.
>> So N1,N2,N3 receive proposal {M1,M2} and nodes N4,N5,N6 receive proposal
>> {M1,M2}. (Yes, in this case I assume that nodes will get more data than
>> they need, but that's the cost).
>>
>> If C2 wins with higher ballot at nodes N4, N5, N6  after N1,N2,N3
>>  accepted {M1,M2} from C1, then C2 sees in progress proposal {M1, M2} which
>> it has to complete, so it will try nodes N1,N2,N3 and get agreement as well
>> as nodes N4,N5,N6 and eventually commit unless another coordinator jumps in
>> and trumps paxos rounds.
>>
>> Do you think it could work?
>>
>> "To guarantee you need a distributed lock or a different design like
>> datomic. Look at what rich hickey has done with datomic "
>> I'll look into that, thanks.
>>
>> 2015-09-08 12:52 GMT+02:00 <woolfel@gmail.com>:
>>
>>>
>>> There's quite a bit of literature on the topic. Look at what is in
>>> acmqueue and you'll see what others are saying is accurate.
>>>
>>> To guarantee you need a distributed lock or a different design like
>>> datomic. Look at what rich hickey has done with datomic
>>>
>>>
>>>
>>> Sent from my iPhone
>>>
>>> On Sep 8, 2015, at 5:54 AM, DuyHai Doan <doanduyhai@gmail.com> wrote:
>>>
>>> "I could imagine that multiple paxos rounds could be played for
>>> different partitions and these rounds would be dependent on each other"
>>>
>>> Example of cluster of 10 nodes (N1 ... N10) and RF=3.
>>>
>>> Suppose a LWT with 2 partitions and 2 mutations M1 & M2, coordinator C1.
>>> It will imply 2 Paxos rounds with the same ballot B1, one round affecting
>>> N1, N2, N3 for mutation M1 and the other round affecting N4, N5 and N6 for
>>> mutation M2.
>>>
>>> Suppose that the prepare/promise, read/results phases are successful for
>>> both Paxos rounds (see here for different LWT phases
>>> http://www.datastax.com/dev/blog/lightweight-transactions-in-cassandra-2-0
>>> )
>>>
>>> The coordinator C1 then sends an propose() message to N1 ... N6 with
>>> respective mutations M1 and M2. N1, N2 and N3 will reply accept() but
>>> imagine another coordinator C2 has just proposed a higher ballot B2 (B2 >
>>> B1) for nodes N4, N5 & N6 only. Those node will reply NoAck() (or won't
>>> reply) to C1.
>>>
>>> Then the whole multi-partition LWT will need to be aborted because we
>>> cannot proceed with mutation M2. But N1, N2 and N3 already accepted the
>>> mutation M1 and it could be committed by any subsequent Paxos round on N1,
>>> N2 and N3
>>>
>>> We certainly don't want that.
>>>
>>> So how to abort safely mutation M1 on N1, N2 and N3 in this case ? Of
>>> course the coordinator C1 could send itself another prepare() message with
>>> ballot B3 higher than B1 to abort the accepted value in N1, N2 and N3, but
>>> we do not have ANY GUARANTEE that in the meantime, there is another Paxos
>>> round impacting N1, N2 and N3 with ballot higher than B1 that will commit
>>> the undesirable mutation M1...
>>>
>>> This simple example shows how hard it is to implement multi-partition
>>> Paxos rounds. The fact that you have multiple Paxos rounds that are
>>> dependent on each other break the safety guarantees of the original Paxos
>>> paper.
>>>
>>>  The only way I can see that will ensure safety in this case is to
>>> forbid any Paxos round on N1 ... N6 as long as the current rounds are not
>>> finished yet, and this is exactly what a distributed lock does.
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> On Tue, Sep 8, 2015 at 8:15 AM, Marek Lewandowski <
>>> marekmlewandowski@gmail.com> wrote:
>>>
>>>> Are you absolutely sure that lock is required? I could imagine that
>>>> multiple paxos rounds could be played for different partitions and these
>>>> rounds would be dependent on each other.
>>>>
>>>> Performance aside, can you please elaborate where do you see such need
>>>> for lock?
>>>> On 8 Sep 2015 00:05, "DuyHai Doan" <doanduyhai@gmail.com> wrote:
>>>>
>>>>> Multi partitions LWT is not supported currently on purpose. To support
>>>>> it, we would have to emulate a distributed lock which is pretty bad for
>>>>> performance.
>>>>>
>>>>> On Mon, Sep 7, 2015 at 10:38 PM, Marek Lewandowski <
>>>>> marekmlewandowski@gmail.com> wrote:
>>>>>
>>>>>> Hello there,
>>>>>>
>>>>>> would you be interested in having multi-partition support for
>>>>>> lightweight transactions in order words to have ability to express
>>>>>> something like:
>>>>>>
>>>>>> INSERT INTO … IF NOT EXISTS AND
>>>>>> UPDATE … IF EXISTS AND
>>>>>> UPDATE … IF colX = ‘xyz’
>>>>>>
>>>>>> where each statement refers to a row living potentially on different
>>>>>> set of nodes.
>>>>>> In yet another words: imagine batch with conditions, but which allows
>>>>>> to specify multiple statements with conditions for rows in different
>>>>>> partitions.
>>>>>>
>>>>>> Do you find it very useful, moderately useful or you don’t need
that
>>>>>> because you have some superb business logic handling of such cases
for
>>>>>> example?
>>>>>>
>>>>>> Let me know.
>>>>>> Regards,
>>>>>> Marek
>>>>>
>>>>>
>>>>>
>>>
>>
>>
>> --
>> Marek Lewandowski
>>
>
>

Mime
View raw message