cassandra-commits mailing list archives

From "Alex Petrov (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (CASSANDRA-11000) Mixing LWT and non-LWT operations can result in an LWT operation being acknowledged but not applied
Date Mon, 18 Apr 2016 12:46:25 GMT

     [ https://issues.apache.org/jira/browse/CASSANDRA-11000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alex Petrov updated CASSANDRA-11000:
------------------------------------
    Description: 
When mixing light-weight transaction (LWT, a.k.a. compare-and-set, conditional update) operations
with regular operations, it can happen that an LWT operation is acknowledged (applied = True),
even though the update has not been applied and a SELECT operation still returns the old data.

For example, consider the following table:

{code}
CREATE TABLE test (
    pk text,
    ck text,
    v text,
    PRIMARY KEY (pk, ck)
);
{code}

We start with an empty table and insert data using a regular (non-LWT) operation:

{code}
INSERT INTO test (pk, ck, v) VALUES ('foo', 'bar', '123');
{code}
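A plain read can be used to verify the write; the output below is illustrative:

{code}
cqlsh> SELECT pk, ck, v FROM test WHERE pk = 'foo' AND ck = 'bar';

 pk  | ck  | v
-----+-----+-----
 foo | bar | 123
{code}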
The SELECT returns the data as expected. Now we perform a conditional update (LWT):

{code}
UPDATE test SET v = '456' WHERE pk = 'foo' AND ck = 'bar' IF v = '123';
{code}

As expected, the update is applied and a subsequent SELECT statement shows the updated value.
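For illustration, a cqlsh session at this point looks roughly like the following (the exact output formatting may vary between versions):

{code}
cqlsh> UPDATE test SET v = '456' WHERE pk = 'foo' AND ck = 'bar' IF v = '123';

 [applied]
-----------
      True

cqlsh> SELECT pk, ck, v FROM test WHERE pk = 'foo' AND ck = 'bar';

 pk  | ck  | v
-----+-----+-----
 foo | bar | 456
{code}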

Now we do the same, but use a timestamp that is slightly in the future (e.g. a few seconds ahead) for the INSERT statement ($time$ needs to be replaced by a timestamp that is slightly ahead of the system clock):

{code}
INSERT INTO test (pk, ck, v) VALUES ('foo', 'bar', '123') USING TIMESTAMP $time$;
{code}
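By convention, write timestamps are given in microseconds since the Unix epoch, so a value a few seconds in the future can be obtained by adding a few million to the current time. For illustration only (the concrete number below is hypothetical):

{code}
-- assuming the current time corresponds to roughly 1453024800000000 microseconds,
-- a timestamp about five seconds in the future would be:
INSERT INTO test (pk, ck, v) VALUES ('foo', 'bar', '123') USING TIMESTAMP 1453024805000000;
{code}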

Now, running the same UPDATE statement still reports success (applied = True). However, a subsequent SELECT yields the old value ('123') instead of the updated value ('456'). Inspecting the timestamp of the value indicates that it has not been replaced (the value from the original INSERT is still in place).
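One way to see this is to read the write time of the column after the conditional update. The output below is illustrative and assumes the hypothetical future timestamp used above:

{code}
cqlsh> SELECT v, WRITETIME(v) FROM test WHERE pk = 'foo' AND ck = 'bar';

 v   | writetime(v)
-----+------------------
 123 | 1453024805000000
{code}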

This behavior is exhibited in a single-node cluster running Cassandra 2.1.11, 2.2.4, and 3.0.1.

Testing this for a multi-node cluster is a bit trickier, so I only tested it with Cassandra 2.2.4. Here, I made one of the nodes lag behind in time by a few seconds (using libfaketime). I used a replication factor of three for the test keyspace. In this case, the behavior can be demonstrated even without an explicitly specified timestamp. Running

{code}
INSERT INTO test (pk, ck, v) VALUES ('foo', 'bar', '123');
{code}

on a node with the regular clock followed by

{code}
UPDATE test SET v = '456' WHERE pk = 'foo' AND ck = 'bar' IF v = '123';
{code}

on the node lagging behind results in the UPDATE reporting success, but the old value remaining in place.

Interestingly, everything works as expected when LWT operations are used consistently: when running

{code}
UPDATE test SET v = '456' WHERE pk = 'foo' AND ck = 'bar' IF v = '123';
UPDATE test SET v = '123' WHERE pk = 'foo' AND ck = 'bar' IF v = '456';
{code}

in an alternating fashion on two nodes (one with a "normal" clock, one with the clock lagging behind), the updates are applied as expected. When checking the timestamps ("{{SELECT WRITETIME(v) FROM test;}}"), one can see that the timestamp is increased by just a single tick when the statement is executed on the node lagging behind.
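For illustration (all values are hypothetical), the write times observed in this scenario might look like this:

{code}
-- queried with: SELECT v, WRITETIME(v) FROM test WHERE pk = 'foo' AND ck = 'bar';
-- after the update on the node with the normal clock:
--   v = '456', writetime(v) = 1453024810000000
-- after the following update on the lagging node (clock several seconds behind),
-- the write time advances by only a single tick:
--   v = '123', writetime(v) = 1453024810000001
{code}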

I think that this problem is strongly related to (or maybe even the same as) the one described in CASSANDRA-7801, even though CASSANDRA-7801 was mainly concerned with a single-node cluster. However, the fact that this problem still exists in current versions of Cassandra makes me suspect that either it is a different problem or the original problem was not fixed completely by the patch from CASSANDRA-7801.

I found CASSANDRA-9655, which suggests removing the changes introduced with CASSANDRA-7801 because they can be problematic under certain circumstances, but I am not sure whether that is the right place to discuss the issue I am experiencing. If you feel it is, feel free to close this issue and update the description of CASSANDRA-9655.

In my opinion, the best way to fix this problem would be to ensure that a write that is part of an LWT always uses a timestamp that is at least one tick greater than the timestamp of the existing data. As the existing data has to be read to check the condition anyway, I do not think this would cause any additional overhead. If this is not possible, I suggest looking into whether we can somehow detect such a situation and at least report failure (applied = False) for the LWT instead of reporting success.
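To make the suggestion concrete, here is a sketch of the desired behavior only (not a description of any existing implementation), reusing the hypothetical numbers from above:

{code}
-- sketch: timestamp selection for the write performed by the LWT
--   ts_existing = WRITETIME(v) of the row read while checking the condition
--   ts_write    = max(current time in microseconds, ts_existing + 1)
-- with the hypothetical values from the example above:
--   ts_existing  = 1453024805000000 (future timestamp of the plain INSERT)
--   current time = 1453024801000000 (coordinator clock)
--   ts_write     = 1453024805000001 -> the LWT update is no longer shadowed
{code}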

The latter solution would at least fix those cases where code checks the success of an LWT before performing any further actions (e.g. because the LWT is used to take some kind of lock). Currently, the code will assume that the operation was successful (and thus, staying with the example, that it owns the lock), while other processes running in parallel will see a different state. It is my understanding that LWTs were designed to avoid exactly this situation, but at the moment the assumptions most users will make about LWTs do not always hold.
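As a hypothetical example of the lock pattern mentioned above (table and values are made up):

{code}
-- hypothetical lock table; callers check applied = True before assuming they hold the lock
CREATE TABLE locks (name text PRIMARY KEY, owner text);
INSERT INTO locks (name, owner) VALUES ('my-lock', 'worker-1') IF NOT EXISTS;
-- if the LWT write ends up shadowed by older data with a higher timestamp
-- (as described above), the applied = True answer is misleading and two
-- workers can both believe they own the lock
{code}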

Until this issue is solved, I suggest at least updating the CQL documentation to clearly state that LWTs / conditional updates are not safe if data has previously been INSERTed / UPDATEd / DELETEd using non-LWT operations and there is clock skew, or if timestamps in the future have been supplied explicitly. This should at least save some users from making wrong assumptions about LWTs and not realizing it until their application fails in an unsafe way.


> Mixing LWT and non-LWT operations can result in an LWT operation being acknowledged but not applied
> ---------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-11000
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11000
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Coordination
>         Environment: Cassandra 2.1, 2.2, and 3.0 on Linux and OS X.
>            Reporter: Sebastian Marsching
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
