cassandra-commits mailing list archives

From "Sylvain Lebresne (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-6668) Inconsistent handling of row expiration using TTL in collections
Date Tue, 25 Feb 2014 09:23:20 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-6668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13911416#comment-13911416
] 

Sylvain Lebresne commented on CASSANDRA-6668:
---------------------------------------------

To explain it without getting into implementation, the current semantic of an UPDATE is that
it implicitly sets every column of the PRIMARY KEY (marking the presence of the PK columns
is, after all, the main (and nowadays only) reason for the row marker). So that
{noformat}
UPDATE X WHERE id=11;
{noformat}
will always set the column {{id}} to 11, whatever {{X}} does and so
{noformat}
update ttl_issue set collection = collection - {'test_1000'} where id=11;
{noformat}
sets {{id}} without TTL, hence the end result.
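
The effect of that semantic on the two scenarios in the report can be reproduced with a toy model. This is only a minimal sketch, not Cassandra's implementation: the names {{Cell}} and {{ToyRow}} are hypothetical, time is a logical clock in seconds, and timestamps, tombstones and reconciliation are ignored. The one assumption it encodes is the one stated above, namely that every UPDATE rewrites the row marker with the TTL of the statement (or without TTL if none is given).

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Cell:
    written_at: int       # logical clock, in seconds
    ttl: Optional[int]    # None means "no expiration"

    def is_live(self, now: int) -> bool:
        return self.ttl is None or now < self.written_at + self.ttl


class ToyRow:
    """Hypothetical model of one ttl_issue row: a row marker plus set elements."""

    def __init__(self) -> None:
        self.marker: Optional[Cell] = None
        self.elements: Dict[str, Cell] = {}

    def update_add(self, element: str, now: int, ttl: Optional[int] = None) -> None:
        # UPDATE implicitly sets the PK: the marker is rewritten with the
        # statement's TTL (or none), just like the element itself.
        self.marker = Cell(now, ttl)
        self.elements[element] = Cell(now, ttl)

    def update_remove(self, element: str, now: int, ttl: Optional[int] = None) -> None:
        # A removal is still an UPDATE, so it rewrites the marker too.
        self.marker = Cell(now, ttl)
        self.elements.pop(element, None)

    def select(self, now: int) -> Optional[List[str]]:
        """Return None if the row is gone, else the live elements ([] shows as null)."""
        live = sorted(e for e, c in self.elements.items() if c.is_live(now))
        marker_live = self.marker is not None and self.marker.is_live(now)
        return live if (live or marker_live) else None


# Scenario 2 from the report: the untimed removal leaves a marker without TTL.
row = ToyRow()
row.update_add("test_3", now=0, ttl=3)
row.update_add("test_1000", now=0, ttl=1000)
row.update_remove("test_1000", now=0)
print(row.select(now=1))   # ['test_3']
print(row.select(now=10))  # []  (row still visible, collection is null)
```

In scenario 1 every UPDATE carries a TTL, so in this model the marker expires together with the last element and {{select}} returns nothing.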

Now to be honest, in hindsight, I'm not sure it's the most intuitive behavior possible, if
only because it's not very explicit in the syntax (note that I'm well aware of the historical
reasons why things work the way they do, I'm just trying to take a step back on the semantic).
I think it would be more intuitive for UPDATE to only set the columns in the SET clause, because
that's what makes the most sense imo. I.e. technically speaking, we would not insert the row
marker for UPDATE (we would for INSERT however).

That being said, changing that now would of course be a breaking change and we should probably
just stick to the current semantic. So anyway, it occurred to me that it's one point where
the semantic is probably not too intuitive and we might at least make sure we properly document
it.

bq. we should probably reject TTL = 0

I'd rather not. We've used 0 for no ttl since the beginning of ttls and I don't think it's
much of a problem. I did push a quick update to the CQL doc because arguably it wasn't properly
documented, but I don't think it warrants rejection. Especially since it's not at all impossible
that it could break users (I'm not suggesting anyone would use "TTL 0" in a query string,
but it's perfectly possible that someone uses a prepared statement with a bind marker for
the TTL, sometimes binding a strictly positive TTL and sometimes binding 0 to get no expiration).
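
To make that use case concrete, here is a minimal sketch of client code that relies on 0 meaning "no expiration". The statement text, the helper {{bind_values}} and its parameters are hypothetical and for illustration only; no driver is involved, the function just computes the values that would be bound to the markers.

```python
from typing import Optional, Set, Tuple

# CQL text that would be prepared once; the first bind marker is the TTL.
UPDATE_CQL = (
    "UPDATE ttl_issue USING TTL ? "
    "SET collection = collection + ? WHERE id = ?"
)


def bind_values(
    row_id: int, elements: Set[str], expires_in: Optional[int]
) -> Tuple[int, Set[str], int]:
    """Return the values to bind, using 0 to mean 'no expiration'."""
    if expires_in is not None and expires_in <= 0:
        # Assumption of this sketch: callers pass None, not 0, for "no expiration".
        raise ValueError("expires_in must be a positive number of seconds or None")
    ttl = 0 if expires_in is None else expires_in
    return (ttl, elements, row_id)


print(bind_values(11, {"test_1000"}, 1000))  # (1000, {'test_1000'}, 11)
print(bind_values(11, {"test_3"}, None))     # (0, {'test_3'}, 11)
```

Rejecting TTL 0 would force such code to keep two prepared statements, one with a USING TTL clause and one without.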


bq. Second, we might want to leave the row marker alone (not overwrite it) for DELETE and
equivalent UPDATE queries.

DELETE never really does anything special with the row marker.

For UPDATE, well, I don't know. I'm usually a fan of limiting the number of special cases the
semantic has. As said above, my preferred semantic (all notion of backward compatibility aside)
would be to just never insert a row marker in UPDATE. Short of that, the current semantic
of "UPDATE also implicitly sets every column of the PK", while less intuitive, has at least
the merit of being simple, consistent and easy to explain. Adding special cases to that, typically
"... except not if the operation is intrinsically a delete", would make it more complex but
I'm not sure it would make it more intuitive. It would also be breaking strictly speaking,
and if we're willing to break the semantic so it gets more intuitive, I think I'd prefer going
all the way to my "preferred" semantic above.




> Inconsistent handling of row expiration using TTL in collections
> ----------------------------------------------------------------
>
>                 Key: CASSANDRA-6668
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6668
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>         Environment: Apache Cassandra 2.0.3
> Apache Cassandra 1.2.8
> CQLSH client 3.1.6
>            Reporter: DOAN DuyHai
>            Priority: Critical
>
> The expiration of a row when all TTLed columns have expired is inconsistent.
> Scenario 1)
> {code:sql}
> cqlsh:test> create table ttl_issue(id int primary key,collection set<text>);
> cqlsh:test> update ttl_issue USING TTL 2 set collection = collection + {'test_2'} where id=10;
> cqlsh:test> update ttl_issue USING TTL 3 set collection = collection + {'test_3'} where id=10;
> cqlsh:test> select * from ttl_issue;
>  id | collection
> ----+----------------------
>  10 | {'test_2', 'test_3'}
> cqlsh:test> select * from ttl_issue;
>  id | collection
> ----+----------------------
>  10 | {'test_2', 'test_3'}
> cqlsh:test> select * from ttl_issue;
>  id | collection
> ----+------------
>  10 | {'test_3'}
> cqlsh:test> select * from ttl_issue;
> cqlsh:test> 
> {code}
>  As we can see, after a few seconds, both columns of the collection are expired. When
> all columns of the set have expired, the SELECT * FROM ttl_issue *returns no result, meaning
> that the whole row has expired.*
> Scenario 2)
> {code:sql}
> cqlsh:test> update ttl_issue USING TTL 3 set collection = collection + {'test_3'} where id=11;
> cqlsh:test> update ttl_issue USING TTL 1000 set collection = collection + {'test_1000'} where id=11;
> cqlsh:test> update ttl_issue set collection = collection - {'test_1000'} where id=11;
> cqlsh:test> select * from ttl_issue;
>  id | collection
> ----+------------
>  11 | {'test_3'}
> cqlsh:test> select * from ttl_issue;
>  id | collection
> ----+------------
>  11 | {'test_3'}
> cqlsh:test> select * from ttl_issue;
>  id | collection
> ----+------------
>  11 | {'test_3'}
> cqlsh:test> select * from ttl_issue;
>  id | collection
> ----+------------
>  11 |       null
> {code}
>  In this second scenario, we add elements to the collection with TTL but then remove
> one of them. *After a while, although all TTLed columns have expired, the row is still there
> with only the primary key present.*
>  One should expect to get the same behavior as in scenario 1), i.e. the complete row
> should expire.
>  I've also tried removing one element from the collection using TTL 0 ({code:sql}update ttl_issue
> USING TTL 0 set collection = collection - {'test_1000'} where id=11;{code}) but the result
> is the same.
>  Quick guess: bug on row deletion marker for specific collection element append/remove?



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
