cassandra-commits mailing list archives

From "Sylvain Lebresne (JIRA)" <>
Subject [jira] [Updated] (CASSANDRA-5762) Lost row marker after TTL expires
Date Tue, 16 Jul 2013 12:26:49 GMT


Sylvain Lebresne updated CASSANDRA-5762:

    Attachment: 0001-Always-do-slice-queries-for-CQL3-tables.txt

As much as this pains me, I don't see any easy way to make this work outside of doing a read-before-write
(which is not acceptable).

It "might" be possible to make it work (without read-before-write) by specializing the row
marker in the storage so that it tracks TTL to provide the desired behavior but at best that
wouldn't be trivial and would probably make the row marker prohibitive in terms of storage
(though something like CASSANDRA-4175 might help make it more reasonable). In any case, it's
_at best_ a solution for 2.1 but not before that, and that's leaving aside the debate of whether
the feature is worth the complexity.

In the meantime, the best workaround I can come up with would be to force SELECT queries to slice
the whole CQL3 row even when only some columns are selected.  That is, we would revert to
what we did for selects before CASSANDRA-4361. Tbh, this probably wouldn't have much impact
on performance since 1) CQL3 rows are bound to be relatively small and 2) we now optimize
slice queries relatively well for that kind of case (partly in 1.2 with promoted index and
even more in 2.0 with CASSANDRA-5514) so that queries by names probably don't have that much
benefit anymore.

Doing that would fix the problem in most cases, including the one of the description since
it'll basically relegate the row marker to only mark rows where only the PK is set. This does
not fix it fully though, since if you do
CREATE TABLE test (k int PRIMARY KEY, a int, b int);
INSERT INTO test (k, a, b) VALUES (0, 1, 2);
// wait 2 seconds
then the last select will return no results, even though it kind of should return one result
(with {{a == null}} and {{b == null}}) since we haven't done a full row deletion. But then
we could accept that as a whacky known special situation (don't get me wrong, I don't like
it, it's just that "we have a problem and I don't have a better solution"). And to be fair,
you would really have to try fairly hard to get bitten by this.
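
To make the reasoning above concrete, here is a minimal toy model in Python. It is not Cassandra code: the names `ToyRow`, `exists_by_names` and `exists_by_slice` are hypothetical, the clock is a plain integer, and it assumes (as the report suggests) that a write with a TTL also rewrites the row marker with that TTL. It only sketches why a query by names can miss a row that a slice of the whole row still finds, and why a row whose columns were all written with a TTL disappears either way.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional

ROW_MARKER = ""  # toy stand-in for the CQL3 row marker cell


@dataclass
class Cell:
    value: object
    written_at: int            # seconds on a toy clock
    ttl: Optional[int] = None  # None means the cell never expires

    def is_live(self, now: int) -> bool:
        return self.ttl is None or now < self.written_at + self.ttl


class ToyRow:
    """One CQL3 row, stored as a marker cell plus one cell per column."""

    def __init__(self) -> None:
        self.cells: Dict[str, Cell] = {}

    def write(self, columns: Dict[str, object], now: int,
              ttl: Optional[int] = None) -> None:
        # Assumption: each write also rewrites the marker with the statement's TTL.
        self.cells[ROW_MARKER] = Cell(None, now, ttl)
        for name, value in columns.items():
            self.cells[name] = Cell(value, now, ttl)

    def live_names(self, now: int) -> set:
        return {name for name, cell in self.cells.items() if cell.is_live(now)}


def exists_by_names(row: ToyRow, selected: Iterable[str], now: int) -> bool:
    """Query by names: only the marker and the selected columns are read."""
    wanted = {ROW_MARKER, *selected}
    return bool(wanted & row.live_names(now))


def exists_by_slice(row: ToyRow, now: int) -> bool:
    """Slice query: the whole row is read, so any live cell proves it exists."""
    return bool(row.live_names(now))


# Case from the description: columns without a TTL, then one update with a TTL.
reported = ToyRow()
reported.write({"last_update": 0, "regions": b""}, now=0)
reported.write({"server_status": True}, now=100, ttl=10)

# Remaining corner case: every column of the row was written with a TTL.
corner = ToyRow()
corner.write({"a": 1, "b": 2}, now=0, ttl=1)

print(exists_by_names(reported, [], now=200))  # only the key is selected
print(exists_by_slice(reported, now=200))
print(exists_by_slice(corner, now=2))
```

In this model the first check reports the row as missing, the second finds it through the surviving columns, and the third still finds nothing, which is the special situation described above.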

Attaching the patch that does what's above for info (IN queries on the last clustering column,
which we support, make that slightly more annoying than one would hope, but it's not too much
of a big deal either).

As mentioned above, another workaround could be to not let users get into that state by forcing
all the (CQL3) columns of the (CQL3) row to be set in the statement if a TTL is used.
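
As a rough illustration of what such a restriction would check, here is a small Python sketch. The function `check_ttl_statement` and its parameters are hypothetical and do not correspond to anything in the Cassandra code base; it only assumes that the statement's non-primary-key columns and its TTL are known at validation time.

```python
from typing import Iterable, Optional


def check_ttl_statement(table_columns: Iterable[str],
                        assigned_columns: Iterable[str],
                        ttl: Optional[int]) -> None:
    """Reject a write that uses a TTL but does not set every non-PK column."""
    if ttl is None:
        return  # no TTL: partial writes stay allowed
    missing = sorted(set(table_columns) - set(assigned_columns))
    if missing:
        raise ValueError(
            "TTL used but columns not set: " + ", ".join(missing))


# A full write with a TTL passes; a partial one is rejected.
check_ttl_statement(["a", "b"], ["a", "b"], ttl=10)
try:
    check_ttl_statement(["a", "b"], ["a"], ttl=10)
except ValueError as err:
    print(err)
```

With such a check, all columns of a row would always expire together with the marker, so the mixed state could not arise.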

The (imho big) problem is that this is a breaking change. If someone is using different TTLs
in the same CQL3 row (and their application does depend on it), they basically cannot upgrade (short
of migrating data that have different TTLs into their own separate tables, which is extremely
painful). Part of me is also pretty convinced that the convenience of being able to set TTLs
on individual columns outweighs the "not exactly right" behavior of the special case above
(especially since only people who *need* per-column TTLs will ever run into that special

> Lost row marker after TTL expires
> ---------------------------------
>                 Key: CASSANDRA-5762
>                 URL:
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 1.2.0
>         Environment: Ubuntu 12.04
>            Reporter: Taner Catakli
>            Assignee: Sylvain Lebresne
>            Priority: Critical
>         Attachments: 0001-Always-do-slice-queries-for-CQL3-tables.txt
> I have the following table
> cqlsh:loginproject> DESCRIBE TABLE gameservers;
> CREATE TABLE gameservers (
>   address inet PRIMARY KEY,
>   last_update timestamp,
>   regions blob,
>   server_status boolean
> ) WITH
>   bloom_filter_fp_chance=0.010000 AND
>   caching='KEYS_ONLY' AND
>   comment='' AND
>   dclocal_read_repair_chance=0.000000 AND
>   gc_grace_seconds=864000 AND
>   read_repair_chance=0.100000 AND
>   replicate_on_write='true' AND
>   populate_io_cache_on_flush='false' AND
>   compaction={'class': 'SizeTieredCompactionStrategy'} AND
>   compression={'sstable_compression': 'SnappyCompressor'};
> after inserting a row and executing the following command:
> UPDATE gameservers USING TTL 10 SET server_status = true WHERE address = ''
> after waiting for the ttl to expire, the row will lose its rowmarker making "select address
> from gameservers" returning 0 results although there are some.
> in cassandra-cli the table looks like this:
> [default@loginproject] list gameservers;
> Using default limit of 100
> Using default cell limit of 100
> -------------------
> RowKey:
> => (name=last_update, value=0000000000000017, timestamp=1373884433543000)
> => (name=regions, value=<truncated>, timestamp=1373883701652000)
> 1 Row Returned.
> Elapsed time: 345 msec(s).
> [default@loginproject]

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
