cassandra-commits mailing list archives

From "Sylvain Lebresne (JIRA)" <j...@apache.org>
Subject [jira] Commented: (CASSANDRA-2103) expiring counter columns
Date Fri, 04 Feb 2011 11:24:28 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-2103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12990529#comment-12990529 ]

Sylvain Lebresne commented on CASSANDRA-2103:
---------------------------------------------

bq. I think your above case mixes two orthogonal uses for ttls. Typically, a column will only
have ttls and they'll be updated in a fixed period w/in the ttl, which is very useful for
our case. Mixing a ttl and non-ttl use w/in the same column is bound to produce probs.

The problem has nothing to do with mixing ttl with non-ttl. But on that point, a non-ttled column
is after all just a ttled column with an arbitrarily long ttl, so the fact that mixing ttl with
non-ttl causes a problem should be a strong hint that something is fishy.

The problem is that when you send an increment with a ttl, you can't know what the actual
lifetime of this increment will be. If the CounterColumn corresponding to this increment is
never merged with another, more recent, CounterColumn, then its lifetime is the ttl. But if
it is merged (during its lifetime), then its lifetime is extended to the ttl of the new column.
All well and good, except (and that is the problem) you just don't know when a column will
be merged. Just because a new update has been *issued* during the lifetime of a preceding
one does not mean those two updates will be *merged* during the lifetime of the preceding one.
It all depends on when compactions kick in (which in turn is random from the point of view
of the client).
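To make the merge rule concrete, here is a toy model in Python. The names (`CounterShard`, `merge`, `live_value`) are hypothetical and this is not Cassandra's CounterColumn code; it only assumes, as described above, that merging two increments sums their counts and gives the result the newer column's expiration. Time is measured in hours.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CounterShard:
    """Hypothetical model of a ttl'd counter column: a partial count plus
    the absolute time (in hours) at which it expires."""
    value: int
    expires_at: int


def merge(a: CounterShard, b: CounterShard) -> CounterShard:
    # Reconciling two shards sums the counts; the merged column keeps the
    # later expiration, so older increments ride along with the newer ttl.
    return CounterShard(a.value + b.value, max(a.expires_at, b.expires_at))


def live_value(shards, now):
    # Only shards that have not yet expired contribute to a read.
    return sum(s.value for s in shards if s.expires_at > now)


TTL = 168  # one week, in hours
first = CounterShard(1, 0 + TTL)     # increment issued at hour 0
second = CounterShard(1, 100 + TTL)  # increment issued at hour 100

# Case A: a compaction merges the two increments before hour 168.
merged = [merge(first, second)]
# Case B: the two increments stay in separate sstables past hour 168.
unmerged = [first, second]

print(live_value(merged, now=200))    # 2
print(live_value(unmerged, now=200))  # 1
```

Both cases issue exactly the same two increments at the same times; only the (client-invisible) timing of the merge differs, and so does the value read at hour 200.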

Let's take an example. Say you have a counter column family and the only update you ever do
to this CF is an increment by 1 with a ttl of 1 week. So only columns with a ttl. The idea is
to have counters that reset themselves if not incremented for a week (that is the only thing
that would make sense to me). Now say that you happen to increment one of the counters in this
CF regularly, every hour. After x days, you expect the value of this counter to be 24 * x
(if you disagree on this, you really have to explain to me what you'd expect here).

I guarantee you that this is not what you will get. Maybe for some days it will look like it
works, because every new insert will get merged with the old ones in time, extending the ttl
of the whole count. But some day (which depends on seemingly random things like the load on
the CF, the memtable thresholds and other compaction thresholds), a compaction will create
an sstable containing a portion of the total count large enough that it won't get compacted
for a week. For that week, you will still get the expected result. But after that week,
you will permanently lose the part of the counter that hasn't been compacted for a week.
From then on you won't get the expected value.
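The weekly-ttl scenario above can be reproduced with a small simulation. It is a sketch under loud assumptions: `simulate` is a hypothetical helper, each hourly increment is treated as its own sstable, and compaction is reduced to a fixed schedule that never picks up a shard once its count reaches `big` (a crude stand-in for a large sstable that doesn't get compacted). Real memtable flushes, compaction thresholds and load are not modeled.

```python
def simulate(hours: int, ttl: int, compact_every: int, big: int) -> int:
    """Hypothetical model: increment a ttl'd counter by 1 every hour for
    `hours` hours and return the value read at time `hours`.

    Each increment lands in its own shard (think: a flushed sstable).
    Every `compact_every` hours a compaction merges all shards whose count
    is below `big`; shards at or above `big` are left alone.
    """
    shards = []  # list of (partial_count, expires_at)
    for t in range(hours):
        shards.append((1, t + ttl))  # the hourly increment, with its ttl
        # Purge shards whose ttl has run out.
        shards = [s for s in shards if s[1] > t]
        if (t + 1) % compact_every == 0:
            small = [s for s in shards if s[0] < big]
            large = [s for s in shards if s[0] >= big]
            if small:
                # Merging sums the counts and keeps the latest expiration.
                small = [(sum(v for v, _ in small), max(e for _, e in small))]
            shards = large + small
    # A read only sees shards that have not expired yet.
    return sum(v for v, e in shards if e > hours)


TTL = 7 * 24  # one week, in hours
days = 14
print(24 * days)                             # 336, the expected value
print(simulate(24 * days, TTL, 24, 10**9))   # 336: every shard is always compacted in time
print(simulate(24 * days, TTL, 24, 100))     # 216: a 120-count shard expired at hour 287
```

With the second schedule, a shard holding 120 increments stops being compacted at hour 119 (its expiration is hour 287). Reads stay correct for that week (`simulate(286, TTL, 24, 100)` returns 286), but from hour 287 on those 120 increments are gone for good, even though the counter was never left untouched for more than an hour.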

This patch can't work because its observed behavior depends on when compactions happen,
which, from the client's point of view, is a random event.


> expiring counter columns
> ------------------------
>
>                 Key: CASSANDRA-2103
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2103
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>    Affects Versions: 0.8
>            Reporter: Kelvin Kakugawa
>            Assignee: Kelvin Kakugawa
>             Fix For: 0.8
>
>         Attachments: 0001-CASSANDRA-2103-expiring-counters-logic-tests.patch
>
>
> add ttl functionality to counter columns.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira
