cassandra-commits mailing list archives

From "Aleksey Yeschenko (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-8878) Counter Tables should be more clearly identified
Date Fri, 06 Mar 2015 22:59:39 GMT


Aleksey Yeschenko commented on CASSANDRA-8878:

I think there are 3 separate issues here:

1. Should we re-allow counter deletions? The reason we "stopped allowing them" was a pure
arithmetic bug in 1.1 (we forgot to update {{ColumnFamily#addCounter()}} to use microseconds,
but did so for {{addTombstone()}} and friends).
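One way a unit mismatch like this can break deletions is sketched below. It assumes plain "larger timestamp wins" reconciliation; {{reconcile}} and the sample timestamps are hypothetical and are not taken from the Cassandra code base.

```python
# Illustrative only: a counter write stamped in milliseconds versus a
# tombstone stamped in microseconds, reconciled by "larger timestamp wins".

def reconcile(write_ts, tombstone_ts):
    """Return 'live' if the write is newer than the tombstone, else 'deleted'."""
    return "live" if write_ts > tombstone_ts else "deleted"

delete_time_s = 1_425_682_779          # wall-clock second of the delete
update_time_s = delete_time_s + 3600   # counter updated one hour later

tombstone_us = delete_time_s * 1_000_000   # tombstone path: microseconds
update_ms = update_time_s * 1_000          # mismatched counter path: milliseconds
update_us = update_time_s * 1_000_000      # counter path with consistent units

buggy = reconcile(update_ms, tombstone_us)
fixed = reconcile(update_us, tombstone_us)
print(buggy, fixed)
```

With mixed units the later update loses to the older tombstone; with consistent units it survives.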

Later we made that behavior official in CASSANDRA-7346, which I'm not sure now was the right
thing to do. In 3.1, CASSANDRA-6506 is coming. It will make counters, underneath, just a host_id
-> subcounter map, getting rid of both the concepts of {{CounterCell}} ({{Cell#isCounterCell()}}
in post-8099 world) and BB counter contexts, thus making the mid to low storage engine neat,
generic, and flat. Doing that would also allow us to get rid of the counters cache entirely
- the read before write will be able to go through the equivalent of the current {{CollationController#collectTimeOrderedData()}}.
The only special case on the read path would be CASSANDRA-7346 - and for that reason alone
I want it reverted.
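As a rough illustration of the "host_id -> subcounter map" idea, here is a minimal sketch. The {{MapCounter}} class and the host names are hypothetical; the actual CASSANDRA-6506 design may differ in every detail.

```python
# Hypothetical model: a counter stored as a plain host_id -> sub-count map.
from collections import defaultdict

class MapCounter:
    def __init__(self):
        self.shards = defaultdict(int)   # host_id -> that host's sub-count

    def add(self, host_id, delta):
        # Each host only ever updates its own entry in the map.
        self.shards[host_id] += delta

    def value(self):
        # Reading the counter means reading the map and summing its entries.
        return sum(self.shards.values())

c = MapCounter()
c.add("host-a", 5)
c.add("host-b", 2)
c.add("host-a", -1)
total = c.value()
print(total)
```

Under this model nothing on the read side is counter-specific: the stored data is an ordinary map, and the counter value is derived from it.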

And, generally, some form of deletes would still be better than no deletes. Currently users who
need counter deletion and reuse must set gc_grace_seconds to very low values as a workaround, which isn't
any more reliable than the form of counter deletion we had pre-1.1.

2. Assuming we agree on (1), should we allow counter columns to co-exist with regular columns?

With the shared read path and delete behavior, there is nothing stopping us from doing that.
Reading a counter would become equivalent to reading a map, deleting a counter would be equivalent
to doing a whole-map deletion. This would give people who otherwise need to keep two separate
tables proper data locality on reads, which would be a big win for some.
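The read and delete equivalence can be pictured with a small sketch. The row layout, column names, and helpers ({{read_row}}, {{delete_counter}}) are made up for illustration and assume the map-backed counter model from (1).

```python
# Hypothetical row layout: a regular column and a map-backed counter side by side.
row = {
    "title": "intro video",                  # regular column
    "views": {"host-a": 40, "host-b": 2},    # counter as host_id -> sub-count
}

def read_row(row):
    """One read returns regular values as-is and counters as the sum of their map."""
    return {name: sum(v.values()) if isinstance(v, dict) else v
            for name, v in row.items()}

def delete_counter(row, name):
    """Deleting a counter drops the whole map, not individual entries."""
    row.pop(name, None)

before = read_row(row)
delete_counter(row, "views")
after = read_row(row)
print(before, after)
```

A single row read serves both the regular value and the counter, which is the locality benefit described above.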

3. Assuming we agree on (1) and (2), should we allow mixing counters and non-counters in the
same UPDATEs/batches?

Having (1) and (2) doesn't mean that we'll also need to support mixing counter and non-counter
updates. Those write-paths are indeed very different, and I don't see how to work around it.
We could emit separate mutations and apply them separately (split the update into a {{Mutation}}
and a {{CounterMutation}}). Or just reject those mixed writes. If it ever comes to (3), my
vote goes to the latter. Based on the {{WriteTimeoutException}}'s {{writeType}} the driver
will be able to decide to retry or not, just like it does now.
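The two options (split or reject) could look roughly like this. {{SCHEMA}}, {{split_update}}, and {{validate_update}} are hypothetical names for illustration, not Cassandra APIs.

```python
# Hypothetical helpers; column kinds come from a made-up schema dict.
SCHEMA = {"title": "text", "views": "counter", "likes": "counter"}

def split_update(assignments):
    """Option A: split one update into (regular_part, counter_part) by column type."""
    regular = {c: v for c, v in assignments.items() if SCHEMA[c] != "counter"}
    counter = {c: v for c, v in assignments.items() if SCHEMA[c] == "counter"}
    return regular, counter

def validate_update(assignments):
    """Option B: reject an update that touches both counter and non-counter columns."""
    regular, counter = split_update(assignments)
    if regular and counter:
        raise ValueError("cannot mix counter and non-counter columns in one update")
    return "counter" if counter else "regular"

kind = validate_update({"views": 1, "likes": 1})
try:
    validate_update({"title": "x", "views": 1})
    rejected = False
except ValueError:
    rejected = True
print(kind, rejected)
```

Rejecting keeps each write on exactly one write path, so a timeout can still be classified by a single write type.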

TL;DR: we can re-allow counters deletion, and allow mixing counters and regular columns in
the same tables - we can unify the read paths in 3.1/3.2. We'll still forbid mixing counter
and non-counter writes. Overall I think it'd be a nice thing - less special casing in the
storage engine, giving users locality and making it easier to use Cassandra idiomatically
(serve an action with a single table read).

> Counter Tables should be more clearly identified
> ------------------------------------------------
>                 Key: CASSANDRA-8878
>                 URL:
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Michaël Figuière
>            Assignee: Aleksey Yeschenko
>            Priority: Minor
>             Fix For: 3.0
> Counter tables are internally considered a particular kind of table, different from
the regular ones. This counter-specific nature is implicitly defined by the fact that columns
within a table have the {{counter}} data type. This nature turns out to be persistent over
time; that is, if the user does the following:
> {code}
> CREATE TABLE counttable (key uuid primary key, count counter);
> ALTER TABLE counttable DROP count;
> ALTER TABLE counttable ADD count2 int;
> {code} 
> The following error will be thrown:
> {code}
> Cannot add a non counter column (count2) in a counter column family
> {code}
> This happens even though the table no longer has any counter column. This implicit, persistent
nature can be challenging to understand for users (and impossible to infer in the case above).
For this reason a more explicit declaration of counter tables would be appropriate, such as:
> {code}
> CREATE COUNTER TABLE counttable (key uuid primary key, count counter);
> {code}
> Besides that, adding a boolean {{counter_table}} column in the {{system.schema_columnfamilies}}
table would allow external tools to easily differentiate a counter table from a regular one.

This message was sent by Atlassian JIRA
