cassandra-commits mailing list archives

From "Srdjan Mitrovic (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (CASSANDRA-4775) Counters 2.0
Date Tue, 12 Feb 2013 22:41:14 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-4775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13577055#comment-13577055 ]

Srdjan Mitrovic edited comment on CASSANDRA-4775 at 2/12/13 10:39 PM:
----------------------------------------------------------------------

bq. Not sure we'd want to support avg (since it requires extra information to be stored, as you point out)
If we record every incr operation, we will have that extra information anyway (at least until compaction :( ).

I propose a way to make idempotent counters work while still supporting all of these features:
1. Create a CF with columns replayID, counterName, value and cnt, plus optional columns customField1, customField2, ...
(Use the random partitioner on replayID or, if we want to be sure the key is unique, a CompositeType key replayID:counterName.)
2. Create a secondary index on counterName and use it to compute the counter's total (the sum of value*cnt) on each node separately; this works because secondary indexes are distributed (each node indexes only its own data).
3. On compaction, delete the old replayID rows, compute total = sum(value*cnt) and the new cnt = sum(cnt), and store a single new row (replayID, counterName, total, new cnt).
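The three steps above can be sketched for a single node as follows. This is a minimal in-memory illustration, assuming the node's rows sit in a HashMap keyed by the composite replayID:counterName and that the secondary index is a plain map from counterName to row keys. All names (IncrementRow, CounterTable, incr, compact) are hypothetical, not Cassandra APIs. Because step 3 stores the precomputed total in the value column, the sketch adds an assumed `compacted` flag so that row is counted once rather than as value*cnt. It also ignores retries that arrive after their replayID has already been compacted away.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class CounterTableDemo {
    public static void main(String[] args) {
        CounterTable t = new CounterTable();
        t.incr("myCounter", "r1", 2, 1);
        t.incr("myCounter", "r1", 2, 1); // client retry of r1: overwrites, not added
        t.incr("myCounter", "r2", 3, 5);
        System.out.println(t.sum("myCounter")); // 17
        t.compact("myCounter", "compacted-1");
        // prints "17 6 1": same total and cnt, now held in a single row
        System.out.println(t.sum("myCounter") + " " + t.count("myCounter") + " " + t.rowCount());
    }
}

// One row of the hypothetical counters CF: (replayID, counterName, value, cnt).
// `compacted` is an assumption of this sketch: a compacted row already holds
// the total in `value`, so its contribution is value, not value * cnt.
record IncrementRow(String replayId, String counterName, long value, long cnt, boolean compacted) {
    long contribution() {
        return compacted ? value : value * cnt;
    }
}

class CounterTable {
    // Step 1: rows keyed by the composite replayID:counterName.
    private final Map<String, IncrementRow> rows = new HashMap<>();
    // Step 2: node-local secondary index, counterName -> row keys.
    private final Map<String, Set<String>> index = new HashMap<>();

    private static String key(String replayId, String counterName) {
        return replayId + ":" + counterName;
    }

    // Writing the same replayID twice hits the same key, so retries are idempotent.
    void incr(String counterName, String replayId, long value, long cnt) {
        put(new IncrementRow(replayId, counterName, value, cnt, false));
    }

    private void put(IncrementRow row) {
        String k = key(row.replayId(), row.counterName());
        rows.put(k, row);
        index.computeIfAbsent(row.counterName(), n -> new HashSet<>()).add(k);
    }

    // Step 2: this node's partial total, found through the index.
    long sum(String counterName) {
        long total = 0;
        for (String k : index.getOrDefault(counterName, Set.of())) {
            total += rows.get(k).contribution();
        }
        return total;
    }

    long count(String counterName) {
        long c = 0;
        for (String k : index.getOrDefault(counterName, Set.of())) {
            c += rows.get(k).cnt();
        }
        return c;
    }

    // Step 3: replace every row of a counter with one row holding
    // total = sum(value * cnt) and new cnt = sum(cnt).
    void compact(String counterName, String newReplayId) {
        long total = sum(counterName);
        long newCnt = count(counterName);
        Set<String> keys = index.remove(counterName);
        if (keys == null) {
            return; // nothing stored for this counter
        }
        for (String k : keys) {
            rows.remove(k);
        }
        put(new IncrementRow(newReplayId, counterName, total, newCnt, true));
    }

    int rowCount() {
        return rows.size();
    }
}
```

Keeping cnt alongside the total is what lets avg survive compaction: avg is still sum/cnt after the old rows are gone.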

We can also support an increment with a count (this will affect avg). For example, incr(counters, myCounter, replayId, 3, 5) increments the counter by 15 but is stored as value 3, cnt 5, so it affects the average differently than an increment with value 15, cnt 1.
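A small sketch of why the stored count matters, assuming avg is defined as sum(value*cnt) / sum(cnt) over the recorded increments. The class and method names are hypothetical. Both lists below raise the counter by 15, but only the stored cnt tells them apart when averaging.

```java
import java.util.List;

class AvgDemo {
    // One recorded increment: `value` applied `cnt` times (the value/cnt columns).
    record Increment(long value, long cnt) {}

    static long sum(List<Increment> incs) {
        long s = 0;
        for (Increment i : incs) {
            s += i.value() * i.cnt();
        }
        return s;
    }

    // Average per counted increment: sum(value * cnt) / sum(cnt).
    // Returning 0.0 for an empty counter is an assumption of this sketch.
    static double avg(List<Increment> incs) {
        long c = 0;
        for (Increment i : incs) {
            c += i.cnt();
        }
        return c == 0 ? 0.0 : (double) sum(incs) / c;
    }

    public static void main(String[] args) {
        List<Increment> a = List.of(new Increment(3, 5));  // like incr(counters, myCounter, replayId, 3, 5)
        List<Increment> b = List.of(new Increment(15, 1)); // a single increment by 15
        System.out.println(sum(a) + " " + avg(a)); // 15 3.0
        System.out.println(sum(b) + " " + avg(b)); // 15 15.0
    }
}
```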

We can create custom fields for a user-supplied reduce(Iterable<Column>) so that we can support min, max, AND/OR/XOR, ... For example, on compaction we would store the reduced max in that custom field.
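A sketch of such custom-field reducers, assuming plain long values stand in for Cassandra columns; the Reducer enum and its reduce method are hypothetical, not the reduce(Iterable<Column>) hook itself. Each operator here is associative and commutative, which is what makes storing a reduced value at compaction safe: reducing the stored result together with newer values gives the same answer as reducing every original value.

```java
import java.util.List;
import java.util.function.LongBinaryOperator;

class CustomFieldDemo {
    // Hypothetical stand-in for reduce(Iterable<Column>): folds one custom
    // field's values with an associative operator, starting from its identity.
    enum Reducer {
        MIN(Long.MAX_VALUE, Math::min),
        MAX(Long.MIN_VALUE, Math::max),
        AND(-1L, (a, b) -> a & b),
        OR(0L, (a, b) -> a | b),
        XOR(0L, (a, b) -> a ^ b);

        private final long identity;
        private final LongBinaryOperator op;

        Reducer(long identity, LongBinaryOperator op) {
            this.identity = identity;
            this.op = op;
        }

        long reduce(Iterable<Long> values) {
            long acc = identity;
            for (long v : values) {
                acc = op.applyAsLong(acc, v);
            }
            return acc;
        }
    }

    public static void main(String[] args) {
        // customField1 values before compaction; the max is stored in the compacted row.
        long compactedMax = Reducer.MAX.reduce(List.of(4L, 9L, 2L));
        // A later increment carries 7; re-reducing with the stored value is still correct.
        long later = Reducer.MAX.reduce(List.of(compactedMax, 7L));
        System.out.println(compactedMax + " " + later);          // 9 9
        System.out.println(Reducer.XOR.reduce(List.of(6L, 3L))); // 5
    }
}
```

avg is the exception: it is not a reduce over single values, which is why it needs the separate value and cnt columns.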

It would be ideal if a secondary index could also store the values of the indexed columns, so that we can read counters in one go on each node. There is another JIRA issue for this. Once that issue is resolved, we could keep only the secondary index and drop the original CF; we would just pretend it exists :)
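A sketch of the read path this would enable, assuming each row lives on exactly one node (replication is ignored; with replicas, the coordinator would have to read each token range from only one replica to avoid over-counting). The index here is hypothetical: its entries carry value and cnt, so each node answers from the index alone and the coordinator merges the per-node partial results. None of these classes are Cassandra APIs.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class CoveringIndexDemo {
    // Index entry carrying the row's values, not just its key (the hypothetical
    // "index that also stores column values").
    record IndexEntry(String replayId, long value, long cnt) {}

    // Partial result one node returns for a counter.
    record Partial(long sum, long cnt) {
        Partial plus(Partial o) {
            return new Partial(sum + o.sum, cnt + o.cnt);
        }
    }

    static final class Node {
        // counterName -> (replayID -> entry); keyed by replayID so retries overwrite.
        private final Map<String, Map<String, IndexEntry>> index = new HashMap<>();

        void write(String counterName, IndexEntry e) {
            index.computeIfAbsent(counterName, n -> new HashMap<>()).put(e.replayId(), e);
        }

        // Answered from the index alone, with no lookup into a base CF.
        Partial read(String counterName) {
            long s = 0;
            long c = 0;
            for (IndexEntry e : index.getOrDefault(counterName, Map.of()).values()) {
                s += e.value() * e.cnt();
                c += e.cnt();
            }
            return new Partial(s, c);
        }
    }

    // Coordinator: one index read per node, then merge the partial results.
    static Partial readCounter(List<Node> nodes, String counterName) {
        Partial total = new Partial(0, 0);
        for (Node n : nodes) {
            total = total.plus(n.read(counterName));
        }
        return total;
    }

    public static void main(String[] args) {
        Node a = new Node();
        Node b = new Node();
        a.write("myCounter", new IndexEntry("r1", 3, 5));
        b.write("myCounter", new IndexEntry("r2", 2, 1));
        b.write("myCounter", new IndexEntry("r2", 2, 1)); // retried write, same replayID
        Partial p = readCounter(List.of(a, b), "myCounter");
        System.out.println(p.sum() + " " + p.cnt()); // 17 6
    }
}
```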

I suspect clients could implement this approach themselves given a pluggable compaction strategy, but it would still be much easier if secondary indexes could also store other column values, not only keys.

> Counters 2.0
> ------------
>
>                 Key: CASSANDRA-4775
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4775
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Arya Goudarzi
>              Labels: counters
>             Fix For: 2.0
>
>
> The existing partitioned counters remain a source of frustration for most users almost two years after being introduced.  The remaining problems are inherent in the design, not something that can be fixed given enough time/eyeballs.
> Ideally a solution would give us
> - similar performance
> - less special cases in the code
> - potential for a retry mechanism

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
