cassandra-commits mailing list archives

From "Jeff Jirsa (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (CASSANDRA-6412) Custom creation and merge functions for user-defined column types
Date Wed, 08 Apr 2015 02:33:14 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-6412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14484355#comment-14484355 ]

Jeff Jirsa edited comment on CASSANDRA-6412 at 4/8/15 2:32 AM:
---------------------------------------------------------------

I'm playing with this, just to understand it conceptually, using CASSANDRA-8099 as a base.

{noformat}
cqlsh> create keyspace test2 WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 2}; use test2;
cqlsh:test2> create table table_with_resolvers ( id text,
    low int with resolver 'org.apache.cassandra.db.resolvers.MinValueResolver',
    high int with resolver 'org.apache.cassandra.db.resolvers.MaxValueResolver',
    last int with resolver 'org.apache.cassandra.db.resolvers.TimestampResolver',
    first int with resolver 'org.apache.cassandra.db.resolvers.ReverseTimestampResolver',
    PRIMARY KEY(id));
cqlsh:test2> select column_name, column_resolver from system.schema_columns where keyspace_name='test2' and columnfamily_name='table_with_resolvers';

 column_name | column_resolver
-------------+------------------------------------------------------------
       first | org.apache.cassandra.db.resolvers.ReverseTimestampResolver
        high |         org.apache.cassandra.db.resolvers.MaxValueResolver
          id |        org.apache.cassandra.db.resolvers.TimestampResolver
        last |        org.apache.cassandra.db.resolvers.TimestampResolver
         low |         org.apache.cassandra.db.resolvers.MinValueResolver

(5 rows)
cqlsh:test2> insert into table_with_resolvers (id, low, high, first, last ) values ('1', 1, 1, 1, 1);
cqlsh:test2> insert into table_with_resolvers (id, low, high, first, last ) values ('1', 2, 2, 2, 2);
cqlsh:test2> insert into table_with_resolvers (id, low, high, first, last ) values ('1', 3, 3, 3, 3);
cqlsh:test2> insert into table_with_resolvers (id, low, high, first, last ) values ('1', 5, 5, 5, 5);
cqlsh:test2> insert into table_with_resolvers (id, low, high, first, last ) values ('1', 4, 4, 4, 4);
cqlsh:test2> select * from table_with_resolvers;

 id | first | high | last | low
----+-------+------+------+-----
  1 |     1 |    5 |    4 |   1

(1 rows)
{noformat}
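
The result row follows from how each resolver picks a winner among the five writes. A minimal Python model of that selection, assuming each insert receives a strictly increasing write timestamp; the function names are illustrative stand-ins and are not the resolver classes in the patch:

```python
from functools import reduce

# A cell is modeled as (value, timestamp). These functions are hypothetical
# stand-ins for the resolver classes named in the transcript above.
def timestamp_resolver(a, b):
    """Last write wins: keep the cell with the higher timestamp."""
    return a if a[1] >= b[1] else b

def reverse_timestamp_resolver(a, b):
    """First write wins: keep the cell with the lower timestamp."""
    return a if a[1] <= b[1] else b

def max_value_resolver(a, b):
    """Keep the cell with the larger value."""
    return a if a[0] >= b[0] else b

def min_value_resolver(a, b):
    """Keep the cell with the smaller value."""
    return a if a[0] <= b[0] else b

# The five inserts from the transcript, with assumed timestamps 1..5.
writes = [(1, 1), (2, 2), (3, 3), (5, 4), (4, 5)]

row = {
    "first": reduce(reverse_timestamp_resolver, writes)[0],
    "high": reduce(max_value_resolver, writes)[0],
    "last": reduce(timestamp_resolver, writes)[0],
    "low": reduce(min_value_resolver, writes)[0],
}
print(row)  # {'first': 1, 'high': 5, 'last': 4, 'low': 1}
```

With distinct timestamps, each of these pairwise functions gives the same answer regardless of the order in which the writes are merged, which is the property the resolvers need in order to be applied during compaction and read.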

My diff/patch isn't fit for sharing yet, but as I go through it I have some questions:

1) Given that user types are currently frozen, does it make sense to allow a resolver per field
in user types, assuming that user types will eventually become un-frozen?
2) My initial pass disallows custom resolvers on counters and collections. Does anyone have a
strong opinion on whether user-defined merge functions should be allowed for collections?
3) Given that deletes are not commutative, I'm strongly considering making the built-in
resolvers (min, max, first-write-wins, and the default last-write-wins) always let a tombstone
with a higher timestamp take priority over anything with a lower timestamp (that is,
last-write-always-wins for tombstones). That works around SOME of the corner issues involving
deletes. Given that these are regular cells with valid timestamps, does that not address
some of the concern?
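
The tombstone rule in question 3 can be modeled as follows. This is a sketch under stated assumptions, not the code in the patch: `Cell`, `reconcile`, and `min_value` are hypothetical names, a tombstone is modeled as a cell with no value, a live cell with a higher timestamp than a tombstone is assumed to survive, and timestamp ties are assumed to go to the tombstone.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass(frozen=True)
class Cell:
    value: Optional[int]  # None marks a tombstone in this model
    timestamp: int

    @property
    def is_tombstone(self) -> bool:
        return self.value is None

def min_value(a: Cell, b: Cell) -> Cell:
    """Hypothetical min resolver; only ever called with two live cells."""
    return a if a.value <= b.value else b

def reconcile(a: Cell, b: Cell, resolver: Callable[[Cell, Cell], Cell]) -> Cell:
    if a.is_tombstone and b.is_tombstone:
        return a if a.timestamp >= b.timestamp else b
    if a.is_tombstone or b.is_tombstone:
        tomb, live = (a, b) if a.is_tombstone else (b, a)
        # The delete is decided by timestamp alone, whatever the resolver is.
        # Ties go to the tombstone (an assumption of this sketch).
        return tomb if tomb.timestamp >= live.timestamp else live
    return resolver(a, b)

old = Cell(5, timestamp=1)
tomb = Cell(None, timestamp=2)
new = Cell(7, timestamp=3)

# Merge order 1: (old, tomb) first, then new.
left = reconcile(reconcile(old, tomb, min_value), new, min_value)
# Merge order 2: (old, new) first, then tomb.
right = reconcile(reconcile(old, new, min_value), tomb, min_value)
print(left, right)
```

In this simplified model, which keeps a single cell per column, the two merge orders disagree: the first leaves the live value 7, the second leaves the tombstone, because min keeps the older cell and its older timestamp. That may be an instance of the corner cases the rule does not cover on its own.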



was (Author: jjirsa):
I'm playing with this, just to understand it conceptually, using CASSANDRA-8099 as a base.

{noformat}
cqlsh> create keyspace test2 WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 2}; use test2;
cqlsh:test2> create table table_with_resolvers ( id text,
    last text with resolver 'org.apache.cassandra.db.resolvers.TimestampResolver',
    first text with resolver 'org.apache.cassandra.db.resolvers.ReverseTimestampResolver',
    PRIMARY KEY(id));
cqlsh:test2> select column_name, column_resolver, type, validator from system.schema_columns where keyspace_name='test2' and columnfamily_name='table_with_resolvers';

 column_name | column_resolver                                            | type          | validator
-------------+------------------------------------------------------------+---------------+------------------------------------------
       first | org.apache.cassandra.db.resolvers.ReverseTimestampResolver |       regular | org.apache.cassandra.db.marshal.UTF8Type
          id |        org.apache.cassandra.db.resolvers.TimestampResolver | partition_key | org.apache.cassandra.db.marshal.UTF8Type
        last |        org.apache.cassandra.db.resolvers.TimestampResolver |       regular | org.apache.cassandra.db.marshal.UTF8Type

cqlsh:test2> insert into table_with_resolvers (id, first, last ) values ('1', '1', '1');
cqlsh:test2> insert into table_with_resolvers (id, first, last ) values ('1', '2', '2');
cqlsh:test2> insert into table_with_resolvers (id, first, last ) values ('1', '3', '3');
cqlsh:test2> select * from table_with_resolvers ;

 id | first | last
----+-------+------
  1 |     1 |    3

(1 rows)
{noformat}



> Custom creation and merge functions for user-defined column types
> -----------------------------------------------------------------
>
>                 Key: CASSANDRA-6412
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6412
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Nicolas Favre-Felix
>
> This is a proposal for a new feature, mapping custom types to Cassandra columns.
> These types would provide a creation function and a merge function, to be implemented in Java by the user.
> This feature relates to the concept of CRDTs; the proposal is to replicate "operations" on these types during write, to apply these operations internally during merge (Column.reconcile), and to also merge their values on read.
> The following operations are made possible without reading back any data:
> * MIN or MAX(value) for a column
> * First value for a column
> * Count Distinct
> * HyperLogLog
> * Count-Min
> And any composition of these too, e.g. a Candlestick type includes first, last, min, and max.
> The merge operations exposed by these types need to be commutative; this is the case for many functions used in analytics.
> This feature is incomplete without some integration with CASSANDRA-4775 (Counters 2.0) which provides a Read-Modify-Write implementation for distributed counters. Integrating custom creation and merge functions with new counters would let users implement complex CRDTs in Cassandra, including:
> * Averages & related (sum of squares, standard deviation)
> * Graphs
> * Sets
> * Custom registers (even with vector clocks)
> I have a working prototype with implementations for min, max, and Candlestick at https://github.com/acunu/cassandra/tree/crdts - I'd appreciate any feedback on the design and interfaces.
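
The creation/merge pair described in the issue can be illustrated with the Candlestick example. The following is a minimal Python sketch under assumptions, not the prototype's Java interface: `Candlestick`, `create`, and `merge` are hypothetical names, and first/last are tracked as `(timestamp, value)` pairs so that ties on timestamp are broken deterministically by value.

```python
from dataclasses import dataclass
from functools import reduce
from itertools import permutations

@dataclass(frozen=True)
class Candlestick:
    first: tuple  # (timestamp, value) of the earliest write seen
    last: tuple   # (timestamp, value) of the latest write seen
    low: float
    high: float

def create(value, timestamp):
    """Creation function: wrap a single written value as a one-point candlestick."""
    return Candlestick((timestamp, value), (timestamp, value), value, value)

def merge(a, b):
    """Merge function: built only from min/max, so the order of application
    does not change the result."""
    return Candlestick(
        first=min(a.first, b.first),
        last=max(a.last, b.last),
        low=min(a.low, b.low),
        high=max(a.high, b.high),
    )

# Five writes with assumed timestamps 1..5.
ops = [create(v, t) for t, v in enumerate([1, 2, 3, 5, 4], start=1)]
candle = reduce(merge, ops)

# Every arrival order of the five writes merges to the same candlestick.
assert all(reduce(merge, p) == candle for p in permutations(ops))
print(candle)
```

Because each field is combined with `min` or `max`, replicas can apply the same writes in any order, or merge partial results, and still converge on one value without reading anything back at write time.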



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
