cassandra-commits mailing list archives

From "Benedict (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (CASSANDRA-9231) Support Routing Key as part of Partition Key
Date Fri, 08 May 2015 19:23:01 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-9231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14535326#comment-14535326 ]

Benedict edited comment on CASSANDRA-9231 at 5/8/15 7:22 PM:
-------------------------------------------------------------

bq. They wouldn't be providing arbitrary tokens, they would be providing arbitrary input to
the hash function (for Random, MP3).

{code}
CREATE FUNCTION myOrderedTokenFct(a bigint) RETURNS bigint AS 'return a';
CREATE TABLE t (
   a int PRIMARY KEY,
   b text,
   c text
) with tokenizer=myOrderedTokenFct;
{code}
 
bq. Basically, this gets you very close to a per-table partitioner. The actual partitioner
would just define the "domain" of the tokens and how they sort, but the actual computation
would be per-table. And this for very, very little change to the syntax and barely more complexity
code-wise than the "routing key" idea.

It looks to me like these two statements disagree, but I may be mistaken.


was (Author: benedict):
bq. They wouldn't be providing arbitrary tokens, they would be providing arbitrary input to
the hash function (for Random, MP3).

{code}
CREATE FUNCTION myOrderedTokenFct(a bigint) RETURNS bigint AS 'return a';
CREATE TABLE t (
   a int PRIMARY KEY,
   b text,
   c text
) with tokenizer=myOrderedTokenFct;
{code}
 
bq. Basically, this gets you very close to a per-table partitioner. The actual partitioner
would just define the "domain" of the tokens and how they sort, but the actual computation
would be per-table. And this for very, very little change to the syntax and barely more complexity
code-wise than the "routing key" idea.



> Support Routing Key as part of Partition Key
> --------------------------------------------
>
>                 Key: CASSANDRA-9231
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9231
>             Project: Cassandra
>          Issue Type: Wish
>          Components: Core
>            Reporter: Matthias Broecheler
>             Fix For: 3.x
>
>
> Provide support for sub-dividing the partition key into a routing key and a non-routing
key component. Currently, all columns that make up the partition key of the primary key are
also routing keys, i.e. they determine which nodes store the data. This proposal would give
the data modeler the ability to designate only a subset of the columns that comprise the partition
key to be routing keys. The non-routing key columns of the partition key identify the partition
but are not used to determine where to store the data.
> Consider the following example table definition:
> CREATE TABLE foo (
>   a int,
>   b int,
>   c int,
>   d int,
>   PRIMARY KEY  (([a], b), c ) );
> (a,b) is the partition key, c is the clustering key, and d is just a column. In addition,
the square brackets identify the routing key as column a. This means that only the value of
column a is used to determine the node for data placement (i.e. only the value of column a
is murmur3 hashed to compute the token). In addition, column b is needed to identify the partition
but does not influence the placement.
> This has the benefit that all rows with the same routing key (but potentially different
non-routing key columns of the partition key) are stored on the same node and that knowledge
of such co-locality can be exploited by applications built on top of Cassandra.
> Currently, the only way to achieve co-locality is within a partition. However, this approach
has the limitations that: a) there are theoretical and (more importantly) practical limitations
on the size of a partition and b) rows within a partition are ordered and an index is built
to exploit such ordering. For large partitions that overhead is significant if ordering isn't
needed.
> In other words, routing keys afford a simple means to achieve scalable node-level co-locality
without ordering while clustering keys afford page-level co-locality with ordering. As such,
they address different co-locality needs giving the data modeler the flexibility to choose
what is needed for their application.
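
The placement rule described in the issue can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Cassandra code: `token` uses a truncated MD5 digest as a stand-in for the murmur3 hash, `owner` is a toy modulo placement rather than a real token ring, and the node count of 8 is arbitrary. All three names are hypothetical.

```python
import hashlib
from typing import Sequence


def token(values: Sequence[int]) -> int:
    """Stand-in token function (assumption: MD5 instead of murmur3).

    Hashes the given int column values to a signed 64-bit integer.
    """
    data = b"".join(v.to_bytes(4, "big", signed=True) for v in values)
    digest = hashlib.md5(data).digest()
    return int.from_bytes(digest[:8], "big", signed=True)


def owner(tok: int, num_nodes: int) -> int:
    """Toy placement (assumption): map a token to a node index by modulo."""
    return tok % num_nodes


NUM_NODES = 8  # arbitrary cluster size for the illustration

# Three partitions of table foo, identified by partition key (a, b).
partitions = [(1, 10), (1, 20), (1, 30)]

# Current behaviour: every partition key column feeds the hash,
# so partitions sharing `a` may land on different nodes.
current = [owner(token([a, b]), NUM_NODES) for a, b in partitions]

# Proposed behaviour: only the routing key column `a` feeds the hash,
# so partitions sharing `a` always land on the same node.
proposed = [owner(token([a]), NUM_NODES) for a, _b in partitions]

print("current :", current)
print("proposed:", proposed)
```

In this sketch every entry of `proposed` is the same node index, because column b never reaches the hash, while the entries of `current` are free to differ. Column b still distinguishes the three partitions; it just no longer influences where they are stored.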



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
