cassandra-commits mailing list archives

From "Sylvain Lebresne (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-9231) Support Routing Key as part of Partition Key
Date Fri, 08 May 2015 12:44:01 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-9231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14534435#comment-14534435 ]

Sylvain Lebresne commented on CASSANDRA-9231:
---------------------------------------------

bq. the partition key distributes the data both within and without the node, whereas the routing
key only without

I honestly don't understand what that sentence means, *especially* in terms of modeling (the
concept of distribution within a node sounds an awful lot like getting into implementation
details). I know I'm not very smart, but let's say I'm still not sold on the claim that this
concept is simple to explain.

bq. There are also two things that seem to be conflated in your proposal: per table partitioners,
and arbitrary functions as partitioners.

I'm not sure why you're trying to find complexity in what I'm suggesting.  Technically, the
routing key idea is just saying that for a specific table, instead of using the "default"
partitioner hash function on the partition key to compute the token, we'll use a function
that first projects some part of said partition key and then applies the hash function. It is
using a custom token function, just a super special one. I'm only suggesting we allow any
function instead of just either the default or another very special function. Midpoint
calculation, random token creation and whatnot need nothing more with this than they do with
the routing key idea.
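
To make the comparison concrete, here is a minimal Python sketch of the three cases as I read them: the default token function, the routing key as a projection followed by the hash, and a generic custom token function. Every name in it (hash_to_token, default_token, routing_key_token, make_token_function) is hypothetical, and MD5 is only a stand-in for the partitioner's real hash (Murmur3), so this illustrates the idea and is not Cassandra code.

```python
import hashlib
from typing import Callable, Sequence, Tuple

PartitionKey = Tuple[int, ...]


def hash_to_token(values: Sequence[int]) -> int:
    """Stand-in for the partitioner hash (NOT Murmur3): a deterministic signed 64-bit token."""
    data = ",".join(str(v) for v in values).encode("utf-8")
    digest = hashlib.md5(data).digest()
    return int.from_bytes(digest[:8], "big", signed=True)


def default_token(pk: PartitionKey) -> int:
    """Default behaviour: hash the whole partition key."""
    return hash_to_token(pk)


def routing_key_token(pk: PartitionKey, routing_positions: Sequence[int] = (0,)) -> int:
    """Routing key idea: project some columns of the partition key, then hash them."""
    return hash_to_token([pk[i] for i in routing_positions])


def make_token_function(
    fn: Callable[[PartitionKey], Sequence[int]]
) -> Callable[[PartitionKey], int]:
    """Generic idea: apply any function to the partition key, then the same hash."""
    def token(pk: PartitionKey) -> int:
        return hash_to_token(fn(pk))
    return token


# The routing key is one particular custom token function: a projection.
project_a = make_token_function(lambda pk: pk[:1])
```

With partition key (a, b), routing_key_token((1, 2)) and routing_key_token((1, 3)) yield the same token, and project_a yields that token as well, which is the sense in which the routing key is "just a super special" custom token function.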

I'm not in any way suggesting per-table partitioners. I don't want to do it ever because
that's a lot of complexity that I'm really not convinced is worth it. What I am saying is
that by allowing a generic custom token function (instead of just a syntax for one specific
custom function), we'll likely actually cover most of the use cases for per-table partitioners
(probably not all, but most).  And this with virtually no added complexity compared to the
routing key idea.

bq. However we can deliver a lot of the functionality you suggest with just arbitrary function
application to the fields in the partition key when defining the routing key.

That's almost exactly what I'm suggesting, except that by making it just one function on the
whole partition key, it's actually more flexible and you don't have to introduce 2 concepts:
the routing key and then functions on routing key elements.
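
A small sketch of the difference, under the same assumptions as before: MD5 stands in for the real partitioner hash, and the names per_column_token and whole_key_token are made up for illustration. The first function transforms each column independently, as functions on routing key elements would. The second looks at the whole partition key at once, here sorting the two columns so that (a, b) and (b, a) land on the same token, which is something I don't see how to express column by column.

```python
import hashlib
from typing import Sequence, Tuple


def hash_to_token(values: Sequence[int]) -> int:
    """Stand-in for the partitioner hash (NOT Murmur3): a deterministic signed 64-bit token."""
    data = ",".join(str(v) for v in values).encode("utf-8")
    return int.from_bytes(hashlib.md5(data).digest()[:8], "big", signed=True)


def per_column_token(pk: Tuple[int, int]) -> int:
    """Functions on routing key elements: each column is transformed on its own."""
    a, b = pk
    return hash_to_token((a, b // 100))  # bucket b into ranges of 100


def whole_key_token(pk: Tuple[int, int]) -> int:
    """One function on the whole partition key: columns can be combined freely."""
    a, b = pk
    return hash_to_token((min(a, b), max(a, b)))  # order-insensitive placement
```

Here per_column_token((1, 250)) equals per_column_token((1, 299)) because both values of b fall in the same bucket, while whole_key_token((1, 2)) equals whole_key_token((2, 1)) because the function sees both columns together.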


> Support Routing Key as part of Partition Key
> --------------------------------------------
>
>                 Key: CASSANDRA-9231
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9231
>             Project: Cassandra
>          Issue Type: Wish
>          Components: Core
>            Reporter: Matthias Broecheler
>             Fix For: 3.x
>
>
> Provide support for sub-dividing the partition key into a routing key and a non-routing
key component. Currently, all columns that make up the partition key of the primary key are
also routing keys, i.e. they determine which nodes store the data. This proposal would give
the data modeler the ability to designate only a subset of the columns that comprise the partition
key to be routing keys. The non-routing key columns of the partition key identify the partition
but are not used to determine where to store the data.
> Consider the following example table definition:
> CREATE TABLE foo (
>   a int,
>   b int,
>   c int,
>   d int,
>   PRIMARY KEY  (([a], b), c ) );
> (a,b) is the partition key, c is the clustering key, and d is just a column. In addition,
the square brackets identify the routing key as column a. This means that only the value of
column a is used to determine the node for data placement (i.e. only the value of column a
is murmur3 hashed to compute the token). In addition, column b is needed to identify the partition
but does not influence the placement.
> This has the benefit that all rows with the same routing key (but potentially different
non-routing key columns of the partition key) are stored on the same node and that knowledge
of such co-locality can be exploited by applications built on top of Cassandra.
> Currently, the only way to achieve co-locality is within a partition. However, this approach
has the limitations that: a) there are theoretical and (more importantly) practical limitations
on the size of a partition and b) rows within a partition are ordered and an index is built
to exploit such ordering. For large partitions that overhead is significant if ordering isn't
needed.
> In other words, routing keys afford a simple means to achieve scalable node-level co-locality
without ordering while clustering keys afford page-level co-locality with ordering. As such,
they address different co-locality needs giving the data modeler the flexibility to choose
what is needed for their application.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
