From: "Sylvain Lebresne (JIRA)"
To: commits@cassandra.apache.org
Reply-To: dev@cassandra.apache.org
Date: Fri, 24 Apr 2015 13:08:39 +0000 (UTC)
Subject: [jira] [Commented] (CASSANDRA-9231) Support Routing Key as part of Partition Key
Mailing-List: contact commits-help@cassandra.apache.org; run by ezmlm

    [ https://issues.apache.org/jira/browse/CASSANDRA-9231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14510995#comment-14510995 ]

Sylvain Lebresne commented on CASSANDRA-9231:
---------------------------------------------

bq. I have an equally strong preference to not overcomplicate and overgeneralise this

Well, I disagree that it's *over*generalization; it's just generalization, and generalization doesn't always mean more complexity. In fact, it's imo simpler to use functions than to come up with a new custom concept.
Perhaps more importantly, I think that something potentially *more* useful than just using one component of the partition key would be to use both components, but use the first one only for the first half of the token and the second one for the second half. The result would be that partitions having the same first component would be on the same replica or some small number of replicas, while still keeping some scaling properties if you have very many partitions with the same first component.

> Support Routing Key as part of Partition Key
> --------------------------------------------
>
>                 Key: CASSANDRA-9231
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9231
>             Project: Cassandra
>          Issue Type: Wish
>          Components: Core
>            Reporter: Matthias Broecheler
>             Fix For: 3.1
>
>
> Provide support for sub-dividing the partition key into a routing key and a non-routing key component. Currently, all columns that make up the partition key of the primary key are also routing keys, i.e. they determine which nodes store the data. This proposal would give the data modeler the ability to designate only a subset of the columns that comprise the partition key to be routing keys. The non-routing key columns of the partition key identify the partition but are not used to determine where to store the data.
> Consider the following example table definition:
> CREATE TABLE foo (
>   a int,
>   b int,
>   c int,
>   d int,
>   PRIMARY KEY (([a], b), c )
> );
> (a,b) is the partition key, c is the clustering key, and d is just a column. In addition, the square brackets identify the routing key as column a. This means that only the value of column a is used to determine the node for data placement (i.e. only the value of column a is murmur3 hashed to compute the token). Column b is still needed to identify the partition, but it does not influence the placement.
> This has the benefit that all rows with the same routing key (but potentially different non-routing key columns of the partition key) are stored on the same node, and that knowledge of such co-locality can be exploited by applications built on top of Cassandra.
> Currently, the only way to achieve co-locality is within a partition. However, this approach has the limitations that: a) there are theoretical and (more importantly) practical limitations on the size of a partition and b) rows within a partition are ordered and an index is built to exploit such ordering. For large partitions that overhead is significant if ordering isn't needed.
> In other words, routing keys afford a simple means to achieve scalable node-level co-locality without ordering, while clustering keys afford page-level co-locality with ordering. As such, they address different co-locality needs, giving the data modeler the flexibility to choose what is needed for their application.
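
For illustration, the placement schemes discussed in this thread can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not Cassandra code: the function names (hash64, full_key_token, routing_key_token, split_token) are hypothetical, BLAKE2b from hashlib stands in for the murmur3 hash mentioned in the issue, tokens are treated as unsigned 64-bit integers, and "half of the token" is read as the high and low 32 bits.

```python
import hashlib
import struct


def hash64(*columns: int) -> int:
    """Stand-in 64-bit hash over integer columns (assumption: NOT murmur3)."""
    payload = struct.pack(f">{len(columns)}q", *columns)
    digest = hashlib.blake2b(payload, digest_size=8).digest()
    return int.from_bytes(digest, "big")


def full_key_token(a: int, b: int) -> int:
    # Current behaviour described in the issue: every partition-key
    # column contributes to the token.
    return hash64(a, b)


def routing_key_token(a: int, b: int) -> int:
    # Proposal in the issue: only the routing key column a is hashed;
    # b identifies the partition but does not affect placement.
    return hash64(a)


def split_token(a: int, b: int) -> int:
    # Alternative from the comment: first component fills the first half
    # of the token (high 32 bits), second component fills the second half.
    high = hash64(a) & 0xFFFFFFFF00000000
    low = hash64(b) & 0x00000000FFFFFFFF
    return high | low


if __name__ == "__main__":
    for b in (2, 99):
        print(b, full_key_token(1, b), routing_key_token(1, b), split_token(1, b))
```

Under these assumptions, routing_key_token(1, 2) and routing_key_token(1, 99) are equal, so both partitions map to the same token. split_token(1, 2) and split_token(1, 99) share only their high 32 bits, so they fall within the same 2^32-wide slice of the token space while remaining distinguishable by the low bits.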