Return-Path: X-Original-To: apmail-cassandra-user-archive@www.apache.org Delivered-To: apmail-cassandra-user-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id CDCE411529 for ; Sat, 30 Aug 2014 00:23:42 +0000 (UTC) Received: (qmail 49252 invoked by uid 500); 30 Aug 2014 00:23:34 -0000 Delivered-To: apmail-cassandra-user-archive@cassandra.apache.org Received: (qmail 49214 invoked by uid 500); 30 Aug 2014 00:23:34 -0000 Mailing-List: contact user-help@cassandra.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: user@cassandra.apache.org Delivered-To: mailing list user@cassandra.apache.org Received: (qmail 49202 invoked by uid 99); 30 Aug 2014 00:23:34 -0000 Received: from nike.apache.org (HELO nike.apache.org) (192.87.106.230) by apache.org (qpsmtpd/0.29) with ESMTP; Sat, 30 Aug 2014 00:23:34 +0000 X-ASF-Spam-Status: No, hits=2.2 required=5.0 tests=HTML_MESSAGE,RCVD_IN_DNSWL_NONE,SPF_PASS X-Spam-Check-By: apache.org Received-SPF: pass (nike.apache.org: domain of SRS0=FFWQ5C=5Y=basetechnology.com=jack@yourhostingaccount.com designates 65.254.254.75 as permitted sender) Received: from [65.254.254.75] (HELO walmailout08.yourhostingaccount.com) (65.254.254.75) by apache.org (qpsmtpd/0.29) with ESMTP; Sat, 30 Aug 2014 00:23:08 +0000 Received: from mailscan06.yourhostingaccount.com ([10.1.15.6] helo=walmailscan06.yourhostingaccount.com) by walmailout08.yourhostingaccount.com with esmtp (Exim) id 1XNWRf-00068u-6a for user@cassandra.apache.org; Fri, 29 Aug 2014 20:23:07 -0400 Received: from [10.114.3.33] (helo=walimpout13) by walmailscan06.yourhostingaccount.com with esmtp (Exim) id 1XNWRe-0005TC-Po for user@cassandra.apache.org; Fri, 29 Aug 2014 20:23:06 -0400 Received: from walauthsmtp04.yourhostingaccount.com ([10.1.18.4]) by walimpout13 with id koP31o00105G96J01oP6WB; Fri, 29 Aug 2014 20:23:06 -0400 X-Authority-Analysis: v=2.1 cv=M9jDKkAs c=1 sm=1 tr=0 a=ZyCNx9LFiA0kwLx3ZJIN5w==:117 a=UkMH5KcvGpXfM81wB0t8ug==:17 a=pq4jwCggAAAA:8 a=OF-CdTOGAAAA:8 a=aQzbgH187woA:10 a=fyi80_0-YdYA:10 a=_U-P9LBhFlMA:10 a=3jZET7lWBKwA:10 a=jvYhGVW7AAAA:8 a=OA2lqS22AAAA:8 a=r77TgQKjGQsHNAKrUKIA:9 a=9iDbn-4jx3cA:10 a=cKsnjEOsciEA:10 a=mV9VRH-2AAAA:8 a=5GG0SCeFAAAA:8 a=bkkWBRgJir6voHi38sEA:9 a=pILNOxqGKmIA:10 a=EMlJoiak7gQA:10 a=aSotVmNKAAAA:8 a=5eUDXoQCYVIcmE9dzx8A:9 a=G2m1SPfCUe0D6rmc:21 a=_W_S_7VecoQA:10 a=yl1UuOG3v2AA:10 Received: from 207-237-113-28.c3-0.nyr-ubr1.nyr.ny.cable.rcn.com ([207.237.113.28]:33286 helo=JackKrupansky14) by walauthsmtp04.yourhostingaccount.com with esmtpa (Exim) id 1XNWRb-0003Ue-4r for user@cassandra.apache.org; Fri, 29 Aug 2014 20:23:03 -0400 Message-ID: <787B79DEF3BB4984873824622933161F@JackKrupansky14> From: "Jack Krupansky" To: References: <94971CB5-9CE3-4119-A788-8CF1D4797EB5@venarc.com> In-Reply-To: Subject: Re: Data partitioning and composite partition key Date: Fri, 29 Aug 2014 20:23:02 -0400 MIME-Version: 1.0 Content-Type: multipart/alternative; boundary="----=_NextPart_000_0619_01CFC3C7.083430A0" X-Priority: 3 X-MSMail-Priority: Normal Importance: Normal X-Mailer: Microsoft Windows Live Mail 16.4.3528.331 X-MimeOLE: Produced By Microsoft MimeOLE V16.4.3528.331 X-EN-UserInfo: e0a4b55451ed9f27313ebf02e3d4348d:931c98230c6409dcc37fa7e93b490c27 X-EN-AuthUser: jack@basetechnology.com Sender: "Jack Krupansky" X-EN-OrigIP: 207.237.113.28 X-EN-OrigHost: 207-237-113-28.c3-0.nyr-ubr1.nyr.ny.cable.rcn.com X-Virus-Checked: Checked by ClamAV on apache.org This is a multi-part message in MIME format. ------=_NextPart_000_0619_01CFC3C7.083430A0 Content-Type: text/plain; charset="Windows-1252" Content-Transfer-Encoding: quoted-printable Okay, but what benefit do you think you get from having the partitions = on the same node =96 since they would be separate partitions anyway? I = mean, what exactly do you think you=92re going to do with them, that = wouldn=92t be a whole lot more performant by being able to process data = in parallel from separate nodes? I mean, the whole point of Cassandra is = scalability and distributed processing, right? -- Jack Krupansky From: Drew Kutcharian=20 Sent: Friday, August 29, 2014 7:31 PM To: user@cassandra.apache.org=20 Subject: Re: Data partitioning and composite partition key Hi Jack,=20 I think you missed the point of my email which was trying to avoid the = problem of having very wide rows :) In the notation of = sensorId-datatime, the datatime is a datetime bucket, say a day. The CQL = rows would still be keyed by the actual time of the event. So you=92d = end up having SesonId->Datetime Bucket (day/week/month)->actual event. = What I wanted to be able to do was to colocate all the events related to = a sensor id on a single node (token). See "High Throughput Timelines=94 at = http://www.datastax.com/dev/blog/advanced-time-series-with-cassandra - Drew On Aug 29, 2014, at 3:58 PM, Jack Krupansky = wrote: With CQL3, you, the developer, get to decide whether to place a = primary key column in the partition key or as a clustering column. So, = make sensorID the partition key and datetime as a clustering column. -- Jack Krupansky From: Drew Kutcharian=20 Sent: Friday, August 29, 2014 6:48 PM To: user@cassandra.apache.org=20 Subject: Data partitioning and composite partition key Hey Guys,=20 AFAIK, currently Cassandra partitions (thrift) rows using the row key, = basically uses the hash(row_key) to decide what node that row needs to = be stored on. Now there are times when there is a need to shard a wide = row, say storing events per sensor, so you=92d have sensorId-datetime = row key so you don=92t end up with very large rows. Is there a way to = have the partitioner use only the =93sensorId=94 part of the row key for = the hash? This way we would be able to store all the data relating to a = sensor in one node. Another use case of this would be multi-tenancy: Say we have accounts and accounts have users. So we would have the = following tables: CREATE TABLE account ( id timeuuid PRIMARY KEY, company text //timezone ); CREATE TABLE user ( id timeuuid PRIMARY KEY,=20 accountId timeuuid, email text, password text ); // Get users by account CREATE TABLE user_account_index ( accountId timeuuid, userId timeuuid, PRIMARY KEY(acid, id) ); Say I want to get all the users that belong to an account. I would = first have to get the results from user_account_index and then use a = multi-get (WHERE IN) to get the records from user table. Now this = multi-get part could potentially query a lot of different nodes in the = cluster. It=92d be great if there was a way to limit storage of users of = an account to a single node so that way multi-get would only need to = query a single node.=20 Note that the problem cannot be simply fixed by using (accountId, id) = as the primary key for the user table since that would create a problem = of having a very large number of (thrift) rows in the users table. I did look thru the code and JIRA and I couldn=92t really find a = solution. The closest I got was to have a custom partitioner, but then = you can=92t have a partitioner per keyspace and that=92s not even = something that=92d be implemented in future based on the following JIRA: https://issues.apache.org/jira/browse/CASSANDRA-295 Any ideas are much appreciated. Best, Drew ------=_NextPart_000_0619_01CFC3C7.083430A0 Content-Type: text/html; charset="Windows-1252" Content-Transfer-Encoding: quoted-printable
Okay, but what benefit do you think you get from having the = partitions on=20 the same node =96 since they would be separate partitions anyway? I = mean, what=20 exactly do you think you=92re going to do with them, that wouldn=92t be = a whole lot=20 more performant by being able to process data in parallel from separate = nodes? I=20 mean, the whole point of Cassandra is scalability and distributed = processing,=20 right?
 
-- Jack=20 Krupansky
 
Sent: Friday, August 29, 2014 7:31 PM
Subject: Re: Data partitioning and composite partition=20 key
 
Hi=20 Jack,=20
 
I think you missed the point of my email which was trying to avoid = the=20 problem of having very wide rows :)  In the notation of = sensorId-datatime,=20 the datatime is a datetime bucket, say a day. The CQL rows would still = be keyed=20 by the actual time of the event. So you=92d end up having = SesonId->Datetime=20 Bucket (day/week/month)->actual event. What I wanted to be able to do = was to=20 colocate all the events related to a sensor id on a single node = (token).
 
See "High Throughput Timelines=94 at http://www.datastax.com/dev/blog/advanced-time-series-with-cassandra=
 
- Drew
 
 
On Aug 29, 2014, at 3:58 PM, Jack Krupansky <jack@basetechnology.com>=20 wrote:

With CQL3, you, the developer, get to decide whether to place a = primary=20 key column in the partition key or as a clustering column. So, make = sensorID=20 the partition key and datetime as a clustering column.
 
-- Jack = Krupansky
 
Sent: Friday, August 29, 2014 6:48 PM
Subject: Data partitioning and composite partition=20 key
 
Hey=20 Guys,=20
 
AFAIK, currently Cassandra partitions (thrift) rows using the row = key,=20 basically uses the hash(row_key) to decide what node that row needs to = be=20 stored on. Now there are times when there is a need to shard a wide = row, say=20 storing events per sensor, so you=92d have sensorId-datetime row key = so you=20 don=92t end up with very large rows. Is there a way to have the = partitioner use=20 only the =93sensorId=94 part of the row key for the hash? This way we = would be=20 able to store all the data relating to a sensor in one node.
 
Another use case of this would be multi-tenancy:
 
Say we have accounts and accounts have users. So we would have = the=20 following tables:
 
CREATE TABLE account (
 =20 = id            = ;        =20 timeuuid PRIMARY KEY,
  company        =20 text      //timezone
);
 
CREATE TABLE user (
 =20 = id            = ; =20 timeuuid PRIMARY KEY,
  accountId timeuuid,
  email        = text,
  password text
);
 
// Get users by account
CREATE TABLE user_account_index (
  accountId  timeuuid,
  userId        = timeuuid,
  PRIMARY KEY(acid, id)
);
 
Say I want to get all the users that belong to an account. I = would first=20 have to get the results from user_account_index and then use a = multi-get=20 (WHERE IN) to get the records from user table. Now this multi-get part = could=20 potentially query a lot of different nodes in the cluster. It=92d be = great if=20 there was a way to limit storage of users of an account to a single = node so=20 that way multi-get would only need to query a single node.
 
Note that the problem cannot be simply fixed by using (accountId, = id) as=20 the primary key for the user table since that would create a problem = of having=20 a very large number of (thrift) rows in the users table.
 
I did look thru the code and JIRA and I couldn=92t really find a = solution.=20 The closest I got was to have a custom partitioner, but then you = can=92t have a=20 partitioner per keyspace and that=92s not even something that=92d be = implemented=20 in future based on the following JIRA:
https://issu= es.apache.org/jira/browse/CASSANDRA-295
 
Any ideas are much appreciated.
 
Best,
 
Drew
 
------=_NextPart_000_0619_01CFC3C7.083430A0--