cassandra-user mailing list archives

From: Drew Kutcharian <d...@venarc.com>
Subject: Data partitioning and composite partition key
Date: Fri, 29 Aug 2014 22:48:16 GMT

Hey Guys,

AFAIK, Cassandra currently partitions (Thrift) rows by row key: it uses hash(row_key) to
decide which node a row is stored on. There are times when you need to shard a wide row.
For example, when storing events per sensor, you’d use a sensorId-datetime row key so you
don’t end up with very large rows. Is there a way to have the partitioner use only the
“sensorId” part of the row key for the hash? That way we would be able to store all the
data relating to a sensor on one node.
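
To make the question concrete, here is a toy sketch in Python (not Cassandra code). It assumes a hypothetical 4-node cluster, uses MD5 modulo the node count as a stand-in for the partitioner (a real cluster maps tokens onto ring ranges), and assumes a ":" separator between sensorId and datetime. The names node_for and node_for_prefix are made up for illustration.

```python
import hashlib

NODES = 4  # hypothetical cluster size


def node_for(key: str, nodes: int = NODES) -> int:
    """Toy placement: hash the key and reduce it modulo the node count.

    A real partitioner maps a token onto ring ranges; this only mimics
    the "hash(row_key) decides the node" idea.
    """
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % nodes


def node_for_prefix(row_key: str, sep: str = ":", nodes: int = NODES) -> int:
    """Hypothetical variant: hash only the part before the separator."""
    sensor_id = row_key.split(sep, 1)[0]
    return node_for(sensor_id, nodes)


# One sharded row per day for a single sensor (sensorId:date).
row_keys = [f"sensor42:2014-08-{day:02d}" for day in range(1, 30)]

# Hashing the full row key: the shards may land on several nodes.
full_key_nodes = {node_for(k) for k in row_keys}

# Hashing only the sensorId part: every shard lands on the same node.
prefix_nodes = {node_for_prefix(k) for k in row_keys}

print("hash(full row key)  ->", sorted(full_key_nodes))
print("hash(sensorId only) ->", sorted(prefix_nodes))
```

With prefix hashing, the set of nodes always has exactly one element, which is the co-location behavior being asked about.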

Another use case of this would be multi-tenancy:

Say we have accounts and accounts have users. So we would have the following tables:

CREATE TABLE account (
  id      timeuuid PRIMARY KEY,
  company text      //timezone
);

CREATE TABLE user (
  id        timeuuid PRIMARY KEY,
  accountId timeuuid,
  email     text,
  password  text
);

// Get users by account
CREATE TABLE user_account_index (
  accountId timeuuid,
  userId    timeuuid,
  PRIMARY KEY (accountId, userId)
);

Say I want to get all the users that belong to an account. I would first have to get the results
from user_account_index and then use a multi-get (WHERE IN) to get the records from the user table.
This multi-get could potentially query a lot of different nodes in the cluster. It’d
be great if there were a way to limit storage of an account’s users to a single node, so that
the multi-get would only need to query a single node.
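
The fan-out concern can be illustrated with a small Python simulation. This is only a sketch under assumptions: a hypothetical 6-node cluster, MD5 modulo node count as a stand-in for the real partitioner, and uuid.uuid1() values standing in for timeuuid keys. The function name node_for is made up for illustration.

```python
import hashlib
import uuid

NODES = 6  # hypothetical cluster size


def node_for(key: str, nodes: int = NODES) -> int:
    """Toy placement: MD5 of the key, reduced modulo the node count."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % nodes


account_id = uuid.uuid1()
# Pretend user_account_index returned these 50 user ids for the account.
user_ids = [uuid.uuid1() for _ in range(50)]

# user table partitioned by its own id: the WHERE IN lookup may have to
# contact every node that owns at least one of the ids.
nodes_by_user_id = {node_for(str(uid)) for uid in user_ids}

# Hypothetical placement by owning account: one node serves all lookups.
nodes_by_account = {node_for(str(account_id)) for _ in user_ids}

print("nodes contacted, partitioned by user id:   ", len(nodes_by_user_id))
print("nodes contacted, partitioned by account id:", len(nodes_by_account))
```

The first count varies from run to run because the ids are generated fresh each time; the second is always 1.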

Note that the problem cannot simply be fixed by using (accountId, id) as the primary key for
the user table, since that would create a problem of having a very large number of (Thrift)
rows in the user table.

I did look through the code and JIRA and couldn’t really find a solution. The closest I got
was a custom partitioner, but you can’t have a partitioner per keyspace, and
that’s not even something that will be implemented in the future, based on the following JIRA:
https://issues.apache.org/jira/browse/CASSANDRA-295

Any ideas are much appreciated.

Best,

Drew