incubator-cassandra-user mailing list archives

From onlinespending <onlinespend...@gmail.com>
Subject Exactly one wide row per node for a given CF?
Date Tue, 03 Dec 2013 05:09:13 GMT
Subject says it all. I want to be able to randomly distribute a large set of records but keep
them clustered in one wide row per node.

As an example, let’s say I’ve got a collection of about 1 million records, each with a unique
id. If I just go ahead and set the primary key (and therefore the partition key) as the unique
id, I’ll get very good random distribution across my server cluster. However, each record
will be its own row. I’d like to have each record belong to one large wide row (per server
node) so I can have them sorted or clustered on some other column.

If I have, say, 5 nodes in my cluster, I could randomly assign a value of 1 - 5 at the time
of creation and use this value as the partition key. But this becomes troublesome if
I add or remove nodes. What I effectively want is to partition on the unique id of the record
modulo N (id % N, where N is the number of nodes).
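
To make the id % N idea concrete, here is a minimal Python sketch. It is not Cassandra code:
the function names bucket_for and fraction_moved are made up for illustration, and it assumes
the unique ids are integers. It computes the bucket that would serve as the partition key and
measures how many records land in a different bucket when N changes, which is the
"troublesome" part of adding or removing nodes.

```python
def bucket_for(record_id: int, num_buckets: int) -> int:
    """Return the bucket (candidate partition key) for a record: id % N."""
    if num_buckets <= 0:
        raise ValueError("num_buckets must be positive")
    return record_id % num_buckets


def fraction_moved(record_ids, old_n: int, new_n: int) -> float:
    """Fraction of records whose bucket changes when N goes from old_n to new_n."""
    ids = list(record_ids)
    moved = sum(1 for r in ids if bucket_for(r, old_n) != bucket_for(r, new_n))
    return moved / len(ids)


if __name__ == "__main__":
    ids = range(1_000_000)  # stand-in for about 1 million unique integer ids
    print(bucket_for(12345, 5))       # bucket for one record with 5 buckets
    print(fraction_moved(ids, 5, 6))  # share of records that change bucket going from 5 to 6
```

Note that this only assigns records to N buckets. Which node ends up storing a given bucket
value is still decided by the cluster's partitioner, so N buckets do not necessarily mean one
bucket per node.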

I have to imagine there’s a mechanism in Cassandra to simply randomize the partitioning
without even using a key (and then clustering on some column).

Thanks for any help.