incubator-cassandra-user mailing list archives

From "David G. Boney" <dbon...@semanticartifacts.com>
Subject Partitions
Date Fri, 24 Dec 2010 19:07:15 GMT
I am using the Hadoop interface with Cassandra. Is it possible to line up the partitions (splits)
of two different column families so that they end up on the same node? I am doing this for data
locality reasons. I want to read all the data from a split of column family A and a split of column
family B into memory to do some processing.

Here is an example. Column family A has 1,000,000 rows and column family B has 50,000,000
rows. Let's say column family A has a split every 10,000 rows and column family B has a split
every 500,000 rows, so each has 100 splits. I want the first split of A and the first split of B
on the same node, the second split of A and the second split of B on the next node, and so on.
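A minimal Java sketch of the pairing described above, assuming four nodes and a simple round-robin placement. The class and method names (SplitPairing, makeSplits, pair) are hypothetical; this is not part of Cassandra's Hadoop integration or Hadoop's InputSplit API, only a model of the desired layout. Pairing by index works here because both column families happen to produce 100 splits.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical model: pair split i of column family A with split i of
 * column family B and place the pair on node (i mod nodeCount).
 * Not a Cassandra or Hadoop API.
 */
public class SplitPairing {

    /** A half-open row range [startRow, endRow) within one column family. */
    record Split(String columnFamily, long startRow, long endRow) {}

    /** One pair of co-located splits and the node index it is assigned to. */
    record Assignment(int node, Split a, Split b) {}

    /** Cut rowCount rows into consecutive splits of at most splitSize rows. */
    static List<Split> makeSplits(String cf, long rowCount, long splitSize) {
        List<Split> splits = new ArrayList<>();
        for (long start = 0; start < rowCount; start += splitSize) {
            splits.add(new Split(cf, start, Math.min(start + splitSize, rowCount)));
        }
        return splits;
    }

    /** Pair split i of A with split i of B, round-robin over nodeCount nodes. */
    static List<Assignment> pair(List<Split> a, List<Split> b, int nodeCount) {
        if (a.size() != b.size()) {
            throw new IllegalArgumentException("A and B must have the same number of splits");
        }
        List<Assignment> out = new ArrayList<>();
        for (int i = 0; i < a.size(); i++) {
            out.add(new Assignment(i % nodeCount, a.get(i), b.get(i)));
        }
        return out;
    }

    public static void main(String[] args) {
        List<Split> a = makeSplits("A", 1_000_000, 10_000);   // 100 splits
        List<Split> b = makeSplits("B", 50_000_000, 500_000); // 100 splits
        List<Assignment> plan = pair(a, b, 4);                 // 4 nodes is an assumption
        System.out.println(plan.get(0));
        System.out.println(plan.get(1));
    }
}
```

The equal split count is the key constraint: if A and B produced different numbers of splits, index-based pairing would no longer line up, which is why the sketch rejects that case.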

A second scenario is that the two column families use the same key. Let's assume the key is
an integer in the range of 1 to 1,000,000. The two column families have a different number
of rows. I would like the splits to occur at multiples of the key value, say every
10,000. The first split would have keys in the range of 1 to 9,999. The second split would
have keys in the range of 10,000 to 19,999, and so on. I still want the first split of column
family A and the first split of column family B to be on the first node, and so on. It is
possible in this scenario that a split could be empty or very small; that is OK.
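A minimal Java sketch of the second scenario, assuming integer keys shared by both column families, a split boundary at every multiple of 10,000, and a simple round-robin node placement. All names (KeyAlignedSplits, splitIndex, nodeFor, groupBySplit) are hypothetical and do not correspond to any Cassandra or Hadoop API. Because both column families use the same key-to-split function, split i of A and split i of B always map to the same node.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical model: both column families share integer keys, and a split
 * boundary falls at every multiple of BUCKET_WIDTH. Not a Cassandra API.
 */
public class KeyAlignedSplits {
    static final long BUCKET_WIDTH = 10_000;

    /** Split index for a key: keys 1..9,999 -> 0, 10,000..19,999 -> 1, and so on. */
    static long splitIndex(long key) {
        return key / BUCKET_WIDTH;
    }

    /** Node index for a split under an assumed round-robin placement. */
    static int nodeFor(long splitIndex, int nodeCount) {
        return (int) (splitIndex % nodeCount);
    }

    /** Group keys by split index; an empty split simply has no entry. */
    static Map<Long, List<Long>> groupBySplit(List<Long> keys) {
        Map<Long, List<Long>> splits = new TreeMap<>();
        for (long key : keys) {
            splits.computeIfAbsent(splitIndex(key), k -> new ArrayList<>()).add(key);
        }
        return splits;
    }

    public static void main(String[] args) {
        List<Long> keysA = List.of(1L, 9_999L, 10_000L, 25_000L);
        List<Long> keysB = List.of(5L, 19_999L);
        int nodeCount = 4; // assumption
        Map<Long, List<Long>> a = groupBySplit(keysA);
        Map<Long, List<Long>> b = groupBySplit(keysB);
        for (long split = 0; split <= 2; split++) {
            System.out.printf("split %d -> node %d: A=%s B=%s%n",
                split, nodeFor(split, nodeCount),
                a.getOrDefault(split, List.of()), b.getOrDefault(split, List.of()));
        }
    }
}
```

With this boundary rule, keys 1 to 1,000,000 produce 101 splits, the last holding only key 1,000,000, which is the kind of very small split the message says is acceptable.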
-------------
Sincerely,
David G. Boney
dboney1@semanticartifacts.com
http://www.semanticartifacts.com
