cassandra-user mailing list archives

From Adrian Cockcroft <>
Subject Re: Direct control over where data is stored?
Date Sun, 05 Jun 2011 06:07:37 GMT
Sounds like Khanh thinks he can do joins... :-)

User-oriented data is easy: key by Facebook ID and let Cassandra handle
location. Set replication factor=3 so you don't lose data, and use
quorum reads and writes when you need consistent (but slower) read
after write. If you are running on AWS you should distribute your
replicas over availability zones.
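
To make the quorum point concrete, here is a small sketch of the arithmetic behind it. It does not talk to Cassandra; the function names are made up for illustration. It only checks the usual rule that a read is guaranteed to overlap the latest write when the read and write replica counts together exceed the replication factor.

```python
def quorum(replication_factor: int) -> int:
    """Number of replicas that make up a majority (quorum)."""
    return replication_factor // 2 + 1


def read_overlaps_write(replication_factor: int, writes: int, reads: int) -> bool:
    """True when every read set must share at least one replica with every write set."""
    return reads + writes > replication_factor


rf = 3
q = quorum(rf)  # 2 of the 3 replicas
print(q, read_overlaps_write(rf, q, q))  # quorum write + quorum read
print(read_overlaps_write(rf, 1, 1))     # single-replica write + read
```

With replication factor 3, a quorum is 2 replicas, so a quorum write followed by a quorum read always touches at least one replica that has the new value. Reading and writing at a single replica is faster but gives no such guarantee.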

Then you can read A, read B, and join them in your app code. Each
read or write takes single-digit milliseconds.
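
As a rough sketch of the "read A, read B, join in app code" idea, assuming a plain in-memory dict as a stand-in for rows keyed by Facebook ID (the `USERS` data, `read_user`, and `join_with_friends` are all hypothetical, and no real Cassandra client is used):

```python
from typing import Dict, List

# Hypothetical stand-in for rows keyed by Facebook ID; in a real
# deployment each lookup would be a separate key read against the cluster.
USERS: Dict[str, dict] = {
    "A": {"name": "Alice", "friends": ["B"]},
    "B": {"name": "Bob", "friends": ["A"]},
}


def read_user(fb_id: str) -> dict:
    """One key lookup; the store decides which node holds the row."""
    return USERS[fb_id]


def join_with_friends(fb_id: str) -> dict:
    """Read the user, read each friend by key, and combine in the application."""
    user = read_user(fb_id)
    friend_names: List[str] = [read_user(f)["name"] for f in user["friends"]]
    return {"name": user["name"], "friend_names": friend_names}


print(join_with_friends("A"))
```

The application never needs to know which machine holds A or B; it only issues one key read per row and merges the results itself.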

If you want to do bulk operations over many users, use Brisk with a Hadoop job.


On Sat, Jun 4, 2011 at 9:32 PM, Maki Watanabe <> wrote:
> You may be able to do it with the Order Preserving Partitioner by
> mapping keys to nodes before storing data, or you may need your own
> custom Partitioner. Please note that you are responsible for
> distributing load between nodes in this case.
> From an application design perspective, it is not clear to me why you
> need to store user A and his friends on the same box....
> maki
> 2011/6/5 Khanh Nguyen <>:
>> Hi everyone,
>> Is it possible to have direct control over where objects are stored in
>> Cassandra? For example, I have a Cassandra cluster of 4 machines and 4
>> objects A, B, C, D; I want to store A at machine 1, B at machine 2, C
>> at machine 3 and D at machine 4. My guess is that I need to intervene
>> in the way Cassandra hashes an object into the keyspace? If so, how
>> complicated will the task be?
>> I'm new to the list and Cassandra. The reason I am asking is that my
>> current project is related to social locality of data: if A and B are
>> Facebook friends, I want to store their data as close as possible,
>> preferably in the same machine in a cluster.
>> Thank you.
>> Regards,
>> -k
> --
> w3m
