cassandra-user mailing list archives

From Khanh Nguyen <nguyen.h.kh...@gmail.com>
Subject Re: Direct control over where data is stored?
Date Sun, 05 Jun 2011 16:18:09 GMT
Hi Maki and Adrian,

Thank you both for the prompt replies. It's the weekend, after all :).

Adrian's mention of the replication factor reminded me that I had left out
part of my question: is it also possible to control where the replicas are
stored? Thanks.
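A minimal sketch of how replica placement works in principle, assuming Cassandra's SimpleStrategy (the replication strategy is set per keyspace; NetworkTopologyStrategy and custom strategies place replicas differently). The first replica goes to the node that owns the key's token, and the remaining RF - 1 replicas go to the next nodes clockwise on the ring. The integer tokens and node names below are made up for illustration, and `replicas_for` is a hypothetical helper, not a Cassandra API.

```python
from bisect import bisect_left

def replicas_for(token, ring, rf):
    """Return the nodes holding `token` under a SimpleStrategy-like walk.

    ring: list of (node_token, node_name) pairs; the tokens are hypothetical.
    """
    ring = sorted(ring)
    tokens = [t for t, _ in ring]
    # Primary owner: the first node whose token is >= the key's token,
    # wrapping to the start of the ring past the largest node token.
    start = bisect_left(tokens, token) % len(ring)
    # Remaining replicas: the next rf - 1 nodes clockwise.
    return [ring[(start + i) % len(ring)][1] for i in range(min(rf, len(ring)))]

ring = [(0, "node1"), (25, "node2"), (50, "node3"), (75, "node4")]
print(replicas_for(30, ring, 3))  # ['node3', 'node4', 'node1']
```

Under this model you influence replica placement by choosing node tokens (and the strategy), not by pinning individual rows.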

This is a research experiment we're exploring with socially related
data. If we want to pull the data of A and B out of Cassandra (i.e.,
LastNameColumn['A'] and LastNameColumn['B']), it should be faster if
these values are stored on the same box than if one is stored on a box
in NY and the other in Tokyo, no?

Regards,

-k


On Sun, Jun 5, 2011 at 2:07 AM, Adrian Cockcroft
<adrian.cockcroft@gmail.com> wrote:
> Sounds like Khanh thinks he can do joins... :-)
>
> User-oriented data is easy: key it by Facebook ID and let Cassandra
> handle location. Set replication factor = 3 so you don't lose data, and
> use QUORUM when you need consistent (but slower) read-after-write.
> If you are running on AWS, you should distribute your replicas over
> availability zones.
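The quorum arithmetic behind the replication factor = 3 suggestion can be sketched as follows. QUORUM means a majority of replicas, floor(RF / 2) + 1, and a read is guaranteed to see the latest acknowledged write when the number of replicas read (R) plus the number written (W) exceeds RF, because the two sets must then share at least one replica. The function names are illustrative only.

```python
def quorum(rf):
    # Majority of replicas: floor(rf / 2) + 1.
    return rf // 2 + 1

def read_sees_latest_write(r, w, rf):
    # The read set and the write set must overlap in at least one replica.
    return r + w > rf

rf = 3
q = quorum(rf)
print(q)                                 # 2
print(read_sees_latest_write(q, q, rf))  # True: QUORUM write + QUORUM read
print(read_sees_latest_write(1, 1, rf))  # False: ONE + ONE may read stale data
```

With RF = 3, QUORUM tolerates one unavailable replica while still giving read-after-write consistency, which is why it is slower than reading at ONE.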
>
> Then you can read A, read B, and join them in your app code. Expect
> single-digit milliseconds for each read or write.
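A minimal sketch of the "read A, read B, join in app code" pattern. The in-memory `USERS` dict stands in for a column family keyed by Facebook ID, and `fetch_row` is a hypothetical single-key read; with a real Cassandra client each call would be one row lookup that could be served by any node. The rows and last names are made up.

```python
# Hypothetical stand-in for a column family keyed by Facebook ID.
USERS = {
    "A": {"last_name": "Doe", "friends": {"B", "C"}},
    "B": {"last_name": "Roe", "friends": {"A", "C"}},
    "C": {"last_name": "Smith", "friends": {"A", "B"}},
}

def fetch_row(user_id):
    """Hypothetical single-key read; returns the row or None."""
    return USERS.get(user_id)

def mutual_friends(a_id, b_id):
    # Two independent single-key reads: read A, read B ...
    a, b = fetch_row(a_id), fetch_row(b_id)
    if a is None or b is None:
        return {}
    # ... then the "join" (here, intersecting friend sets) runs in app code.
    result = {}
    for fid in sorted(a["friends"] & b["friends"]):
        row = fetch_row(fid)  # one more single-key read per mutual friend
        if row is not None:
            result[fid] = row["last_name"]
    return result

print(mutual_friends("A", "B"))  # {'C': 'Smith'}
```

Because each step is a key lookup, it does not matter which node holds A or B; the cost is a few independent reads rather than co-location.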
>
> If you want to do bulk operations over many users, use Brisk with a Hadoop job.
>
> HTH
> Adrian
>
> On Sat, Jun 4, 2011 at 9:32 PM, Maki Watanabe <watanabe.maki@gmail.com> wrote:
>> You may be able to do it with the Order Preserving Partitioner by
>> defining the key-to-node mapping before storing data, or you may need
>> a custom Partitioner. Please note that in this case you are responsible
>> for distributing load between nodes.
>> From an application design perspective, it is not clear to me why you
>> need to store user A and his friends on the same box...
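A minimal sketch contrasting the two partitioning styles, under simplifying assumptions. With the OrderPreservingPartitioner the token is the key itself, so choosing node tokens in key order (e.g. via each node's `initial_token`) pins keys to chosen nodes. With a RandomPartitioner-style scheme the token is derived from an MD5 digest of the key, so placement follows the hash. The ring tokens are hypothetical, `owner` and `md5_token` are illustrative helpers, and Cassandra's exact token derivation and range differ slightly from this sketch.

```python
import hashlib
from bisect import bisect_left

def owner(token, ring):
    """Node owning `token`: the first node token >= it, wrapping around."""
    ring = sorted(ring)
    tokens = [t for t, _ in ring]
    return ring[bisect_left(tokens, token) % len(ring)][1]

# Order-preserving: the token is the key, so node tokens "A".."D" pin
# keys A..D to node1..node4 (hypothetical tokens).
opp_ring = [("A", "node1"), ("B", "node2"), ("C", "node3"), ("D", "node4")]
print({k: owner(k, opp_ring) for k in "ABCD"})
# Any key between two node tokens lands on the next node, e.g. "AB" -> node2,
# which is why load balancing becomes the operator's job.
print(owner("AB", opp_ring))

# Random-style: the token comes from an MD5 digest of the key, so the
# node is determined by the hash, not by the key's value.
def md5_token(key):
    return int.from_bytes(hashlib.md5(key.encode()).digest(), "big")

random_ring = [(i * 2**126, f"node{i + 1}") for i in range(4)]
print({k: owner(md5_token(k), random_ring) for k in "ABCD"})
```

The order-preserving mapping gives direct control, at the cost of hot spots if keys are not evenly spread across the chosen node tokens.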
>>
>> maki
>>
>>
>> 2011/6/5 Khanh Nguyen <nguyen.h.khanh@gmail.com>:
>>> Hi everyone,
>>>
>>> Is it possible to have direct control over where objects are stored in
>>> Cassandra? For example, I have a Cassandra cluster of 4 machines and 4
>>> objects A, B, C, D; I want to store A at machine 1, B at machine 2, C
>>> at machine 3 and D at machine 4. My guess is that I need to intervene in
>>> the way Cassandra hashes an object into the keyspace? If so, how
>>> complicated would the task be?
>>>
>>> I'm new to the list and to Cassandra. The reason I am asking is that my
>>> current project is related to social locality of data: if A and B are
>>> Facebook friends, I want to store their data as close as possible,
>>> preferably on the same machine in a cluster.
>>>
>>> Thank you.
>>>
>>> Regards,
>>>
>>> -k
>>>
>>
>>
>>
>> --
>> w3m
>>
>
