cassandra-user mailing list archives

From aaron morton <aa...@thelastpickle.com>
Subject Re: why set replica placement strategy at keyspace level ?
Date Mon, 28 Jan 2013 19:41:23 GMT
> 
> Another thing that's been confusing me is that when we talk about the data model should the row key be inside or outside a column family?
My mental model is:

cluster == database
keyspace == table
row == a row in a table
CF == a family of columns in one row

(I think that's different to others, but it works for me)
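As a rough sketch of that mental model (illustration only, not how Cassandra actually lays data out on disk), you can picture it as nested dictionaries. The keyspace name, CF names, and column values here are made up for the example:

```python
# Hypothetical illustration of the mental model above.
# All names and values are invented; this is not Cassandra's storage format.
cluster = {                                   # cluster == database
    "app_keyspace": {                         # keyspace == table
        "amorton": {                          # row key == a row in the table
            "profile": {"country": "NZ"},     # CF == a family of columns in that row
            "timeline": {"2013-01-28": "replied to list"},
        },
    },
}


def column_families_for(cluster, keyspace, row_key):
    """Return the names of the CFs that hold columns for one row key."""
    return sorted(cluster[keyspace][row_key])


print(column_families_for(cluster, "app_keyspace", "amorton"))
```

In this picture the row key sits outside the column families: one row key, several families of columns hanging off it.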

> Is it important to store rows of different column families that share the same row key to the same node?
It makes the failure model a little easier to understand, e.g. everything stored under the
key for user "amorton" is either available or not.

> Meanwhile, what's the drawback of setting RPS and RF at column family level?
Other than it's baked in?

We process all mutations for a row at the same time. If you write to 4 CFs with the same
row key, that is considered one mutation, for one row. That one RowMutation is directed to
the replicas using the ReplicationStrategy and atomically applied to the commit log.

If you have RS per CF that one mutation would be split into 4, which would then be sent to
different replicas. Even if they went to the same replicas they would be written to the commit
log as different mutations. 

So if you have RS per CF you lose atomic commits for writes to the same row.
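A toy sketch of the difference, in Python. Nothing here is Cassandra code: the node list, the hash-based placement in `replicas_for`, and both `plan_*` functions are assumptions made up to show the shape of the problem, with each planned mutation standing in for one commit log entry.

```python
import hashlib

# Hypothetical six-node ring; real placement uses tokens and a partitioner.
NODES = ["node1", "node2", "node3", "node4", "node5", "node6"]


def replicas_for(placement_key, rf=3):
    """Toy placement: hash to a starting node, then take rf nodes around the ring."""
    start = int(hashlib.md5(placement_key.encode()).hexdigest(), 16) % len(NODES)
    return [NODES[(start + i) % len(NODES)] for i in range(rf)]


def plan_keyspace_level(row_key, writes):
    """Strategy per keyspace: placement depends on the row key only,
    so writes to every CF travel together as one mutation."""
    return [{"row_key": row_key,
             "replicas": replicas_for(row_key),
             "cfs": dict(writes)}]


def plan_cf_level(row_key, writes):
    """Hypothetical strategy per CF: placement depends on the CF as well,
    so the write is split into one mutation per CF."""
    return [{"row_key": row_key,
             "replicas": replicas_for(cf + ":" + row_key),
             "cfs": {cf: cols}}
            for cf, cols in writes.items()]


# One logical write touching 4 CFs under the same row key.
writes = {"cf1": {"a": 1}, "cf2": {"b": 2}, "cf3": {"c": 3}, "cf4": {"d": 4}}

for name, plan in (("per keyspace", plan_keyspace_level("amorton", writes)),
                   ("per CF", plan_cf_level("amorton", writes))):
    print(name, "->", len(plan), "mutation(s)")
    for mutation in plan:
        print("   ", sorted(mutation["cfs"]), "to", mutation["replicas"])
```

With the strategy per keyspace there is a single mutation carrying all 4 CFs. With the made-up per-CF variant there are 4 separate mutations, each of which could succeed or fail on its own.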

Cheers

-----------------
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 28/01/2013, at 11:22 PM, Manu Zhang <owenzhang1990@gmail.com> wrote:

> On Mon 28 Jan 2013 04:42:49 PM CST, aaron morton wrote:
>> The row is the unit of replication, all values with the same storage engine row key in a KS are on the same nodes. if they were per CF this would not hold.
>> 
>> Not that it would be the end of the world, but that is the first thing that comes to mind.
>> 
>> Cheers
>> -----------------
>> Aaron Morton
>> Freelance Cassandra Developer
>> New Zealand
>> 
>> @aaronmorton
>> http://www.thelastpickle.com
>> 
>> On 27/01/2013, at 4:15 PM, Manu Zhang <owenzhang1990@gmail.com> wrote:
>> 
>>> Although I've got to know Cassandra for quite a while, this question only has occurred to me recently:
>>> 
>>> Why are the replica placement strategy and replica factors set at the keyspace level?
>>> 
>>> Would setting them at the column family level offer more flexibility?
>>> 
>>> Is this because it's easier for user to manage an application? Or related to internal implementation? Or it's just that I've overlooked something?
>> 
> 
> Is it important to store rows of different column families that share the same row key to the same node? AFAIK, Cassandra doesn't support get all of them in a single call.
> 
> Meanwhile, what's the drawback of setting RPS and RF at column family level?
> 
> Another thing that's been confusing me is that when we talk about the data model should the row key be inside or outside a column family?
> 
> Thanks
> 

