incubator-cassandra-user mailing list archives

From Manu Zhang <owenzhang1...@gmail.com>
Subject Re: why set replica placement strategy at keyspace level ?
Date Tue, 29 Jan 2013 23:33:05 GMT
On Tue 29 Jan 2013 03:39:17 PM CST, aaron morton wrote:
>
>>   So If I write to CF Users with rowkey="dean"
>> and to CF Schedules with rowkey="dean", it is actually one row?
> In my mental model that's correct.
> A RowMutation is a row key and a collection of (internal) ColumnFamilies which contain
> the columns to write for a single CF.
>
> This is the thing that is committed to the log, and then the changes in the ColumnFamilies
> are applied to each CF in an isolated way.
>
>> (must have missed that several times in the
>> documentation).
> http://wiki.apache.org/cassandra/FAQ#batch_mutate_atomic
>
> Cheers
>
> -----------------
> Aaron Morton
> Freelance Cassandra Developer
> New Zealand
>
> @aaronmorton
> http://www.thelastpickle.com
>
> On 29/01/2013, at 9:28 AM, "Hiller, Dean" <Dean.Hiller@nrel.gov> wrote:
>
>> "If you write to 4 CF's with the same row key that is considered one
>> mutation"
>>
>> Hmmmmm, I never considered this, never knew either (very unintuitive from
>> a user perspective IMHO).  So If I write to CF Users with rowkey="dean"
>> and to CF Schedules with rowkey="dean", it is actually one row?  (It's so
>> unintuitive that I had to ask to make sure I am reading that correctly.)
>>
>> I guess I really don't have that case since most of my row keys are GUIDs
>> anyway, but very interesting and unexpected (not sure I really mind, I was
>> just taken aback).
>>
>> PS. Not sure I ever minded losing atomic commits to the same row across
>> CFs, as I never expected it in the first place, having used Cassandra for
>> more than a year (must have missed that several times in the
>> documentation).
>>
>> Thanks,
>> Dean
>>
>> On 1/28/13 12:41 PM, "aaron morton" <aaron@thelastpickle.com> wrote:
>>
>>>>
>>>> Another thing that's been confusing me is that when we talk about the
>>>> data model should the row key be inside or outside a column family?
>>> My mental model is:
>>>
>>> cluster == database
>>> keyspace == table
>>> row == a row in a table
>>> CF == a family of columns in one row
>>>
>>> (I think that's different to others, but it works for me)
>>>
>>> Is it important to store rows of different column families that share
>>> the same row key on the same node?
>>> Makes the failure model a little easier to understand, e.g. everything
>>> keyed on user "amorton" is either available or not.
>>>
>>>> Meanwhile, what's the drawback of setting RPS and RF at column family
>>>> level?
>>> Other than it's baked in?
>>>
>>> We process all mutations for a row at the same time. If you write to 4
>>> CFs with the same row key, that is considered one mutation for one row.
>>> That one RowMutation is directed to the replicas using the
>>> ReplicationStrategy and atomically applied to the commit log.
>>>
>>> If you have an RS per CF, that one mutation would be split into 4, which
>>> would then be sent to different replicas. Even if they went to the same
>>> replicas, they would be written to the commit log as different mutations.
>>>
>>> So if you have an RS per CF, you lose atomic commits for writes to the
>>> same row.
>>>
>>> Cheers
>>>
>>> -----------------
>>> Aaron Morton
>>> Freelance Cassandra Developer
>>> New Zealand
>>>
>>> @aaronmorton
>>> http://www.thelastpickle.com
>>>
>>> On 28/01/2013, at 11:22 PM, Manu Zhang <owenzhang1990@gmail.com> wrote:
>>>
>>>> On Mon 28 Jan 2013 04:42:49 PM CST, aaron morton wrote:
>>>>> The row is the unit of replication: all values with the same storage
>>>>> engine row key in a KS are on the same nodes. If they were per CF, this
>>>>> would not hold.
>>>>>
>>>>> Not that it would be the end of the world, but that is the first thing
>>>>> that comes to mind.
>>>>>
>>>>> Cheers
>>>>> -----------------
>>>>> Aaron Morton
>>>>> Freelance Cassandra Developer
>>>>> New Zealand
>>>>>
>>>>> @aaronmorton
>>>>> http://www.thelastpickle.com
>>>>>
>>>>> On 27/01/2013, at 4:15 PM, Manu Zhang <owenzhang1990@gmail.com> wrote:
>>>>>
>>>>>> Although I've known Cassandra for quite a while, this question
>>>>>> has only occurred to me recently:
>>>>>>
>>>>>> Why are the replica placement strategy and replication factor set at
>>>>>> the keyspace level?
>>>>>>
>>>>>> Would setting them at the column family level offer more flexibility?
>>>>>>
>>>>>> Is this because it's easier for users to manage an application? Or is
>>>>>> it related to the internal implementation? Or have I just overlooked
>>>>>> something?
>>>>>
>>>>
>>>> Is it important to store rows of different column families that share
>>>> the same row key on the same node? AFAIK, Cassandra doesn't support getting
>>>> all of them in a single call.
>>>>
>>>> Meanwhile, what's the drawback of setting RPS and RF at column family
>>>> level?
>>>>
>>>> Another thing that's been confusing me is that when we talk about the
>>>> data model should the row key be inside or outside a column family?
>>>>
>>>> Thanks
>>>>
>>>
>>
>

From that wiki page: "mutations against a single key are atomic but not
isolated". I think a row mutation is isolated now, but is it isolated across
column families? By the way, the wiki page really needs updating.
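
The mechanics Aaron describes (one RowMutation per row key, routed once by the keyspace's replication strategy and logged once) can be illustrated with a toy Java model. This is a hypothetical sketch, not Cassandra's actual code: the class and method names, the 0-99 token ring, the node names A-D and the sample columns are all assumptions made up for the example. It shows that when placement depends only on the row key, every CF written for "dean" goes to the same replicas as a single commit-log entry, while a per-CF strategy would turn the same write into one mutation per CF.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Toy model only: these names are hypothetical and are NOT Cassandra's classes.
public class KeyspacePlacementSketch {

    // One mutation = one row key plus the columns to write, grouped by CF name.
    record RowMutation(String rowKey, Map<String, Map<String, String>> columnsByCf) {}

    // Assumed toy ring with tokens 0-99: a key's replicas are the first node whose
    // token >= hash(key), followed by the next rf-1 nodes clockwise.
    static List<String> replicasFor(String rowKey, SortedMap<Integer, String> ring, int rf) {
        int token = Math.floorMod(rowKey.hashCode(), 100);
        List<String> walk = new ArrayList<>(ring.tailMap(token).values());
        walk.addAll(ring.headMap(token).values()); // wrap around past the highest token
        return new ArrayList<>(walk.subList(0, Math.min(rf, walk.size())));
    }

    // What a per-CF replication strategy would force: one mutation per CF,
    // each routed and logged separately, so the write is no longer atomic.
    static List<RowMutation> splitPerCf(RowMutation m) {
        List<RowMutation> parts = new ArrayList<>();
        for (Map.Entry<String, Map<String, String>> e : m.columnsByCf().entrySet()) {
            parts.add(new RowMutation(m.rowKey(), Map.of(e.getKey(), e.getValue())));
        }
        return parts;
    }

    public static void main(String[] args) {
        SortedMap<Integer, String> ring =
                new TreeMap<>(Map.of(0, "A", 25, "B", 50, "C", 75, "D"));

        Map<String, Map<String, String>> cols = new LinkedHashMap<>();
        cols.put("Users", Map.of("name", "Dean"));
        cols.put("Schedules", Map.of("mon", "9am"));
        RowMutation write = new RowMutation("dean", cols);

        // Keyspace-level strategy: placement depends only on the row key,
        // so every CF in this write goes to the same replica set.
        System.out.println("replicas for 'dean': " + replicasFor(write.rowKey(), ring, 3));

        // Each replica appends the whole RowMutation as a single commit-log entry.
        List<RowMutation> commitLog = new ArrayList<>();
        commitLog.add(write);
        System.out.println("commit-log entries: " + commitLog.size());

        // With a per-CF strategy the same write would become one mutation per CF.
        System.out.println("mutations if RS were per CF: " + splitPerCf(write).size());
    }
}
```

Running `main` prints the replica list (which three nodes depends on `String.hashCode()` for "dean"), then 1 commit-log entry, then 2 mutations for the per-CF case.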
