incubator-cassandra-user mailing list archives

From Manu Zhang <owenzhang1...@gmail.com>
Subject Re: why set replica placement strategy at keyspace level ?
Date Thu, 31 Jan 2013 02:29:01 GMT
On Thu 31 Jan 2013 08:55:40 AM CST, aaron morton wrote:
>>   I think a row mutation is isolated now, but is it across column families?
> Correct they are isolated, but only for an individual CF.
>
>> By the way, the wiki page really needs updating.
> You can update if you would like to.
>
> Cheers
> -----------------
> Aaron Morton
> Freelance Cassandra Developer
> New Zealand
>
> @aaronmorton
> http://www.thelastpickle.com
>
> On 30/01/2013, at 12:33 PM, Manu Zhang <owenzhang1990@gmail.com> wrote:
>
>> On Tue 29 Jan 2013 03:39:17 PM CST, aaron morton wrote:
>>>
>>>>   So If I write to CF Users with rowkey="dean"
>>>> and to CF Schedules with rowkey="dean", it is actually one row?
>>> In my mental model that's correct.
>>> A RowMutation is a row key and a collection of (internal) ColumnFamilies, each of
>>> which contains the columns to write for a single CF.
>>>
>>> This is the thing that is committed to the log, and then the changes in the
>>> ColumnFamilies are applied to each CF in an isolated way.
>>>
>>>> .(must have missed that several times in the
>>>> documentation).
>>> http://wiki.apache.org/cassandra/FAQ#batch_mutate_atomic
>>>
>>> Cheers
>>>
>>> -----------------
>>> Aaron Morton
>>> Freelance Cassandra Developer
>>> New Zealand
>>>
>>> @aaronmorton
>>> http://www.thelastpickle.com
>>>
>>> On 29/01/2013, at 9:28 AM, "Hiller, Dean" <Dean.Hiller@nrel.gov> wrote:
>>>
>>>> "If you write to 4 CF's with the same row key that is considered one
>>>> mutation"
>>>>
>>>> Hmmmmm, I never considered this, never knew either.(very un-intuitive from
>>>> a user perspective IMHO).  So If I write to CF Users with rowkey="dean"
>>>> and to CF Schedules with rowkey="dean", it is actually one row?  (it's so
>>>> un-intuitive that I had to ask to make sure I am reading that correctly).
>>>>
>>>> I guess I really don't have that case since most of my row keys are GUID's
>>>> anyways, but very interesting and unexpected (not sure I really mind, was
>>>> just taken aback)
>>>>
>>>> Ps. Not sure I ever minded losing atomic commits to the same row across
>>>> CF's as I never expected it in the first place having used cassandra for
>>>> more than a year.(must have missed that several times in the
>>>> documentation).
>>>>
>>>> Thanks,
>>>> Dean
>>>>
>>>> On 1/28/13 12:41 PM, "aaron morton" <aaron@thelastpickle.com> wrote:
>>>>
>>>>>>
>>>>>> Another thing that's been confusing me is that when we talk about the
>>>>>> data model should the row key be inside or outside a column family?
>>>>> My mental model is:
>>>>>
>>>>> cluster == database
>>>>> keyspace == table
>>>>> row == a row in a table
>>>>> CF == a family of columns in one row
>>>>>
>>>>> (I think that's different to others, but it works for me)
>>>>>
>>>>>> Is it important to store rows of different column families that share
>>>>>> the same row key to the same node?
>>>>> Makes the failure models a little easier to understand. e.g. Everything
>>>>> keyed by user "amorton" is either available or not.
>>>>>
>>>>>> Meanwhile, what's the drawback of setting RPS and RF at column family
>>>>>> level?
>>>>> Other than it's baked in?
>>>>>
>>>>> We process all mutations for a row at the same time. If you write to 4
>>>>> CF's with the same row key that is considered one mutation, for one row.
>>>>> That one RowMutation is directed to the replicas using the
>>>>> ReplicationStrategy and atomically applied to the commit log.
>>>>>
>>>>> If you have RS per CF that one mutation would be split into 4, which
>>>>> would then be sent to different replicas. Even if they went to the same
>>>>> replicas they would be written to the commit log as different mutations.
>>>>>
>>>>> So if you have RS per CF you lose atomic commits for writes to the same
>>>>> row.
>>>>>
>>>>> Cheers
>>>>>
>>>>> -----------------
>>>>> Aaron Morton
>>>>> Freelance Cassandra Developer
>>>>> New Zealand
>>>>>
>>>>> @aaronmorton
>>>>> http://www.thelastpickle.com
>>>>>
>>>>> On 28/01/2013, at 11:22 PM, Manu Zhang <owenzhang1990@gmail.com> wrote:
>>>>>
>>>>>> On Mon 28 Jan 2013 04:42:49 PM CST, aaron morton wrote:
>>>>>>> The row is the unit of replication: all values with the same storage
>>>>>>> engine row key in a KS are on the same nodes. If the replication settings
>>>>>>> were per CF this would not hold.
>>>>>>>
>>>>>>> Not that it would be the end of the world, but that is the first thing
>>>>>>> that comes to mind.
>>>>>>>
>>>>>>> Cheers
>>>>>>> -----------------
>>>>>>> Aaron Morton
>>>>>>> Freelance Cassandra Developer
>>>>>>> New Zealand
>>>>>>>
>>>>>>> @aaronmorton
>>>>>>> http://www.thelastpickle.com
>>>>>>>
>>>>>>> On 27/01/2013, at 4:15 PM, Manu Zhang <owenzhang1990@gmail.com> wrote:
>>>>>>>
>>>>>>>> Although I've known Cassandra for quite a while, this question
>>>>>>>> has only occurred to me recently:
>>>>>>>>
>>>>>>>> Why are the replica placement strategy and replication factor set at the
>>>>>>>> keyspace level?
>>>>>>>>
>>>>>>>> Would setting them at the column family level offer more flexibility?
>>>>>>>>
>>>>>>>> Is this because it's easier for users to manage an application? Or
>>>>>>>> related to the internal implementation? Or is it just that I've overlooked
>>>>>>>> something?
>>>>>>>
>>>>>>
>>>>>> Is it important to store rows of different column families that share
>>>>>> the same row key on the same node? AFAIK, Cassandra doesn't support
>>>>>> getting all of them in a single call.
>>>>>>
>>>>>> Meanwhile, what's the drawback of setting RPS and RF at column family
>>>>>> level?
>>>>>>
>>>>>> Another thing that's been confusing me is that when we talk about the
>>>>>> data model should the row key be inside or outside a column family?
>>>>>>
>>>>>> Thanks
>>>>>>
>>>>>
>>>>
>>>
>>
>> From that wiki page, "mutations against a single key are atomic but not isolated".
>> I think a row mutation is isolated now, but is it across column families? By the way, the
>> wiki page really needs updating.
>
Aaron, your mental data model looks like that of HBase, but
Cassandra's data model is different from HBase's, right?
Although we apply RowMutations across Column Families, do we typically
read from multiple Column Families?
What bothers me is how to explain Cassandra's data model when I have
to tell others about it.
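
To check my understanding of the RowMutation model Aaron describes above, here is a minimal Python sketch. It is only a toy illustration, not Cassandra's actual code: the RowMutation class, the commit_log_entries function, and the strategy_per_cf flag are all names I made up. It assumes a mutation is one row key plus the column changes grouped by CF, written to the commit log as a single unit when the replication strategy is set per keyspace, and split into one unit per CF if the strategy were (hypothetically) set per CF.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class RowMutation:
    """Toy stand-in (hypothetical): one row key plus the changes for each CF."""
    keyspace: str
    row_key: str
    # column family name -> {column name: value}
    changes: dict = field(default_factory=lambda: defaultdict(dict))

    def add(self, cf, column, value):
        self.changes[cf][column] = value


def commit_log_entries(mutation, strategy_per_cf=False):
    """Return the units that would each be written to the commit log.

    Keyspace-level strategy: the whole mutation is a single entry.
    Hypothetical per-CF strategy: the mutation is split into one entry per CF.
    """
    if not strategy_per_cf:
        return [(mutation.row_key, sorted(mutation.changes))]
    return [(mutation.row_key, [cf]) for cf in sorted(mutation.changes)]


m = RowMutation("app", "dean")
m.add("Users", "email", "dean@example.com")
m.add("Schedules", "monday", "on call")

print(commit_log_entries(m))                        # one entry covering both CFs
print(commit_log_entries(m, strategy_per_cf=True))  # one entry per CF
```

If this is right, the first call yields a single entry for row key "dean" covering both Users and Schedules, and the second yields two separate entries, which is where the atomic commit for the row would be lost.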
