incubator-cassandra-user mailing list archives

From Edward Capriolo <edlinuxg...@gmail.com>
Subject Re: Confusion regarding the terms "replica" and "replication factor"
Date Wed, 30 May 2012 20:32:02 GMT
You can avoid the confusion by using the term natural endpoints. For
example, with a replication factor of 3, the natural endpoints for key x
are node1, node2, and node11.
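The ring walk behind "natural endpoints" can be sketched in a few lines of Python (an illustration of the idea, not Cassandra's actual code; the node names and tokens are made up):

```python
from bisect import bisect_right

def natural_endpoints(ring, key_token, rf):
    """ring: (token, node) pairs sorted by token; returns the rf distinct
    nodes found by walking the ring clockwise from the key's token.
    Assumes rf <= number of distinct nodes."""
    tokens = [t for t, _ in ring]
    i = bisect_right(tokens, key_token) % len(ring)
    endpoints = []
    while len(endpoints) < rf:
        node = ring[i][1]
        if node not in endpoints:
            endpoints.append(node)
        i = (i + 1) % len(ring)
    return endpoints

ring = [(0, "node1"), (10, "node2"), (20, "node3"), (30, "node4")]
natural_endpoints(ring, 5, 3)  # -> ['node2', 'node3', 'node4']
```

A key with token 5 falls past node1's token, so the walk starts at node2 and takes the next rf distinct nodes.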

The snitch does use the datacenter and the rack, but almost all
deployments use a single rack per DC, because when you have more than
one rack in a data center, NetworkTopologyStrategy has some logic to
spread the data between racks. (Most people do not want this behavior.)


On Wed, May 30, 2012 at 3:57 PM, David Fischer <fischer.d.r@gmail.com> wrote:
> Thanks!
>
> My misunderstanding was that the snitch names are broken up as DC1:RAC1
> and strategy_options takes only the first part of the snitch names?
>
>
>
> On Wed, May 30, 2012 at 12:14 PM, Jeff Williams
> <jeffw@wherethebitsroam.com> wrote:
>> First, note that replication is done at the row level, not at the node level.
>>
>> That line should look more like:
>>
>> placement_strategy = 'NetworkTopologyStrategy' and strategy_options = {DC1: 1, DC2: 1, DC3: 1}
>>
>> This means that each row will have one copy in each DC, and within each DC its placement will be according to the partitioner, so it could be on any of the nodes in that DC.
>>
>> So, don't think of it as nodes replicating each other, but rather as each DC storing a copy of each row.
>>
>> Also, replication does not relate to the seed nodes. Seed nodes allow the nodes to find each other initially, but are not otherwise special - any node can be used as a seed node.
>>
>> So if you had a strategy like:
>>
>> placement_strategy = 'NetworkTopologyStrategy' and strategy_options = {DC1: 3, DC2: 2, DC3: 1}
>>
>> Each row would exist on 3 of the 4 nodes in DC1, on 2 of the 4 nodes in DC2, and on one of the nodes in DC3. Again, the placement within each DC is determined by the partitioner, based on the row key.
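The per-DC counting described above can be sketched as a small Python model (an illustration of the strategy's behavior, not Cassandra's implementation; the node names, tokens, and node-to-DC mapping stand in for what a snitch provides):

```python
from bisect import bisect_right

def nts_endpoints(ring, dc_of, key_token, rf_per_dc):
    """ring: (token, node) pairs sorted by token, shared by all DCs;
    dc_of: node -> datacenter; rf_per_dc: datacenter -> replica count.
    Walk the ring once from the key's token, accepting a node only
    while its DC still needs replicas."""
    tokens = [t for t, _ in ring]
    start = bisect_right(tokens, key_token) % len(ring)
    need = dict(rf_per_dc)  # remaining replicas per DC
    endpoints = []
    for step in range(len(ring)):
        node = ring[(start + step) % len(ring)][1]
        if need.get(dc_of[node], 0) > 0:
            endpoints.append(node)
            need[dc_of[node]] -= 1
    return endpoints

# 12 nodes, 4 per DC, interleaved on one shared ring.
ring = [(i * 10, "n%d" % (i + 1)) for i in range(12)]
dc_of = {"n%d" % (i + 1): "DC%d" % (i % 3 + 1) for i in range(12)}

nts_endpoints(ring, dc_of, 5, {"DC1": 3, "DC2": 2, "DC3": 1})
# -> ['n2', 'n3', 'n4', 'n5', 'n7', 'n10']  (3 DC1 nodes, 2 DC2, 1 DC3)
```

Note that once a DC has its quota, further nodes in that DC are skipped, which is why the strategy {DC1: 3, DC2: 2, DC3: 1} always yields exactly six replicas regardless of where the key's token lands.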
>>
>> Jeff
>>
>> On May 29, 2012, at 11:25 PM, David Fischer wrote:
>>
>>> Ok, now I am confused :)
>>>
>>> If I have the following:
>>> placement_strategy = 'NetworkTopologyStrategy'  and strategy_options =
>>> {DC1:R1,DC2:R1,DC3:R1 }
>>>
>>> does this mean that in each of my datacenters I will have one full
>>> replica that can also be a seed node?
>>> If I have 3 nodes in addition to the DC replicas, then with normal token
>>> calculations a key can be in any datacenter plus on each of the
>>> replicas, right?
>>> It will show 12 nodes total in its ring.
>>>
>>> On Thu, May 24, 2012 at 2:39 AM, aaron morton <aaron@thelastpickle.com>
wrote:
>>>> This is partly historical. NTS (as it is now) has not always existed and was not always the default. In days gone by, a fella could run a mighty fine key-value store using just a simple replication strategy.
>>>>
>>>> A different way to visualise it is a single ring with a Z axis for the DCs. When you look at the ring from the top, you can see all the nodes. When you look at it from the side, you can see the nodes are on levels that correspond to their DC. SimpleStrategy looks at the ring from the top; NTS works through the layers of the ring.
>>>>
>>>>> If the hierarchy is Cluster ->
>>>>> DataCenter -> Node, why exactly do we need globally unique node tokens
>>>>> even though nodes are at the lowest level in the hierarchy.
>>>> Nodes having a DC is a feature of *some* snitches and is utilised by *some* of the replication strategies (and by the messaging system for network efficiency). For background, the mapping from row tokens to nodes is based on http://en.wikipedia.org/wiki/Consistent_hashing
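As a rough sketch of that token mapping (RandomPartitioner hashes keys with MD5 into the 0..2^127 token space; the reduction below is a simplification for illustration, not Cassandra's exact token computation):

```python
import hashlib
from bisect import bisect_right

TOKEN_SPACE = 2 ** 127

def key_to_token(key):
    # Hash the row key with MD5 and fold it into the token space.
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % TOKEN_SPACE

def primary_owner(node_tokens, key):
    """node_tokens: (token, node) pairs sorted by token. Returns the node
    whose range covers the key's token (first node token >= key token,
    wrapping around the ring)."""
    tokens = [t for t, _ in node_tokens]
    i = bisect_right(tokens, key_to_token(key)) % len(node_tokens)
    return node_tokens[i][1]

# Four nodes with evenly spaced tokens, as recommended for RandomPartitioner.
nodes = [(i * TOKEN_SPACE // 4, "node%d" % (i + 1)) for i in range(4)]
```

The consistent-hashing property is that the same key always hashes to the same token, so every coordinator independently agrees on which node owns it.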
>>>>
>>>> Hope that helps.
>>>> -----------------
>>>> Aaron Morton
>>>> Freelance Developer
>>>> @aaronmorton
>>>> http://www.thelastpickle.com
>>>>
>>>> On 24/05/2012, at 1:07 AM, java jalwa wrote:
>>>>
>>>>> Thanks Aaron. That makes things clear.
>>>>> So I guess the 0 - 2^127 range for tokens corresponds to a
>>>>> cluster-level, top-level ring, and then NTS adds some logic on top of
>>>>> that to logically segment that range into sub-rings as per the notion
>>>>> of data centers defined in NTS. What's the advantage of having a
>>>>> single top-level ring? Intuitively it seems like each replication
>>>>> group could have a separate ring, so that the same tokens could be
>>>>> assigned to nodes in different DCs. If the hierarchy is Cluster ->
>>>>> DataCenter -> Node, why exactly do we need globally unique node tokens
>>>>> even though nodes are at the lowest level in the hierarchy?
>>>>>
>>>>> Thanks again.
>>>>>
>>>>>
>>>>> On Wed, May 23, 2012 at 3:14 AM, aaron morton <aaron@thelastpickle.com>
wrote:
>>>>>>> Now if a row key hash is mapped to a range owned by a node in DC3, will the Node in DC3 still store the key as determined by the partitioner and then walk the ring and store 2 replicas each in DC1 and DC2 ?
>>>>>> No, only nodes in the DCs specified in the NTS configuration will be replicas.
>>>>>>
>>>>>>> Or will the co-ordinator node be aware of the replica placement strategy, and override the partitioner's decision and walk the ring until it first encounters a node in DC1 or DC2, and then place the remaining replicas?
>>>>>> NTS considers each DC to have its own ring. This can make token selection in a multi-DC environment confusing at times. There is something in the DS docs about it.
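A convention often suggested for multi-DC clusters (an illustration of the per-DC-ring idea, not the only valid scheme) is to space tokens evenly within each DC as if it had its own ring, then offset each DC by a small amount so no two nodes in the whole cluster share a token:

```python
def dc_tokens(nodes_in_dc, dc_index, token_space=2 ** 127):
    # Evenly space tokens as if this DC had its own private ring, then
    # shift the whole DC by dc_index so tokens stay globally unique.
    return [i * token_space // nodes_in_dc + dc_index
            for i in range(nodes_in_dc)]

dc1 = dc_tokens(4, 0)  # tokens 0, S/4, S/2, 3S/4
dc2 = dc_tokens(4, 1)  # the same positions, each shifted by 1
```

The tiny offsets are negligible for balance within each DC's ring, but they satisfy the global-uniqueness requirement the thread discusses.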
>>>>>>
>>>>>> Cheers
>>>>>>
>>>>>> -----------------
>>>>>> Aaron Morton
>>>>>> Freelance Developer
>>>>>> @aaronmorton
>>>>>> http://www.thelastpickle.com
>>>>>>
>>>>>> On 23/05/2012, at 3:16 PM, java jalwa wrote:
>>>>>>
>>>>>>> Hi all,
>>>>>>>              I am a bit confused regarding the terms "replica"
and
>>>>>>> "replication factor". Assume that I am using RandomPartitioner
and
>>>>>>> NetworkTopologyStrategy for replica placement.
>>>>>>> From what I understand, with a RandomPartitioner, a row key will
>>>>>>> always be hashed and be stored on the node that owns the range
to
>>>>>>> which the key is mapped.
>>>>>>> http://www.datastax.com/docs/1.0/cluster_architecture/replication#networktopologystrategy.
>>>>>>> The example here talks about having 2 data centers and a replication factor of 4, with 2 replicas in each datacenter, so the strategy is configured as DC1:2 and DC2:2. Now suppose I add another datacenter DC3, and do not change the NetworkTopologyStrategy.
>>>>>>> Now if a row key hash is mapped to a range owned by a node in DC3, will the node in DC3 still store the key as determined by the partitioner and then walk the ring and store 2 replicas each in DC1 and DC2? Will that mean that I will then have 5 replicas in the cluster and not 4? Or will the co-ordinator node be aware of the replica placement strategy, and override the partitioner's decision and walk the ring until it first encounters a node in DC1 or DC2, and then place the remaining replicas?
>>>>>>>
>>>>>>> Thanks.
>>>>>>
>>>>
>>
