cassandra-user mailing list archives

From aaron morton <aa...@thelastpickle.com>
Subject Re: How to come up with a predefined topology
Date Thu, 12 Jul 2012 22:34:51 GMT
> Will it also use the
> snitch/strategy info to find the next 'R' replicas 'closest' to the
> coordinator node?
yes. 
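As a rough illustration of "closest", here is a minimal Python sketch that orders a key's replicas using the same-node < same-rack < same-datacenter < different-datacenter ranking described further down the thread. The Node type, the addresses and the proximity() helper are hypothetical; a real snitch does this inside Cassandra and may take other information into account.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    # Hypothetical node description: address plus the DC and rack a snitch would report.
    address: str
    dc: str
    rack: str


def proximity(coordinator: Node, replica: Node) -> int:
    """Smaller is closer: same node < same rack < same DC < different DC."""
    if replica.address == coordinator.address:
        return 0
    if replica.dc == coordinator.dc and replica.rack == coordinator.rack:
        return 1
    if replica.dc == coordinator.dc:
        return 2
    return 3


def sort_by_proximity(coordinator, replicas):
    # sorted() is stable, so replicas at the same distance keep their original order.
    return sorted(replicas, key=lambda r: proximity(coordinator, r))


coordinator = Node("10.0.1.1", "DC1", "RAC1")
replicas = [
    Node("10.1.1.5", "DC2", "RAC1"),
    Node("10.0.2.7", "DC1", "RAC2"),
    Node("10.0.1.3", "DC1", "RAC1"),
]
for r in sort_by_proximity(coordinator, replicas):
    print(r.address)
```

With the coordinator in DC1/RAC1, the same-rack replica sorts first and the DC2 replica sorts last, so it would be the least preferred target for a read.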

> 2. In a single DC ( with n racks and r replicas ) what algorithm
The logic is here
https://github.com/apache/cassandra/blob/cassandra-1.1/src/java/org/apache/cassandra/locator/NetworkTopologyStrategy.java#L78

> a. n>r : I am assuming there is 1 replica in each rack.
You get 1 replica in each of the first r racks found walking the ring from the key's token.

> b. n<r : ?? I am assuming it tries to distribute replicas equally across
> the racks.
Each rack gets at least int(r/n) replicas, and r % n of the racks get one more.

This is why multi rack replication can be tricky. 
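To make the two rack cases concrete, here is a simplified Python sketch of rack-aware placement within a single DC. It assumes a reading of the linked NetworkTopologyStrategy code: walk the ring clockwise from the key's token taking at most one node per rack, then, if the racks run out before r replicas are found, fill up with the next unused nodes in ring order regardless of rack. The ring layout (one token per node), node and rack names, and the helper functions are hypothetical; this is not the actual implementation.

```python
from bisect import bisect_left
from collections import Counter


def place_replicas(ring, key_token, replicas):
    """Choose `replicas` nodes for key_token within one DC.

    ring: hypothetical list of (token, node, rack) tuples sorted by token,
    one token per node.
    """
    tokens = [token for token, _, _ in ring]
    # The key belongs to the first node whose token is >= key_token (wrapping).
    start = bisect_left(tokens, key_token) % len(ring)
    walk = ring[start:] + ring[:start]

    chosen, seen_racks = [], set()
    # First pass: at most one replica per rack.
    for _, node, rack in walk:
        if len(chosen) == replicas:
            break
        if rack not in seen_racks:
            chosen.append(node)
            seen_racks.add(rack)
    # Second pass: if racks ran out, take the next unused nodes in ring order.
    for _, node, _ in walk:
        if len(chosen) == replicas:
            break
        if node not in chosen:
            chosen.append(node)
    return chosen


def replicas_per_rack(ring, chosen):
    rack_of = {node: rack for _, node, rack in ring}
    return Counter(rack_of[node] for node in chosen)


def ownership(ring, replicas):
    # Count how many ranges each node holds a replica of, using each
    # node's own token as the sample key for its range.
    counts = Counter()
    for token, _, _ in ring:
        counts.update(place_replicas(ring, token, replicas))
    return counts


# n < r with racks alternating around the ring: 2 racks, 3 replicas.
balanced = [(0, "n1", "R1"), (10, "n2", "R2"), (20, "n3", "R1"),
            (30, "n4", "R2"), (40, "n5", "R1"), (50, "n6", "R2")]
chosen = place_replicas(balanced, 5, 3)
print(chosen, replicas_per_rack(balanced, chosen))

# A lopsided layout: node d is the only node in rack R2.
lopsided = [(0, "a", "R1"), (10, "b", "R1"), (20, "c", "R1"), (30, "d", "R2")]
print(ownership(lopsided, 2))
```

On the alternating ring, 3 replicas over 2 racks come out as 2 + 1. On the lopsided ring, node d is alone in R2, so the first pass picks it for every key: it ends up holding a replica of every range while a, b and c share the rest, which is the sort of data imbalance mentioned later in the thread. Under this sketch's assumptions, per-rack counts only come out even when racks alternate around the ring.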

Hope that helps. 


-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 12/07/2012, at 8:05 PM, prasenjit mukherjee wrote:

> Thanks. Some follow up questions :
> 
> 1.  How do the reads use strategy/snitch information ? I am assuming
> the reads can go to any of the replicas. Will it also use the
> snitch/strategy info to find the next 'R' replicas 'closest' to the
> coordinator node?
> 
> 2. In a single DC (with n racks and r replicas), what algorithm does
> Cassandra use to write its replicas in the following scenarios:
> a. n>r : I am assuming there is 1 replica in each rack.
> b. n<r : ?? I am assuming it tries to distribute replicas equally
> across the racks.
> 
> -Thanks,
> Prasenjit
> 
> On Thu, Jul 12, 2012 at 11:24 AM, Tyler Hobbs <tyler@datastax.com> wrote:
>> I highly recommend specifying the same rack for all nodes (using
>> cassandra-topology.properties) unless you really have a good reason not to
>> (and you probably don't).  The way that replicas are chosen when multiple
>> racks are in play can be fairly confusing and lead to a data imbalance if
>> you don't catch it.
>> 
>> 
>> On Wed, Jul 11, 2012 at 10:53 PM, prasenjit mukherjee <prasen.bea@gmail.com>
>> wrote:
>>> 
>>>> As far as I know there isn't any way to use the rack name in the
>>>> strategy_options for a keyspace. You
>>>> might want to look at the code to dig into that, perhaps.
>>> 
>>> Aha, I was wondering if I could do that as well ( specify rack options )
>>> :)
>>> 
>>> Thanks for the pointer, I will dig into the code.
>>> 
>>> -Thanks,
>>> Prasenjit
>>> 
>>> On Thu, Jul 12, 2012 at 5:33 AM, Richard Lowe <richard.lowe@arkivum.com>
>>> wrote:
>>>> If you then specify the parameters for the keyspace to use these, you
>>>> can control exactly which set of nodes replicas end up on.
>>>> 
>>>> For example, in cassandra-cli:
>>>> 
>>>> create keyspace ks1 with placement_strategy =
>>>> 'org.apache.cassandra.locator.NetworkTopologyStrategy' and strategy_options
>>>> = { DC1_realtime: 2, DC1_analytics: 1, DC2_realtime: 1 };
>>>> 
>>>> As far as I know there isn't any way to use the rack name in the
>>>> strategy_options for a keyspace. You might want to look at the code to dig
>>>> into that, perhaps.
>>>> 
>>>> Whichever snitch you use, the nodes are sorted in order of proximity to
>>>> the client node. How this is determined depends on the snitch that's used
>>>> but most (the ones that ship with Cassandra) will use the default ordering
>>>> of same-node < same-rack < same-datacenter < different-datacenter. Each
>>>> snitch has methods to tell Cassandra which rack and DC a node is in, so it
>>>> always knows which node is closest. Used with the Bloom filters this can
>>>> tell us where the nearest replica is.
>>>> 
>>>> 
>>>> 
>>>> -----Original Message-----
>>>> From: prasenjit mukherjee [mailto:prasen.bea@gmail.com]
>>>> Sent: 11 July 2012 06:33
>>>> To: user
>>>> Subject: How to come up with a predefined topology
>>>> 
>>>> Quoting from
>>>> http://www.datastax.com/docs/0.8/cluster_architecture/replication#networktopologystrategy
>>>> :
>>>> 
>>>> "Asymmetrical replication groupings are also possible depending on your
>>>> use case. For example, you may want to have three replicas per data center
>>>> to serve real-time application requests, and then have a single replica in
>>>> a separate data center designated to running analytics."
>>>> 
>>>> Have 2 questions :
>>>> 1. Is there an example of how to configure a topology with 3 replicas in
>>>> one DC (2 in one rack + 1 in another rack) and one replica in another DC?
>>>> The default NetworkTopologyStrategy with RackInferringSnitch will only
>>>> give me an equal distribution (2+2).
>>>> 
>>>> 2. I am assuming the reads can go to any of the replicas. Is there a
>>>> client which will send a query to the node (in the Cassandra ring) that is
>>>> closest to the client?
>>>> 
>>>> -Thanks,
>>>> Prasenjit
>>>> 
>>>> 
>> 
>> 
>> 
>> 
>> --
>> Tyler Hobbs
>> DataStax
>> 

