cassandra-user mailing list archives

From Reid Pinchback <>
Subject Re: Cassandra Rack - Datacenter Load Balancing relations
Date Wed, 23 Oct 2019 16:10:35 GMT
Datacenters and racks are different concepts.  While they don't have to be associated with
their historical meanings, the historical meanings probably provide a helpful model for understanding
what you want from them.

When companies own their own physical servers and have them housed somewhere, the question
arises of where to locate any particular server.  It's a balancing act between things like
network speed, so that related servers can talk to each other quickly, and fault tolerance,
so that many servers are not all exposed to the same risks.

"Same rack" in that physical world tended to mean something like "all behind the same network
switch and all sharing the same power bus".  The morning after an electrical glitch fries
a power bus and thus everything in that rack, you realize you wish you hadn't put so many
of the same type of server together.  Well, they were servers.  Now they are door stops.
Badness and sadness.  

That's the mindset to have with racks in Cassandra.  A rack is a device for
separating servers into pools so that the different pools have, hopefully, somewhat independent
infrastructure risks.  However, all those servers are still doing the same kind of work, are
the same version, etc.
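
To make the "pools with independent risks" idea concrete, here is a minimal Python sketch of
rack-aware replica selection. It is a simplified model, not Cassandra's actual code: it assumes
one token per node, a single datacenter, and a toy ring. The names pick_replicas, replica_counts,
and the node and rack labels are all made up for illustration.

```python
from collections import Counter

def pick_replicas(ring, start, rf):
    """Pick rf replicas for the range owned by ring[start].

    ring is a list of (node, rack) pairs in token order. Simplified
    model: walk the ring clockwise, prefer nodes on racks not used yet,
    and fall back to skipped nodes only if there are fewer racks than rf.
    """
    ordered = [ring[(start + i) % len(ring)] for i in range(len(ring))]
    replicas, seen_racks, skipped = [], set(), []
    for node, rack in ordered:
        if len(replicas) == rf:
            break
        if rack in seen_racks:
            skipped.append(node)      # same rack as an earlier replica
        else:
            replicas.append(node)     # first replica on this rack
            seen_racks.add(rack)
    for node in skipped:              # not enough distinct racks
        if len(replicas) == rf:
            break
        replicas.append(node)
    return replicas

def replica_counts(ring, rf):
    """Count how many ranges each node stores a replica of."""
    counts = Counter()
    for start in range(len(ring)):
        counts.update(pick_replicas(ring, start, rf))
    return counts

# Hypothetical ring: 3 racks, nodes interleaved.
three_racks = [("n1", "rack-a"), ("n2", "rack-b"), ("n3", "rack-c"),
               ("n4", "rack-a"), ("n5", "rack-b"), ("n6", "rack-c")]

# Hypothetical ring: 2 racks of uneven size, with RF = 3.
two_racks = [("a1", "rack-a"), ("a2", "rack-a"), ("a3", "rack-a"),
             ("a4", "rack-a"), ("b1", "rack-b"), ("b2", "rack-b")]

print(pick_replicas(three_racks, 0, 3))
print(replica_counts(two_racks, 3))
```

With three racks and RF = 3 in this toy model, every range lands on three different racks, so a
fried power bus costs you at most one copy of anything. With the uneven two-rack ring, the counts
come out lopsided, which is the kind of imbalance that racks not lining up with the replication
factor can produce.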

Datacenters are amalgams of those racks, and how similar or different they are from each other
depends on what you want to do with them.  What is true is that if you have N datacenters
and replicate your data to all of them, each one must have enough disk storage to house all
the data.  The actual physical footprint of that data in each DC depends on the replication
factors in play.
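
As a rough back-of-the-envelope for that footprint, the arithmetic might be sketched like this in
Python. It assumes data spreads evenly across the nodes of a DC and ignores compression,
compaction headroom, and snapshots; the function name, DC names, and sizes are hypothetical.

```python
def dc_footprint_gb(unique_data_gb, rf_per_dc, nodes_per_dc):
    """Return {dc: (total_gb, per_node_gb)} for each datacenter.

    unique_data_gb: size of one copy of the data
    rf_per_dc:      replication factor configured for each DC
    nodes_per_dc:   number of nodes in each DC
    """
    result = {}
    for dc, rf in rf_per_dc.items():
        total = unique_data_gb * rf            # each DC stores rf full copies
        result[dc] = (total, total / nodes_per_dc[dc])
    return result

# Hypothetical cluster: 500 GB of unique data, two DCs with different RFs.
footprint = dc_footprint_gb(
    500,
    {"dc_live": 3, "dc_maint": 2},
    {"dc_live": 6, "dc_maint": 4},
)
for dc, (total, per_node) in footprint.items():
    print(f"{dc}: {total} GB total, about {per_node} GB per node")
```

Each DC holds the whole data set at least once; the replication factor only decides how many
times over.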

Note that you sorta can't have "one datacenter for writes" because the writes will replicate
across the datacenters.  You could definitely choose to have only one that takes read queries,
but it's best to think of writing as being universal.  One scenario you can have is where the DC
not taking live read traffic is the one you use for maintenance or performance testing
or version upgrades.

One rack makes your life easier if you don't have a reason for multiple racks.  It depends
on the environment you deploy into and your fault tolerance goals.  If you were in AWS and
wanted to spread risk across availability zones, then you would likely have as many racks
as the AZs you choose to be in, because that's really the point of using multiple AZs.
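
One way the rack-per-AZ layout could be written down, assuming the nodes use
GossipingPropertyFileSnitch (which reads dc and rack from cassandra-rackdc.properties), is the
small Python sketch below. The host addresses, the DC name "dc1", and the helper names are
hypothetical.

```python
def rackdc_properties(dc, rack):
    """Render the contents of a cassandra-rackdc.properties file."""
    return f"dc={dc}\nrack={rack}\n"

def plan_rack_files(dc, hosts_by_az):
    """Map each host to its properties text, using the AZ name as the rack."""
    files = {}
    for az, hosts in hosts_by_az.items():
        for host in hosts:
            files[host] = rackdc_properties(dc, az)
    return files

# Hypothetical layout: one DC spread over three availability zones.
hosts_by_az = {
    "us-east-1a": ["10.0.1.10", "10.0.1.11"],
    "us-east-1b": ["10.0.2.10", "10.0.2.11"],
    "us-east-1c": ["10.0.3.10", "10.0.3.11"],
}

files = plan_rack_files("dc1", hosts_by_az)
for host, text in files.items():
    print(host, text.replace("\n", " "))
```

Three AZs give three racks here, so with a replication factor of 3 each AZ can hold one copy of
the data.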


On 10/23/19, 4:06 AM, "Sergio Bilello" <> wrote:

    Hello guys!
    I was reading about

    I would like to understand a concept related to node load balancing.
    I know that Jon recommends vnodes = 4, but I just found a cluster with vnodes = 256,
    replication factor = 3, and 2 racks. This is unbalanced because the number of racks is
    not a multiple of the replication factor.
    However, my plan is to move all the nodes into a single rack so that I can eventually
    scale the cluster up and down one node at a time.
    If I had 3 racks and wanted to keep things balanced, I would have to scale up 3 nodes
    at a time, one for each rack.
    If I had 3 racks, should I also have 3 different datacenters, one datacenter
    for each rack?
    Can I have 2 datacenters and 3 racks? If so, would one datacenter have more
    nodes than the other? Could that be a problem?
    I am thinking of splitting my cluster into one datacenter for reads and one for writes,
    keeping all the nodes in the same rack so I can scale one node at a time.
    Please correct me if I am wrong.