incubator-cassandra-user mailing list archives

From "Koppel, Jeremy" <>
Subject Sizing a new Cassandra cluster
Date Thu, 05 Jun 2014 21:58:51 GMT
I have been able to find lots of general information about sizing each node in a new Cassandra
cluster, but have not come across any specific recommendations about the total size and configuration
of the cluster (the number of nodes required per data center, the number of data centers,
throughput requirements between data centers, etc.).   I am currently in the process of sizing
a new Cassandra cluster to support the following:

  *   Probably more write-intensive than read-intensive: at least 65% writes / 35% reads.
  *   Writes per day:  200,000,000 (~2315 per second).
  *   Data retention = 30 days.
  *   Replication Factor = 3.  (I anticipate reads and writes at CL = QUORUM or LOCAL_QUORUM.)
  *   My developers estimate a payload of ~300 bytes per record.
     *   Throughput per second (MiB):  (Records per second * Replication Factor * Event Payload) / 1024 / 1024 = 1.99 MiB/sec.
     *   Storage required (TiB):  (Events per day * Event Payload * Replication Factor * Data Retention * 2) / 1024 / 1024 / 1024 / 1024 = 9.82 TiB.
        *   Size doubled to provide room for Compaction.
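A quick way to double-check the two formulas above is to script them. This is a minimal sketch that uses only the figures stated in this message (200M writes/day, 300-byte payload, RF=3, 30-day retention, 2x compaction headroom). Like the original formulas, it counts raw payload bytes only and ignores Cassandra's per-row storage overhead and any compression, so treat the result as a rough estimate.

```python
# Sketch reproducing the sizing arithmetic from this message.
# All inputs are the figures stated above; COMPACTION_HEADROOM is the
# "size doubled to provide room for compaction" factor.

WRITES_PER_DAY = 200_000_000
PAYLOAD_BYTES = 300          # developers' estimate per record
REPLICATION_FACTOR = 3
RETENTION_DAYS = 30
COMPACTION_HEADROOM = 2

SECONDS_PER_DAY = 24 * 60 * 60
MIB = 1024 ** 2
TIB = 1024 ** 4


def writes_per_second(writes_per_day: int) -> float:
    """Average write rate, assuming writes are spread evenly over the day."""
    return writes_per_day / SECONDS_PER_DAY


def throughput_mib_per_sec() -> float:
    """Replicated write throughput in MiB/s (payload bytes only)."""
    return writes_per_second(WRITES_PER_DAY) * REPLICATION_FACTOR * PAYLOAD_BYTES / MIB


def storage_tib() -> float:
    """Total replicated storage in TiB, including compaction headroom."""
    raw_bytes = WRITES_PER_DAY * PAYLOAD_BYTES * REPLICATION_FACTOR * RETENTION_DAYS
    return raw_bytes * COMPACTION_HEADROOM / TIB


if __name__ == "__main__":
    print(f"writes/sec: {writes_per_second(WRITES_PER_DAY):.0f}")
    print(f"throughput: {throughput_mib_per_sec():.2f} MiB/s")
    print(f"storage:    {storage_tib():.2f} TiB")
```

Because these are averages, peak write rates could be well above ~2315/sec if traffic is bursty.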

I’m wondering whether my math is on the right track, and whether the following configuration
would perform well and leave a little headroom:

  *   2 Data Centers (they could co-exist with the application clusters).
  *   12 nodes (6 per data center) with:
     *   1 TiB storage capacity each.
        *   I’ve seen varying information on RAID usage / configuration.  Is a RAID 1 mirror across 2x 1 TiB SSD drives performant?  That might be a good configuration for us, and it would provide some high availability so that we can lose a drive without having to repair a node.  Or is it better to buy an additional node for extra capacity, store the data on single SSDs and let them fail?  (Or stripe 2x 500 GiB SSD drives…)
        *   Do we need to store the CommitLog on a separate drive if we’re using SSDs?  How much space should we leave for it?  Do we really need separate controllers?
     *   8 CPU cores.
     *   32GB RAM.
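To see how much headroom this layout leaves, the 9.82 TiB estimate can be divided across the 12 proposed nodes. This is a hypothetical check that assumes data is balanced evenly across all nodes and that each node's 1 TiB is fully usable. It also assumes, as the formula above does, that RF=3 counts all replicas in the cluster. If RF=3 is instead configured per data center (which NetworkTopologyStrategy allows), every write is stored six times and the total doubles, so the sketch prints both cases.

```python
# Hypothetical per-node check of the proposed layout.
# Assumes even token distribution and that each node's capacity is fully usable.

TOTAL_TIB = 9.82          # estimate above (already includes 2x compaction headroom)
NODES = 12                # 6 per data center x 2 data centers
NODE_CAPACITY_TIB = 1.0   # proposed storage per node


def per_node_tib(total_tib: float, nodes: int) -> float:
    """Data per node if it is spread evenly across all nodes."""
    return total_tib / nodes


def utilization(total_tib: float, nodes: int, capacity_tib: float) -> float:
    """Fraction of each node's disk the data set would occupy."""
    return per_node_tib(total_tib, nodes) / capacity_tib


if __name__ == "__main__":
    scenarios = [
        ("RF=3 across the cluster", TOTAL_TIB),
        ("RF=3 per data center (x2 DCs)", TOTAL_TIB * 2),
    ]
    for label, total in scenarios:
        share = per_node_tib(total, NODES)
        used = utilization(total, NODES, NODE_CAPACITY_TIB)
        print(f"{label}: {share:.2f} TiB/node, {used:.0%} of capacity")
```

In the first case each node holds about 0.82 TiB (82% of 1 TiB). In the second case each node would need roughly 1.64 TiB, which exceeds the proposed capacity.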

Thoughts?  Is this enough?  Overkill?

