cassandra-user mailing list archives

From Rahul Singh <rahul.xavier.si...@gmail.com>
Subject Re: Right sizing Cassandra data nodes
Date Tue, 20 Feb 2018 11:28:39 GMT
Node density is the active data managed in the cluster divided by the number of active nodes. E.g.,
if you have 500TB of active data under management, then you would need 250-500 nodes to
get beast-like optimum performance. It also depends on how much memory is on the boxes and
whether you are using SSD drives. SSD doesn’t replace memory, but it doesn’t hurt.
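
For concreteness, here is a minimal sketch of that sizing arithmetic (assuming the 1-2 TB-per-node
target mentioned later in this thread; whether the 500TB figure is before or after replication
isn’t stated, so the replication factor is left as a parameter):

    import math

    # Back-of-the-envelope node count for a target per-node density.
    # Illustrative only -- not an official Cassandra sizing formula.
    def nodes_needed(active_data_tb, per_node_tb, replication_factor=1):
        total_tb = active_data_tb * replication_factor  # volume actually stored
        return math.ceil(total_tb / per_node_tb)

    print(nodes_needed(500, 1))   # 1 TB/node -> 500 nodes
    print(nodes_needed(500, 2))   # 2 TB/node -> 250 nodes
    # With RF=3 the stored volume triples, and so does the node count:
    print(nodes_needed(500, 2, replication_factor=3))  # -> 750 nodes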

--
Rahul Singh
rahul.singh@anant.us

Anant Corporation

On Feb 19, 2018, 5:55 PM -0500, Charulata Sharma (charshar) <charshar@cisco.com>, wrote:
> Thanks for the response Rahul. I did not understand the “node density” point.
>
> Charu
>
> From: Rahul Singh <rahul.xavier.singh@gmail.com>
> Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
> Date: Monday, February 19, 2018 at 12:32 PM
> To: "user@cassandra.apache.org" <user@cassandra.apache.org>
> Subject: Re: Right sizing Cassandra data nodes
>
> 1. I would keep OpsCenter on a different cluster. Why unnecessarily put traffic and computation
> for OpsCenter data on a real business-data cluster?
> 2. Don’t put more than 1-2 TB per node. Maybe 3TB. As node density increases, so does the
> per-node work for replication, read repairs, etc., and the memory used for compactions.
> 3. You can have as much as you want for snapshots as long as you keep them on another disk,
> or even move them to a SAN / NAS. All you may care about is the most recent snapshot on the
> physical machine / disks on a live node.
>
> --
> Rahul Singh
> rahul.singh@anant.us
>
> Anant Corporation
>
> On Feb 19, 2018, 3:08 PM -0500, Charulata Sharma (charshar) <charshar@cisco.com>, wrote:
>
> > Hi All,
> >
> > Looking for some insight into how application data archive and purge is carried
> > out for a C* database. Are there standard guidelines on calculating the amount of space
> > that can be used for storing data on a specific node?
> >
> > Some pointers that I got while researching are:
> >
> > - Allocate 50% extra space for compaction; e.g., if the data size is 50GB,
> > then allocate 25GB for compaction.
> > - Snapshot strategy: if old snapshots are present, they occupy disk space.
> > - Allocate some percentage of storage ( ???? ) for system tables
> > and OpsCenter tables?
> >
> > We have a scenario where certain transaction data needs to be archived based on
> > business rules and some purged, so before deciding on an A&P strategy, I am trying to
> > analyze how much transactional data can be stored given the current node capacity. I also
> > found out that the space-available metric shown in OpsCenter is not very reliable because
> > it doesn’t show the snapshot space. In our case, we have a huge snapshot size. For some
> > unexplained reason, we seem to be taking snapshots of our data every hour and purging
> > them only after 7 days.
> >
> >
> > Thanks,
> > Charu
> > Cisco Systems.
> >
> >
> >
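
To turn the pointer list above into a number, here is a hedged per-node budget sketch; the 50%
compaction headroom comes from the thread, while the 5% system/OpsCenter allowance is purely an
assumed placeholder for the open "????" question:

    # Rough space left for live application data on one node.
    def usable_data_budget_gb(disk_gb, snapshot_gb,
                              compaction_headroom=0.50,   # from the thread
                              system_overhead=0.05):      # assumed placeholder
        reserved = disk_gb * (compaction_headroom + system_overhead)
        return max(disk_gb - reserved - snapshot_gb, 0.0)

    # e.g. a 2TB data disk carrying 300GB of snapshots:
    print(usable_data_budget_gb(2000, 300))  # -> 600.0 GB for live data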
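On the snapshot-space question, here is a sketch of how one might measure what deleting snapshots
would actually free (the default path and the heuristic reflect the standard on-disk layout, not
anything from this thread). Cassandra snapshots are hard links into the live SSTable set, so a
snapshot file still shared with a live SSTable costs no extra disk; only files whose last remaining
reference is the snapshot (st_nlink == 1) are reclaimable:

    import os

    def reclaimable_snapshot_bytes(data_dir="/var/lib/cassandra/data"):
        seen, total = set(), 0
        for root, _dirs, files in os.walk(data_dir):
            if "snapshots" not in root.split(os.sep):
                continue  # only look inside .../<table>/snapshots/<tag>/
            for name in files:
                try:
                    st = os.stat(os.path.join(root, name))
                except OSError:
                    continue  # file vanished mid-walk; skip it
                if st.st_nlink == 1 and st.st_ino not in seen:
                    seen.add(st.st_ino)          # count each inode once
                    total += st.st_size
        return total

    print(reclaimable_snapshot_bytes() / 1e9, "GB reclaimable")

For the hourly snapshots described above, nodetool listsnapshots shows each snapshot’s true size
on recent versions, and nodetool clearsnapshot -t <tag> removes one by tag.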
