From "Jeff Jirsa"
Subject Re: Information on Cassandra
Date Fri, 02 Jun 2017 18:01:51 GMT

On 2017-06-01 09:09 (-0700), "Harper, Paul" <> wrote: 
> Hello All,
> I'm about 3 months into supporting several Cassandra clusters. I recently subscribed
> to this email list and I receive lots of interesting emails, most of which I don't understand.
> I feel like I have a pretty good grasp of Cassandra, but I would like to know what types of things
> I should be checking on a daily, weekly or monthly basis. Many of the emails I see on this
> list are on subjects I've never had to look at so far, so I'm wondering what I
> should be monitoring or doing, or what I should know. I would appreciate any advice or guidance
> you can provide. Please reply to my email and not the group list, unless it's something that
> may be helpful to others.

The good news is that Cassandra can run for years without any intervention, especially if
you're not pushing the limits.

At a high level, you should be watching:
- Reads/writes per second. Your application may warn you if these change, but catching a change
before it impacts your application is always nice.
- Latencies (how long does each read/write take, and is that getting worse over time, which
may indicate a problem brewing)
- How much data is on each node (hopefully it's pretty even)
- How many sstables are on each node (hopefully it's pretty even)
- GC pause times (you're probably using ParNew/CMS; most metrics packages will know how to
graph those as two distinct lines. Seeing long pauses is a good hint that things are starting
to get bad)
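- How often are you running repair? Is repair succeeding? Is it failing? If you delete data,
you need to repair (successfully, on all nodes) at least once every gc_grace_seconds (by default
10 days); otherwise a replica that missed a delete can bring the deleted data back once the
tombstone has been purged.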
- Whether or not schema versions match across nodes (nodetool describecluster lists them) - if
schema diverges, you could have a big problem
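
If it helps, the repair rule above is simple arithmetic you can automate. Below is a minimal Python sketch, assuming you already record when each node last finished a successful repair (the `last_repair_by_node` dict and the `overdue_repairs` helper are hypothetical, not part of Cassandra or nodetool). It flags nodes that are getting close to the gc_grace_seconds window (default 864000 seconds, i.e. 10 days); the `margin` of 0.8 is an arbitrary choice to give you warning before the window actually closes.

```python
from datetime import datetime, timedelta, timezone

# Cassandra's default gc_grace_seconds is 864000 seconds (10 days).
DEFAULT_GC_GRACE_SECONDS = 864_000


def overdue_repairs(last_repair_by_node, now,
                    gc_grace_seconds=DEFAULT_GC_GRACE_SECONDS, margin=0.8):
    """Return nodes whose last successful repair is older than
    margin * gc_grace_seconds, plus any node with no recorded repair.

    last_repair_by_node: hypothetical dict of node name -> datetime when the
    last successful repair finished (or None if none is recorded).
    """
    limit = timedelta(seconds=gc_grace_seconds * margin)
    overdue = []
    # Sort by node name so the output order is stable.
    for node, finished in sorted(last_repair_by_node.items()):
        if finished is None or now - finished > limit:
            overdue.append(node)
    return overdue


if __name__ == "__main__":
    now = datetime(2017, 6, 2, 18, 0, tzinfo=timezone.utc)
    repairs = {
        "node1": now - timedelta(days=3),
        "node2": now - timedelta(days=9),
        "node3": None,  # never repaired, or repair keeps failing
    }
    print(overdue_repairs(repairs, now))  # ['node2', 'node3']
```

With a 0.8 margin on the 10-day default you get flagged after 8 days, which leaves a couple of days to re-run a failed repair. Tracking this per node is a simplification: what really matters is that every token range gets repaired within the window.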

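For the "hopefully it's pretty even" and schema checks, here is a similarly hedged Python sketch. It assumes you have already gathered per-node numbers somewhere (for example the Load column of nodetool status, SSTable counts from nodetool tablestats, and schema versions from nodetool describecluster); it does not parse nodetool output itself. The helper names `uneven_nodes` and `schema_agrees`, the 20% tolerance, and the sample values are made up for illustration, not Cassandra settings.

```python
from statistics import mean


def uneven_nodes(values_by_node, tolerance=0.2):
    """Return nodes whose value differs from the cluster mean by more than
    `tolerance` (a fraction: 0.2 means 20%)."""
    if not values_by_node:
        return []
    avg = mean(values_by_node.values())
    if avg == 0:
        return []
    return sorted(node for node, value in values_by_node.items()
                  if abs(value - avg) / avg > tolerance)


def schema_agrees(schema_version_by_node):
    """True if every node reports the same schema version."""
    return len(set(schema_version_by_node.values())) <= 1


if __name__ == "__main__":
    # Hypothetical numbers, e.g. copied by hand from nodetool output.
    load_gb = {"node1": 100, "node2": 105, "node3": 150}
    sstables = {"node1": 40, "node2": 42, "node3": 41}
    schema = {"node1": "v1", "node2": "v1", "node3": "v2"}
    print(uneven_nodes(load_gb))   # ['node3']
    print(uneven_nodes(sstables))  # []
    print(schema_agrees(schema))   # False
```

Comparing each node to the cluster mean is crude, but a node that stands out is usually worth a closer look.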
