cassandra-commits mailing list archives

From "Peter Schuller (Commented) (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-3670) provide "red flags" JMX instrumentation
Date Sat, 24 Dec 2011 04:46:30 GMT


Peter Schuller commented on CASSANDRA-3670:

Also, the whole JMX bit is actually a pretty annoying detail in many situations. There seems
to be no JMX client implementation outside of the JVM, so writing a trivial monitor along the
lines of:

  warnings=$(curl http://localhost:XXX/bla/bla/redflags | egrep -v ': 0$' | wc -l)

becomes a chore. From what I can tell, everyone keeps using that magic .jar, whose origin no one
seems to know, that e.g. cassandra-munin-plugins uses. It's a real hassle to constantly
launch a JVM just for metrics extraction.
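
To make the one-liner above concrete, here is a minimal sketch of what such a check could look
like as a small script. The assumptions are mine: the HTTP endpoint is hypothetical (the URL is
whatever you pass in), the response is assumed to be one "name: count" line per flag, as the
egrep pattern above implies, and the function names and Nagios-style exit codes are chosen
purely for illustration.

```shell
#!/bin/sh
# Sketch only: the red-flags HTTP endpoint and its "name: count" output
# format are assumptions, not an existing Cassandra interface.

# Read "name: count" lines on stdin and print how many have a non-zero count.
count_red_flags() {
    awk -F': ' 'NF == 2 && $2 != 0 { n++ } END { print n + 0 }'
}

# Fetch the (hypothetical) endpoint given as $1 and report a Nagios-style status:
#   0 = OK, 1 = WARNING, 3 = UNKNOWN (endpoint could not be fetched).
check_red_flags() {
    url=$1
    body=$(curl -fsS "$url") || {
        echo "UNKNOWN: could not fetch $url"
        return 3
    }
    n=$(printf '%s\n' "$body" | count_red_flags)
    if [ "$n" -gt 0 ]; then
        echo "WARNING: $n red flag(s) non-zero"
        return 1
    fi
    echo "OK: no red flags raised"
    return 0
}
```

Calling check_red_flags with the endpoint URL prints a one-line status and returns 0, 1, or 3,
which is all a typical monitoring system needs, and no JVM is launched along the way.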

Now granted, if you are fully "JMX enabled" in your infrastructure there is no issue, but
I really think something like this plain-HTTP access goes a long way towards making Cassandra
more operator-friendly, particularly for individuals and/or small organizations that want to
monitor in some simple way and do not want to spend time on JMX issues.

> provide "red flags" JMX instrumentation
> ---------------------------------------
>                 Key: CASSANDRA-3670
>                 URL:
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Peter Schuller
>            Assignee: Peter Schuller
>            Priority: Minor
> As discussed in CASSANDRA-3641, it would be nice to expose through JMX certain information
> which is almost without exception indicative of something being wrong with the node or cluster.
> In the CASSANDRA-3641 case, it was the detection of corrupt counter shards. Other examples:
> * Number of times the selection of files to compact was adjusted due to disk space heuristics
> * Number of times compaction has failed
> * Any I/O error reading from or writing to disk (the work here is collecting, not exposing,
>   so maybe not in an initial version)
> * Any data skipped due to checksum mismatches (when checksumming is being used); e.g.,
>   "number of skips".
> * Any arbitrary exception at least in certain code paths (compaction, scrub, cleanup
>   for starters)
> Probably other things.
> The motivation is that if we have clear and obvious indications that something truly
> is wrong, it seems suboptimal to just leave that information in the log somewhere, for someone
> to discover later when something else broke as a result and a human investigates. You might
> argue that one should use non-trivial log analysis to detect these things, but I highly doubt
> a lot of people do this and it seems very wasteful to require that in comparison to just providing
> the MBean.
> It is important to note that the *lack* of a certain problem being advertised in this
> MBean is not supposed to be indicative of a *lack* of a problem. Rather, the point is that
> to the extent we can easily do so, it is nice to have a clear method of communicating to monitoring
> systems where there *is* a clear indication of something being wrong.
> The main part of this ticket is not to cover everything under the sun, but rather to
> reach agreement on adding an MBean where these types of indicators can be collected. Individual
> counters can then be added over time as one thinks of them.
> I propose:
> * Create an org.apache.cassandra.db.RedFlags MBean
> * Populate with a few things to begin with.
> I'll submit the patch if there is agreement.
