cassandra-commits mailing list archives

From "Paulo Motta (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-10241) Keep a separate production debug log for troubleshooting
Date Mon, 28 Sep 2015 23:14:07 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-10241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14934310#comment-14934310 ]

Paulo Motta commented on CASSANDRA-10241:
-----------------------------------------

Now that we have the basic capability committed, I'd like to follow up on this by introducing
a simple logging guideline for future system logging statements, based on the discussions
in this thread and current practices. This guideline could help external and new contributors
understand the logging practices, and help current contributors review logging-related tickets
that use the new framework.

I've drafted an initial version for review, presented below:

*INFO*: General cluster status, operations overview. At this level a beginner user or operator
should be able to understand most messages. 
Examples:
* Node startup and shutdown information
* User or system triggered operations overview
** Repair start and finish state
** Cleanup start and finish state
** Bootstrap start and finish state
** Index rebuild start and finish state
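A minimal sketch of the INFO pattern for user- or system-triggered operations: one line when the operation starts and one when it finishes, so a beginner operator can follow it in system.log. Cassandra's own code logs through SLF4J; to keep the sketch dependency-free it assumes the JDK's `System.Logger` (Java 9+) instead, whose TRACE/DEBUG/INFO/WARNING/ERROR levels line up with the guideline. The `OperationLog` class and its `run` method are hypothetical.

```java
import java.lang.System.Logger;
import java.lang.System.Logger.Level;
import java.util.function.Supplier;

// Hypothetical helper (not Cassandra code): wraps an operation such as a
// repair, cleanup or bootstrap so its start and finish state go to INFO.
final class OperationLog {
    private static final Logger logger = System.getLogger(OperationLog.class.getName());

    private OperationLog() {}

    static <T> T run(String operation, Supplier<T> body) {
        logger.log(Level.INFO, "{0} started", operation);
        long startNanos = System.nanoTime();
        T result = body.get();
        long elapsedMs = (System.nanoTime() - startNanos) / 1_000_000;
        logger.log(Level.INFO, "{0} finished in {1} ms", operation, elapsedMs);
        return result;
    }
}
```

Only the two boundary messages are emitted at INFO; anything that happens in between belongs at DEBUG or below.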

*DEBUG*: Low frequency state changes or message passing. Non-critical path logs on operation
details, performance measurements or general troubleshooting information. At this level an
advanced operator or system developer will have elements to investigate or detect erroneous
conditions or performance bottlenecks, extract reproduction steps or inspect advanced operational
information.
Examples:
* SSTable flushing
* Compactions in progress
* Gossip or schema state changes
* Intermediate steps of operations
** Repair steps
** Stream session message exchanges
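A minimal sketch of logging intermediate operation steps at DEBUG. It assumes the JDK's `System.Logger` as a dependency-free stand-in for the SLF4J loggers Cassandra actually uses, and `RepairSession` with its methods is hypothetical. Passing the format string and arguments separately, rather than concatenating them, lets the logging backend skip building the message when DEBUG is disabled.

```java
import java.lang.System.Logger;
import java.lang.System.Logger.Level;
import java.util.ArrayList;
import java.util.List;

// Hypothetical repair session (not Cassandra code): each completed step is a
// low-frequency state change, useful in a debug log but noise at INFO.
final class RepairSession {
    private static final Logger logger = System.getLogger(RepairSession.class.getName());

    private final String id;
    private final List<String> completedSteps = new ArrayList<>();

    RepairSession(String id) {
        this.id = id;
    }

    void completeStep(String step) {
        completedSteps.add(step);
        // Format string plus arguments, so no string is built up front.
        logger.log(Level.DEBUG, "[repair #{0}] {1} completed ({2} steps done)",
                id, step, completedSteps.size());
    }

    List<String> completedSteps() {
        // Defensive copy so callers cannot mutate the session state.
        return new ArrayList<>(completedSteps);
    }
}
```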

*WARN*: Use of suboptimal parameters or deprecated options, detection of degraded performance,
capability limitations or missing dependencies. General optimization tips. At this level,
an operator should be able to detect an imminent error condition, use of suboptimal parameters
or non-critical configuration errors. Examples:
* Use of the deprecated chunk_length_kb property instead of chunk_length_in_kb
* Warnings for GC pauses above threshold
* OpenJDK not recommended notice
* Small sstable size warning (Testing done for CASSANDRA-5727 indicates that performance improves
up to 160MB)
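A minimal sketch of the "deprecated option" case at WARN: the old name is still honored, but the operator is told how to fix the configuration. It assumes the JDK's `System.Logger` as a dependency-free stand-in for SLF4J, and `DeprecatedOptions.get` plus the option names in the usage are hypothetical, not real Cassandra properties.

```java
import java.lang.System.Logger;
import java.lang.System.Logger.Level;
import java.util.Map;

// Hypothetical option lookup (not Cassandra code): prefers the current name,
// falls back to a deprecated alias and warns when the alias is used.
final class DeprecatedOptions {
    private static final Logger logger = System.getLogger(DeprecatedOptions.class.getName());

    private DeprecatedOptions() {}

    static String get(Map<String, String> options, String name, String deprecatedName) {
        if (options.containsKey(name)) {
            return options.get(name);
        }
        if (options.containsKey(deprecatedName)) {
            // Non-critical configuration issue: still works, but should be fixed.
            logger.log(Level.WARNING, "Option {0} is deprecated, use {1} instead",
                    deprecatedName, name);
            return options.get(deprecatedName);
        }
        return null; // option not set under either name
    }
}
```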

*ERROR*: An unexpected error condition has occurred. Non-critical, transient or recovered errors
may be reported at DEBUG level instead, so they don't pollute system.log.
Examples:
* Critical errors in general (corrupted disk, read error, etc.)
* Leak detection
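A minimal sketch of the ERROR/DEBUG split: unrecovered failures are logged at ERROR, while transient or recovered ones drop to DEBUG so they stay out of system.log. It assumes the JDK's `System.Logger` as a dependency-free stand-in for SLF4J; `ErrorReporting.report` and its `recovered` flag are hypothetical.

```java
import java.lang.System.Logger;
import java.lang.System.Logger.Level;

// Hypothetical helper (not Cassandra code): picks the level from whether the
// failure was recovered, and returns the level it used.
final class ErrorReporting {
    private static final Logger logger = System.getLogger(ErrorReporting.class.getName());

    private ErrorReporting() {}

    static Level report(String context, Throwable error, boolean recovered) {
        Level level = recovered ? Level.DEBUG : Level.ERROR;
        // The (Level, String, Throwable) overload keeps the stack trace attached.
        logger.log(level, context + (recovered ? " failed (recovered)" : " failed"), error);
        return level;
    }
}
```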

*TRACE*: High frequency state changes or message passing, critical path logs, testing or
development information. This level is disabled by default, so everything that does not fit
in the previous levels, as well as highly verbose output, must be kept at TRACE level.
Examples:
* Failure detector checks
* Gossip digests
* CassandraServer.insert()
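A minimal sketch of TRACE logging on a hot path: because TRACE is disabled by default, the expensive message is only built behind an explicit level check. It assumes the JDK's `System.Logger` (`isLoggable`) as a dependency-free stand-in for SLF4J's `isTraceEnabled()`; `DigestTrace.handleDigests` and its map of endpoint versions are hypothetical.

```java
import java.lang.System.Logger;
import java.lang.System.Logger.Level;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical hot-path handler (not Cassandra code): the per-call summary is
// only assembled when TRACE is actually enabled.
final class DigestTrace {
    private static final Logger logger = System.getLogger(DigestTrace.class.getName());

    private DigestTrace() {}

    static int handleDigests(Map<String, Integer> versionsByEndpoint) {
        if (logger.isLoggable(Level.TRACE)) {
            String summary = versionsByEndpoint.entrySet().stream()
                    .map(e -> e.getKey() + "=" + e.getValue())
                    .collect(Collectors.joining(", "));
            logger.log(Level.TRACE, "Received digests: " + summary);
        }
        // Real processing would go here; the sketch only counts the digests.
        return versionsByEndpoint.size();
    }
}
```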

What do you think [~aweisberg]? After review and suggestions, if there are no objections,
I will add this to the wiki and send an e-mail to the dev list.

After this, the next step would be to groom the current logs in a separate ticket so they
follow the guideline.

> Keep a separate production debug log for troubleshooting
> --------------------------------------------------------
>
>                 Key: CASSANDRA-10241
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10241
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Config
>            Reporter: Jonathan Ellis
>            Assignee: Paulo Motta
>             Fix For: 2.2.x, 3.0.0 rc2
>
>         Attachments: 2.2-debug.log, 2.2-system.log, 3.0-debug.log, 3.0-system.log
>
>
> [~aweisberg] had the suggestion to keep a separate debug log for aid in troubleshooting,
> not intended for regular human consumption but where we can log things that might help if
> something goes wrong.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
