Return-Path:
Delivered-To: apmail-cassandra-commits-archive@www.apache.org
Received: (qmail 28705 invoked from network); 18 Apr 2011 17:18:44 -0000
Received: from hermes.apache.org (HELO mail.apache.org) (140.211.11.3)
    by minotaur.apache.org with SMTP; 18 Apr 2011 17:18:44 -0000
Received: (qmail 54027 invoked by uid 500); 18 Apr 2011 17:18:44 -0000
Delivered-To: apmail-cassandra-commits-archive@cassandra.apache.org
Received: (qmail 54000 invoked by uid 500); 18 Apr 2011 17:18:44 -0000
Mailing-List: contact commits-help@cassandra.apache.org; run by ezmlm
Precedence: bulk
List-Help:
List-Unsubscribe:
List-Post:
List-Id:
Reply-To: dev@cassandra.apache.org
Delivered-To: mailing list commits@cassandra.apache.org
Received: (qmail 53992 invoked by uid 99); 18 Apr 2011 17:18:44 -0000
Received: from athena.apache.org (HELO athena.apache.org) (140.211.11.136)
    by apache.org (qpsmtpd/0.29) with ESMTP; Mon, 18 Apr 2011 17:18:44 +0000
X-ASF-Spam-Status: No, hits=-2000.0 required=5.0 tests=ALL_TRUSTED,T_RP_MATCHES_RCVD
X-Spam-Check-By: apache.org
Received: from [140.211.11.116] (HELO hel.zones.apache.org) (140.211.11.116)
    by apache.org (qpsmtpd/0.29) with ESMTP; Mon, 18 Apr 2011 17:18:43 +0000
Received: from hel.zones.apache.org (hel.zones.apache.org [140.211.11.116])
    by hel.zones.apache.org (Postfix) with ESMTP id 7D32DA75CA
    for ; Mon, 18 Apr 2011 17:18:06 +0000 (UTC)
Date: Mon, 18 Apr 2011 17:18:06 +0000 (UTC)
From: "Peter Schuller (JIRA)"
To: commits@cassandra.apache.org
Message-ID: <1500178936.64875.1303147086509.JavaMail.tomcat@hel.zones.apache.org>
In-Reply-To: <536685644.21759.1301506985705.JavaMail.tomcat@hel.zones.apache.org>
Subject: [jira] [Commented] (CASSANDRA-2405) should expose 'time since last successful repair' for easier aes monitoring
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394

[
https://issues.apache.org/jira/browse/CASSANDRA-2405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13021122#comment-13021122 ]

Peter Schuller commented on CASSANDRA-2405:
-------------------------------------------

A further complication:

Since the intent here is to enable people to set up alarms that trigger whenever the time-since-last is not within an acceptable range, it raises the issue of whether to keep this information persistent in system tables or just in-memory. Keeping in mind that:

(1) For large amounts of data, the cost of doing another round of AES "just in case" because a node was restarted is significant.

(2) If the alarm were to be triggered on the information not being available, that would instantly lead to false positive alarms when nodes are restarted, instantly rendering alarms useless to operations.

(3) If the alarm were to ignore the case where the information is not yet available, that is a very dangerous silent failure and effectively means the alarm is not functioning properly.

... I get the feeling one wants this information persistent.

I guess this all makes the ticket non-trivial, but I think an "easy" way for operators to ensure sufficient AES frequency is important.

(I'm actually kind of surprised issues with this do not crop up more often on the mailing lists... am I missing something that mitigates the impact here, or are people just using sufficiently long grace periods relative to repair frequency that they're not hitting these things in practice?)

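To make the three cases above concrete, here is a minimal sketch in Python of how a monitoring check might behave if the last-successful-repair time were persisted. Everything in it is an assumption for illustration: `RepairLog`, `RepairStatus`, `check`, and the JSON file standing in for a system table are hypothetical names, not part of Cassandra or of the attached patch.

```python
import json
import os
import tempfile
import time
from enum import Enum


class RepairStatus(Enum):
    OK = "ok"            # last successful repair is recent enough
    STALE = "stale"      # last successful repair is too old: alarm
    UNKNOWN = "unknown"  # no record at all: surfaced explicitly, never ignored


class RepairLog:
    """Hypothetical persistent record of the last successful repair per
    column family. A JSON file stands in for a system table here."""

    def __init__(self, path):
        self.path = path

    def _load(self):
        if not os.path.exists(self.path):
            return {}
        with open(self.path) as f:
            return json.load(f)

    def record_success(self, column_family, now=None):
        data = self._load()
        data[column_family] = time.time() if now is None else now
        # Write to a temporary file and rename it into place, so a crash
        # mid-write does not leave a truncated record behind.
        directory = os.path.dirname(os.path.abspath(self.path))
        fd, tmp = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "w") as f:
            json.dump(data, f)
        os.replace(tmp, self.path)

    def seconds_since_last(self, column_family, now=None):
        last = self._load().get(column_family)
        if last is None:
            return None
        return (time.time() if now is None else now) - last


def check(log, column_family, max_age_seconds, now=None):
    """Classify a column family as OK, STALE, or UNKNOWN."""
    age = log.seconds_since_last(column_family, now)
    if age is None:
        return RepairStatus.UNKNOWN
    return RepairStatus.OK if age <= max_age_seconds else RepairStatus.STALE
```

Because the record is read back from disk, a fresh `RepairLog` pointing at the same path (simulating a restarted node) still reports the earlier repair time, so a restart alone produces neither a false alarm (case 2) nor a silently skipped check (case 3). `UNKNOWN` is then left for column families that have never been repaired, which an operator would presumably want to alarm on.
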
> should expose 'time since last successful repair' for easier aes monitoring
> ---------------------------------------------------------------------------
>
>                 Key: CASSANDRA-2405
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2405
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Peter Schuller
>            Assignee: Pavel Yaskevich
>            Priority: Minor
>             Fix For: 0.7.5
>
>         Attachments: CASSANDRA-2405.patch
>
>
> The practical implementation issues of actually ensuring repair runs is somewhat of an undocumented/untreated issue.
> One hopefully low hanging fruit would be to at least expose the time since last successful repair for a particular column family, to make it easier to write a correct script to monitor for lack of repair in a non-buggy fashion.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira