Return-Path: X-Original-To: apmail-cassandra-commits-archive@www.apache.org Delivered-To: apmail-cassandra-commits-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 75D4DFFA9 for ; Tue, 16 Apr 2013 22:13:17 +0000 (UTC) Received: (qmail 66548 invoked by uid 500); 16 Apr 2013 22:13:17 -0000 Delivered-To: apmail-cassandra-commits-archive@cassandra.apache.org Received: (qmail 66518 invoked by uid 500); 16 Apr 2013 22:13:17 -0000 Mailing-List: contact commits-help@cassandra.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@cassandra.apache.org Delivered-To: mailing list commits@cassandra.apache.org Received: (qmail 66510 invoked by uid 99); 16 Apr 2013 22:13:17 -0000 Received: from arcas.apache.org (HELO arcas.apache.org) (140.211.11.28) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 16 Apr 2013 22:13:17 +0000 Date: Tue, 16 Apr 2013 22:13:17 +0000 (UTC) From: "Arya Goudarzi (JIRA)" To: commits@cassandra.apache.org Message-ID: In-Reply-To: References: Subject: [jira] [Updated] (CASSANDRA-5432) Repair Freeze/Gossip Invisibility Issues 1.2.4 MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 [ https://issues.apache.org/jira/browse/CASSANDRA-5432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arya Goudarzi updated CASSANDRA-5432: ------------------------------------- Description: Read comment 6. This description summarizes the repair issue only, but I believe there is a bigger problem going on with networking as described on that comment. Since I have upgraded our sandbox cluster, I am unable to run repair on any node and I am reaching our gc_grace seconds this weekend. Please help. So far, I have tried the following suggestions: - nodetool scrub - offline scrub - running repair on each CF separately. Didn't matter. All got stuck the same way. The repair command just gets stuck and the machine is idling. Only the following logs are printed for repair job: INFO [Thread-42214] 2013-04-05 23:30:27,785 StorageService.java (line 2379) Starting repair command #4, repairing 1 ranges for keyspace cardspring_production INFO [AntiEntropySessions:7] 2013-04-05 23:30:27,789 AntiEntropyService.java (line 652) [repair #cc5a9aa0-9e48-11e2-98ba-11bde7670242] new session: will sync /X.X.X.190, /X.X.X.43, /X.X.X.56 on range (1808575600,42535295865117307932921825930779602032] for keyspace_production.[comma separated list of CFs] INFO [AntiEntropySessions:7] 2013-04-05 23:30:27,790 AntiEntropyService.java (line 858) [repair #cc5a9aa0-9e48-11e2-98ba-11bde7670242] requesting merkle trees for BusinessConnectionIndicesEntries (to [/X.X.X.43, /X.X.X.56, /X.X.X.190]) INFO [AntiEntropyStage:1] 2013-04-05 23:30:28,086 AntiEntropyService.java (line 214) [repair #cc5a9aa0-9e48-11e2-98ba-11bde7670242] Received merkle tree for ColumnFamilyName from /X.X.X.43 INFO [AntiEntropyStage:1] 2013-04-05 23:30:28,147 AntiEntropyService.java (line 214) [repair #cc5a9aa0-9e48-11e2-98ba-11bde7670242] Received merkle tree for ColumnFamilyName from /X.X.X.56 Please advise. was: Since I have upgraded our sandbox cluster, I am unable to run repair on any node and I am reaching our gc_grace seconds this weekend. Please help. So far, I have tried the following suggestions: - nodetool scrub - offline scrub - running repair on each CF separately. Didn't matter. All got stuck the same way. The repair command just gets stuck and the machine is idling. Only the following logs are printed for repair job: INFO [Thread-42214] 2013-04-05 23:30:27,785 StorageService.java (line 2379) Starting repair command #4, repairing 1 ranges for keyspace cardspring_production INFO [AntiEntropySessions:7] 2013-04-05 23:30:27,789 AntiEntropyService.java (line 652) [repair #cc5a9aa0-9e48-11e2-98ba-11bde7670242] new session: will sync /X.X.X.190, /X.X.X.43, /X.X.X.56 on range (1808575600,42535295865117307932921825930779602032] for keyspace_production.[comma separated list of CFs] INFO [AntiEntropySessions:7] 2013-04-05 23:30:27,790 AntiEntropyService.java (line 858) [repair #cc5a9aa0-9e48-11e2-98ba-11bde7670242] requesting merkle trees for BusinessConnectionIndicesEntries (to [/X.X.X.43, /X.X.X.56, /X.X.X.190]) INFO [AntiEntropyStage:1] 2013-04-05 23:30:28,086 AntiEntropyService.java (line 214) [repair #cc5a9aa0-9e48-11e2-98ba-11bde7670242] Received merkle tree for ColumnFamilyName from /X.X.X.43 INFO [AntiEntropyStage:1] 2013-04-05 23:30:28,147 AntiEntropyService.java (line 214) [repair #cc5a9aa0-9e48-11e2-98ba-11bde7670242] Received merkle tree for ColumnFamilyName from /X.X.X.56 Please advise. > Repair Freeze/Gossip Invisibility Issues 1.2.4 > ---------------------------------------------- > > Key: CASSANDRA-5432 > URL: https://issues.apache.org/jira/browse/CASSANDRA-5432 > Project: Cassandra > Issue Type: Bug > Components: Core > Affects Versions: 1.2.4 > Environment: Ubuntu 10.04.1 LTS > C* 1.2.3 > Sun Java 6 u43 > JNA Enabled > Not using VNodes > Reporter: Arya Goudarzi > Priority: Critical > > Read comment 6. This description summarizes the repair issue only, but I believe there is a bigger problem going on with networking as described on that comment. > Since I have upgraded our sandbox cluster, I am unable to run repair on any node and I am reaching our gc_grace seconds this weekend. Please help. So far, I have tried the following suggestions: > - nodetool scrub > - offline scrub > - running repair on each CF separately. Didn't matter. All got stuck the same way. > The repair command just gets stuck and the machine is idling. Only the following logs are printed for repair job: > INFO [Thread-42214] 2013-04-05 23:30:27,785 StorageService.java (line 2379) Starting repair command #4, repairing 1 ranges for keyspace cardspring_production > INFO [AntiEntropySessions:7] 2013-04-05 23:30:27,789 AntiEntropyService.java (line 652) [repair #cc5a9aa0-9e48-11e2-98ba-11bde7670242] new session: will sync /X.X.X.190, /X.X.X.43, /X.X.X.56 on range (1808575600,42535295865117307932921825930779602032] for keyspace_production.[comma separated list of CFs] > INFO [AntiEntropySessions:7] 2013-04-05 23:30:27,790 AntiEntropyService.java (line 858) [repair #cc5a9aa0-9e48-11e2-98ba-11bde7670242] requesting merkle trees for BusinessConnectionIndicesEntries (to [/X.X.X.43, /X.X.X.56, /X.X.X.190]) > INFO [AntiEntropyStage:1] 2013-04-05 23:30:28,086 AntiEntropyService.java (line 214) [repair #cc5a9aa0-9e48-11e2-98ba-11bde7670242] Received merkle tree for ColumnFamilyName from /X.X.X.43 > INFO [AntiEntropyStage:1] 2013-04-05 23:30:28,147 AntiEntropyService.java (line 214) [repair #cc5a9aa0-9e48-11e2-98ba-11bde7670242] Received merkle tree for ColumnFamilyName from /X.X.X.56 > Please advise. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira