Date: Sat, 4 Jun 2016 22:01:59 +0000 (UTC)
From: "Mahdi Mohammadi (JIRA)"
To: commits@cassandra.apache.org
Reply-To: dev@cassandra.apache.org
Subject: [jira] [Comment Edited] (CASSANDRA-11933) Improve Repair performance
archived-at: Sat, 04 Jun 2016 22:02:01 -0000

    [ https://issues.apache.org/jira/browse/CASSANDRA-11933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15314971#comment-15314971 ]

Mahdi Mohammadi edited comment on CASSANDRA-11933 at 6/4/16 10:01 PM:
----------------------------------------------------------------------

||2.1||2.2||3.0||3.6||
|[branch|https://github.com/apache/cassandra/compare/cassandra-2.1...mm-binary:11933-2.1?expand=1]|[branch|https://github.com/apache/cassandra/compare/cassandra-2.2...mm-binary:11933-2.2?expand=0]|[branch|https://github.com/apache/cassandra/compare/cassandra-3.0...mm-binary:11933-3.0?expand=1]|[branch|https://github.com/apache/cassandra/compare/cassandra-3.6...mm-binary:11933-3.6?diff=unified&expand=1&name=11933-3.6]|
|testall|testall|testall|testall|
|dtest|dtest|dtest|dtest|

Will continue to add the remaining branches if they can't be auto-merged.


was (Author: mahdix):
||2.1||2.2||3.0||3.6||
|[branch|https://github.com/apache/cassandra/compare/cassandra-2.1...mm-binary:11933-2.1?expand=1]|[branch|https://github.com/apache/cassandra/compare/cassandra-2.2...mm-binary:11933-2.2?expand=0]|[branch|https://github.com/apache/cassandra/compare/cassandra-3.0...mm-binary:11933-3.0?expand=1]|[branch|https://github.com/apache/cassandra/compare/cassandra-3.6...mm-binary:11933-3.6?diff=unified&expand=1&name=11933-3.6]|
|testall|testall|testall|
|dtest|dtest|dtest|

Will continue to add the remaining branches if they can't be auto-merged.

> Improve Repair performance
> --------------------------
>
>                 Key: CASSANDRA-11933
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11933
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Cyril Scetbon
>            Assignee: Mahdi Mohammadi
>
> During a full repair on a ~60-node cluster, I've been able to see that this stage can be significant (up to 60 percent of the whole time):
> https://github.com/apache/cassandra/blob/cassandra-2.1/src/java/org/apache/cassandra/service/StorageService.java#L2983-L2997
> It's mainly caused by the fact that https://github.com/apache/cassandra/blob/cassandra-2.1/src/java/org/apache/cassandra/service/ActiveRepairService.java#L189 calls {code}ss.getLocalRanges(keyspaceName){code} every time, and that call takes more than 99% of the time.
> This call takes 600 ms when there is no load on the cluster, and more when there is. So for 10k ranges, it takes at least 1.5 hours (10,000 x 0.6 s = 6,000 s, or 100 minutes) just to compute ranges.
> Underneath, it calls [ReplicationStrategy.getAddressRanges|https://github.com/apache/cassandra/blob/3dcbe90e02440e6ee534f643c7603d50ca08482b/src/java/org/apache/cassandra/locator/AbstractReplicationStrategy.java#L170], which can get pretty inefficient ([~jbellis]'s [words|https://github.com/apache/cassandra/blob/3dcbe90e02440e6ee534f643c7603d50ca08482b/src/java/org/apache/cassandra/locator/AbstractReplicationStrategy.java#L165]).
> *ss.getLocalRanges(keyspaceName)* should be cached to avoid spending hours on it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
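
The caching idea from the issue description can be illustrated with a minimal sketch. It is not the patch on the linked branches and does not use Cassandra's classes: `LocalRangeCache`, its `loader` function, and `invalidateAll` are hypothetical names chosen for illustration, and the loader merely stands in for an expensive call such as `ss.getLocalRanges(keyspaceName)`. The sketch assumes the cached value stays valid until the ring topology changes, at which point the caller would have to invalidate it.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/**
 * Hypothetical helper (not part of Cassandra): memoizes an expensive
 * per-keyspace range computation so it runs once per keyspace instead
 * of once per repaired range.
 */
final class LocalRangeCache<R> {
    // Keyspace name -> previously computed local ranges.
    private final Map<String, List<R>> cache = new ConcurrentHashMap<>();
    // Stand-in for the expensive lookup (assumption: a pure function of the keyspace name).
    private final Function<String, List<R>> loader;

    LocalRangeCache(Function<String, List<R>> loader) {
        this.loader = loader;
    }

    /** Returns the cached ranges, invoking the loader only on the first request per keyspace. */
    List<R> get(String keyspaceName) {
        return cache.computeIfAbsent(keyspaceName, loader);
    }

    /** Drops all cached entries; a caller would do this when the ring topology changes. */
    void invalidateAll() {
        cache.clear();
    }

    public static void main(String[] args) {
        AtomicInteger calls = new AtomicInteger();
        LocalRangeCache<String> ranges = new LocalRangeCache<>(ks -> {
            calls.incrementAndGet();           // count how often the expensive path runs
            return List.of(ks + ":(0,100]");   // placeholder range, illustrative only
        });

        // Simulate 10,000 range-level lookups against the same keyspace.
        for (int i = 0; i < 10_000; i++) {
            ranges.get("ks1");
        }
        System.out.println("loader calls: " + calls.get());
    }
}
```

With the 600 ms figure quoted in the description, paying the lookup cost once per keyspace rather than once per range is where the saving would come from. How and when to invalidate safely is the part this sketch leaves open.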