Date: Tue, 7 Oct 2014 21:13:35 +0000 (UTC)
From: "Robert Coli (JIRA)"
To: commits@cassandra.apache.org
Reply-To: dev@cassandra.apache.org
Mailing-List: contact commits-help@cassandra.apache.org; run by ezmlm
Subject: [jira] [Commented] (CASSANDRA-7317) Repair range validation and calculation is off

    [ https://issues.apache.org/jira/browse/CASSANDRA-7317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14162539#comment-14162539 ]

Robert Coli commented on CASSANDRA-7317:
----------------------------------------

{quote}
If -pr is meant to be used that way, I don't think it is communicated very well.
{quote}

As a meta-aside, the way that I explain to new users what -pr is for is by saying: "If you are repairing ALL nodes in your cluster, use -pr. If you are not repairing ALL nodes, do not use -pr."

I agree that the docs could make this idea clearer, of course with the new -local wrinkle reflected.

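To illustrate why -pr only covers the whole ring when every node is repaired, here is a rough sketch that computes primary ranges for the four-token ring in the issue quoted below. It assumes one token per node and that a node's primary range runs from the previous token on the ring (exclusive) to its own token (inclusive), regardless of datacenter. That matches the ranges in the repair output from the issue, but it is a simplified model and not Cassandra's actual code. The names RING and primary_ranges are made up for the example.

```python
# Tokens and owners copied from the `nodetool ring` output in the issue.
# Assumption: one token per node, ring shared across both datacenters.
RING = {
    -9223372036854775808: "127.0.0.1",  # dc1
    -9223372036854775798: "127.0.0.4",  # dc2
    -10: "127.0.0.2",                   # dc1
    0: "127.0.0.3",                     # dc2
}


def primary_ranges(ring):
    """Return {node: (start, end)} where the range is (start, end]."""
    tokens = sorted(ring)
    ranges = {}
    for i, token in enumerate(tokens):
        # For i == 0, index -1 wraps around to the largest token on the ring.
        previous = tokens[i - 1]
        ranges[ring[token]] = (previous, token)
    return ranges


ranges = primary_ranges(RING)
for node in sorted(ranges):
    start, end = ranges[node]
    print(f"{node}: ({start},{end}]")
```

Under these assumptions the two dc1 nodes get (0,-9223372036854775808] and (-9223372036854775798,-10], the same ranges the two repair sessions report. The remaining slices, (-9223372036854775808,-9223372036854775798] and (-10,0], are the primary ranges of the dc2 nodes, so they are only repaired if -pr is also run there.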
> Repair range validation and calculation is off
> ----------------------------------------------
>
>                 Key: CASSANDRA-7317
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7317
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Nick Bailey
>            Assignee: Yuki Morishita
>             Fix For: 2.0.9
>
>         Attachments: 7317-2.0.txt, Untitled Diagram(1).png
>
>
> From what I can tell, the calculation (using the -pr option) and validation of tokens for repairing ranges are broken, or at least should be improved. Using an example with ccm:
> Nodetool ring:
> {noformat}
> Datacenter: dc1
> ==========
> Address    Rack  Status  State   Load       Owns    Token
>                                                     -10
> 127.0.0.1  r1    Up      Normal  188.96 KB  50.00%  -9223372036854775808
> 127.0.0.2  r1    Up      Normal  194.77 KB  50.00%  -10
> Datacenter: dc2
> ==========
> Address    Rack  Status  State   Load       Owns    Token
>                                                     0
> 127.0.0.4  r1    Up      Normal  160.58 KB  0.00%   -9223372036854775798
> 127.0.0.3  r1    Up      Normal  139.46 KB  0.00%   0
> {noformat}
> Schema:
> {noformat}
> CREATE KEYSPACE system_traces WITH replication = {
>   'class': 'NetworkTopologyStrategy',
>   'dc2': '2',
>   'dc1': '2'
> };
> {noformat}
> Repair -pr:
> {noformat}
> [Nicks-MacBook-Pro:21:35:58 cassandra-2.0] cassandra$ bin/nodetool -p 7100 repair -pr system_traces
> [2014-05-28 21:36:01,977] Starting repair command #12, repairing 1 ranges for keyspace system_traces
> [2014-05-28 21:36:02,207] Repair session f984d290-e6d9-11e3-9edc-5f8011daec21 for range (0,-9223372036854775808] finished
> [2014-05-28 21:36:02,207] Repair command #12 finished
> [Nicks-MacBook-Pro:21:36:02 cassandra-2.0] cassandra$ bin/nodetool -p 7200 repair -pr system_traces
> [2014-05-28 21:36:14,086] Starting repair command #1, repairing 1 ranges for keyspace system_traces
> [2014-05-28 21:36:14,406] Repair session 00bd45b0-e6da-11e3-98fc-5f8011daec21 for range (-9223372036854775798,-10] finished
> [2014-05-28 21:36:14,406] Repair command #1 finished
> {noformat}
> Note that repairing both nodes in dc1 leaves very small ranges unrepaired.
> For example, (-10,0]. Repairing the 'primary range' in dc2 will repair those small ranges. Maybe that is the behavior we want, but it seems counterintuitive.
> The behavior when manually trying to repair the full range of 127.0.0.1 definitely needs improvement though.
> Repair command:
> {noformat}
> [Nicks-MacBook-Pro:21:50:44 cassandra-2.0] cassandra$ bin/nodetool -p 7100 repair -st -10 -et -9223372036854775808 system_traces
> [2014-05-28 21:50:55,803] Starting repair command #17, repairing 1 ranges for keyspace system_traces
> [2014-05-28 21:50:55,804] Starting repair command #17, repairing 1 ranges for keyspace system_traces
> [2014-05-28 21:50:55,804] Repair command #17 finished
> [Nicks-MacBook-Pro:21:50:56 cassandra-2.0] cassandra$ echo $?
> 1
> {noformat}
> system.log:
> {noformat}
> ERROR [Thread-96] 2014-05-28 21:40:05,921 StorageService.java (line 2621) Repair session failed:
> java.lang.IllegalArgumentException: Requested range intersects a local range but is not fully contained in one; this would lead to imprecise repair
> {noformat}
> * The actual output of the repair command doesn't really indicate that there was an issue, although the command does return a non-zero exit status.
> * The error here is invisible if you are using the synchronous JMX repair API. It will appear as though the repair completed successfully.
> * Personally, I believe that should be a valid repair command. For the system_traces keyspace, 127.0.0.1 is responsible for this range (and I would argue it is the 'primary range' of the node).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)