From: "Li, Guangxing"
Date: Thu, 29 Sep 2016 11:19:30 -0600
Subject: Re: Nodetool repair
To: user@cassandra.apache.org, Romain Hardouin

Romain,

I tried what you suggested, namely:

a. nodetool stop VALIDATION
b. echo run -b org.apache.cassandra.db:type=StorageService forceTerminateAllRepairSessions | java -jar /tmp/jmxterm/jmxterm-1.0-alpha-4-uber.jar -l 127.0.0.1:7199

to stop a repair that seemed to run forever, but I am seeing really odd behavior with C* 2.0.9. Here is what I did:

1. First, I ran 'nodetool tpstats' on all nodes in the cluster and saw that only one node had 1 active/pending AntiEntropySessions task. None of the other nodes had any pending or active AntiEntropySessions.
2. Then I grepped for 'Repair' in all logs on all nodes and saw absolutely no repair-related activity in these logs for the past day.
3. Then, on the node that had the active AntiEntropySessions, I ran steps 'a' and 'b' above.

Now, all of a sudden, I am seeing repair activity. On the nodes that did not have pending AntiEntropySessions, I see the following in the logs:

INFO [NonPeriodicTasks:1] 2016-09-29 17:12:53,469 StreamingRepairTask.java (line 87) [repair #e80e17d0-8667-11e6-a801-e172d7a67134] streaming task succeed, returning response to /10.253.2.166

On node 10.253.2.166, which has the active/pending AntiEntropySessions, I see the following in the log:

INFO [AntiEntropySessions:136] 2016-09-29 17:03:02,405 RepairSession.java (line 282) [repair #812dafe0-8666-11e6-a801-e172d7a67134] session completed successfully

So it seems to me that calling forceTerminateAllRepairSessions actually 'wakes up' the dormant repair so that it runs again. So far, the only way I have found to stop a repair is to restart the C* node where the repair command was initiated.

Thanks.

George.

On Fri, Sep 23, 2016 at 6:20 AM, Romain Hardouin <romainh_ml@yahoo.fr> wrote:

> OK. If you still have issues after setting streaming_socket_timeout_in_ms
> != 0, consider increasing request_timeout_in_ms to a high value, say 1 or 2
> minutes. See comments in
> https://issues.apache.org/jira/browse/CASSANDRA-7904
> Regarding 2.1, be sure to test incremental repair on your data before
> running it in production ;-)
>
> Romain
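
The two checks described in the message (looking for AntiEntropySessions in 'nodetool tpstats' and grepping the logs for repair activity) could be scripted roughly as below. This is only a minimal sketch and not something from the thread: the function names repair_sessions and repair_ids are made up for illustration, the tpstats layout is assumed to be "pool name, Active, Pending, ..." in the first three columns, and the log lines are assumed to carry the "[repair #<id>]" tag shown in the excerpts above. Both functions read from stdin, so nothing is assumed about host names or log paths.

```shell
#!/bin/sh

# Hypothetical helper: reads `nodetool tpstats` output on stdin and prints
# "<active> <pending>" for the AntiEntropySessions pool.
# Assumes the 2nd and 3rd columns are Active and Pending.
# Prints "0 0" when the pool does not appear in the output at all.
repair_sessions() {
    awk '$1 == "AntiEntropySessions" { print $2, $3; found = 1 }
         END { if (!found) print "0 0" }'
}

# Hypothetical helper: reads Cassandra log lines on stdin and prints the
# distinct repair session ids found in "[repair #<id>]" tags, one per line.
# Lines without such a tag are ignored.
repair_ids() {
    sed -n 's/.*\[repair #\([0-9a-f-]*\)].*/\1/p' | sort -u
}
```

On a node, one might pipe 'nodetool tpstats' into repair_sessions, and pipe the output of the 'Repair' grep into repair_ids. Comparing the ids across nodes would show whether the activity that appears after forceTerminateAllRepairSessions belongs to the old session or to a different one; in the two log excerpts above, the ids differ.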