cassandra-user mailing list archives

From "Li, Guangxing" <guangxing...@pearson.com>
Subject Re: Nodetool repair
Date Thu, 29 Sep 2016 17:19:30 GMT
Romain,

I tried what you suggested:

a. nodetool stop VALIDATION
b. echo run -b org.apache.cassandra.db:type=StorageService forceTerminateAllRepairSessions | java -jar /tmp/jmxterm/jmxterm-1.0-alpha-4-uber.jar -l 127.0.0.1:7199

to stop a repair that seemed to run forever, but I am seeing really odd
behavior with C* 2.0.9. Here is what I did:
1. First, I ran 'nodetool tpstats' on all nodes in the cluster and saw that
only one node had 1 active/pending AntiEntropySessions. None of the other
nodes had any pending or active AntiEntropySessions.
2. Then I grepped for 'Repair' in all logs on all nodes and saw absolutely no
repair-related activity in these logs for the past day.
3. Then, on the node that had the active AntiEntropySessions, I did steps 'a'
and 'b' above. All of a sudden I started seeing repair activity. On the
nodes that did not have pending AntiEntropySessions, I see the following
in their logs:
INFO [NonPeriodicTasks:1] 2016-09-29 17:12:53,469 StreamingRepairTask.java
(line 87) [repair #e80e17d0-8667-11e6-a801-e172d7a67134] streaming task
succeed, returning response to /10.253.2.166
On node 10.253.2.166, which has the active/pending AntiEntropySessions, I
see the following in the log:
INFO [AntiEntropySessions:136] 2016-09-29 17:03:02,405 RepairSession.java
(line 282) [repair #812dafe0-8666-11e6-a801-e172d7a67134] session completed
successfully

So it seems to me that calling forceTerminateAllRepairSessions actually
'wakes up' the dormant repair so that it runs again. So far, the only way I
have found to stop a repair is to restart the C* node where the repair
command was initiated.
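
The check in step 1 can be scripted. The following is only a rough sketch: it assumes that 'nodetool tpstats' prints the pool name in the first column, followed by the Active and Pending counts, and that a node with no AntiEntropySessions row has no repair sessions. The function names and the host list are hypothetical, not part of any Cassandra tooling.

```shell
#!/bin/sh
# Print "active pending" for the AntiEntropySessions pool, reading
# `nodetool tpstats` output on stdin. Assumes column 1 is the pool name
# and columns 2 and 3 are the Active and Pending counts.
# Prints "0 0" when the pool is not listed at all.
antientropy_counts() {
    awk '$1 == "AntiEntropySessions" { print $2, $3; found = 1 }
         END { if (!found) print "0 0" }'
}

# Report every node that has active or pending repair sessions.
# Arguments: host names or addresses (hypothetical; supply your own).
check_cluster() {
    for host in "$@"; do
        counts=$(nodetool -h "$host" tpstats | antientropy_counts)
        if [ "$counts" != "0 0" ]; then
            echo "$host: active/pending = $counts"
        fi
    done
}
```

Calling check_cluster with the addresses of all nodes, before and after steps 'a' and 'b', would show whether the session counts change on any node.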

Thanks.

George.

On Fri, Sep 23, 2016 at 6:20 AM, Romain Hardouin <romainh_ml@yahoo.fr>
wrote:

> OK. If you still have issues after setting streaming_socket_timeout_in_ms
> != 0, consider increasing request_timeout_in_ms to a high value, say 1 or 2
> minutes. See the comments in
> https://issues.apache.org/jira/browse/CASSANDRA-7904
> Regarding 2.1, be sure to test incremental repair on your data before
> running it in production ;-)
>
> Romain
>
