cassandra-commits mailing list archives

From "Paulo Motta (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-11845) Hanging repair in cassandra 2.2.4
Date Thu, 19 May 2016 21:12:12 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-11845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15292145#comment-15292145
] 

Paulo Motta commented on CASSANDRA-11845:
-----------------------------------------

Unfortunately it's not possible to track down the cause from the logs you posted. You'll
need to [enable DEBUG logging|https://docs.datastax.com/en/cassandra/2.1/cassandra/configuration/configLoggingLevels_r.html]
on the {{org.apache.cassandra.streaming}} and {{org.apache.cassandra.repair}} packages and
attach the full debug.log to this ticket (please use JIRA's file attachment feature
instead of pasting logs into the comments).
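
As a rough sketch of the step above: on versions that ship {{nodetool setlogginglevel}}, the level can be raised at runtime for each package. The helper below only builds the two command lines so they can be reviewed before being run on each involved node. The function name is hypothetical, and it assumes {{nodetool}} is on the PATH of the node where the commands are executed.

```python
# Packages named in the comment above.
PACKAGES = ("org.apache.cassandra.streaming", "org.apache.cassandra.repair")


def debug_logging_commands(packages=PACKAGES, level="DEBUG"):
    """Return one `nodetool setlogginglevel` argument list per package.

    Hypothetical helper: it only builds the commands, it does not run them.
    """
    return [["nodetool", "setlogginglevel", pkg, level] for pkg in packages]


if __name__ == "__main__":
    # Print the commands so they can be copied to each node.
    for argv in debug_logging_commands():
        print(" ".join(argv))
```

Passing {{level="INFO"}} builds the commands to lower the level again once the logs have been collected.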

Please note that to cancel a hung repair you'll probably need to restart the involved nodes
before starting a new repair (the ability to stop a repair will be provided by CASSANDRA-3486).

> Hanging repair in cassandra 2.2.4
> ---------------------------------
>
>                 Key: CASSANDRA-11845
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11845
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Streaming and Messaging
>         Environment: Centos 6
>            Reporter: vin01
>            Priority: Minor
>
> So after increasing the streaming_timeout_in_ms value to 3 hours, I was able to avoid
the socketTimeout errors I was getting earlier (https://issues.apache.org/jira/browse/CASSANDRA-11826),
but now the issue is that the repair just stays stuck.
> Current status:
> [2016-05-19 05:52:50,835] Repair session a0e590e1-1d99-11e6-9d63-b717b380ffdd for range
(-3309358208555432808,-3279958773585646585] finished (progress: 54%)
> [2016-05-19 05:53:09,446] Repair session a0e590e3-1d99-11e6-9d63-b717b380ffdd for range
(8149151263857514385,8181801084802729407] finished (progress: 55%)
> [2016-05-19 05:53:13,808] Repair session a0e5b7f1-1d99-11e6-9d63-b717b380ffdd for range
(3372779397996730299,3381236471688156773] finished (progress: 55%)
> [2016-05-19 05:53:27,543] Repair session a0e5b7f3-1d99-11e6-9d63-b717b380ffdd for range
(-4182952858113330342,-4157904914928848809] finished (progress: 55%)
> [2016-05-19 05:53:41,128] Repair session a0e5df00-1d99-11e6-9d63-b717b380ffdd for range
(6499366179019889198,6523760493740195344] finished (progress: 55%)
> It is now 10:46:25, almost 5 hours since it got stuck right there.
> Earlier I could see the repair session progressing in system.log, but no repair logs are coming
in right now; all I get is the regular index summary redistribution logs.
> The last repair logs I saw:
> INFO  [RepairJobTask:5] 2016-05-19 05:53:41,125 RepairJob.java:152 - [repair #a0e5df00-1d99-11e6-9d63-b717b380ffdd]
TABLE_NAME is fully synced
> INFO  [RepairJobTask:5] 2016-05-19 05:53:41,126 RepairSession.java:279 - [repair #a0e5df00-1d99-11e6-9d63-b717b380ffdd]
Session completed successfully
> INFO  [RepairJobTask:5] 2016-05-19 05:53:41,126 RepairRunnable.java:232 - Repair session
a0e5df00-1d99-11e6-9d63-b717b380ffdd for range (6499366179019889198,6523760493740195344] finished
> It's an incremental repair, and in the "nodetool netstats" output I can see entries like:
> Repair e3055fb0-1d9d-11e6-9d63-b717b380ffdd
>     /Node-2
>         Receiving 8 files, 1093461 bytes total. Already received 8 files, 1093461 bytes
total
>             /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80872-big-Data.db
399475/399475 bytes(100%) received from idx:0/Node-2
>             /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80879-big-Data.db
53809/53809 bytes(100%) received from idx:0/Node-2
>             /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80878-big-Data.db
89955/89955 bytes(100%) received from idx:0/Node-2
>             /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80881-big-Data.db
168790/168790 bytes(100%) received from idx:0/Node-2
>             /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80886-big-Data.db
107785/107785 bytes(100%) received from idx:0/Node-2
>             /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80880-big-Data.db
52889/52889 bytes(100%) received from idx:0/Node-2
>             /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80884-big-Data.db
148882/148882 bytes(100%) received from idx:0/Node-2
>             /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80883-big-Data.db
71876/71876 bytes(100%) received from idx:0/Node-2
>         Sending 5 files, 863321 bytes total. Already sent 5 files, 863321 bytes total
>             /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/la-73168-big-Data.db
161895/161895 bytes(100%) sent to idx:0/Node-2
>             /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/la-72604-big-Data.db
399865/399865 bytes(100%) sent to idx:0/Node-2
>             /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/la-73147-big-Data.db
149066/149066 bytes(100%) sent to idx:0/Node-2
>             /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/la-72682-big-Data.db
126000/126000 bytes(100%) sent to idx:0/Node-2
>             /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/la-73173-big-Data.db
26495/26495 bytes(100%) sent to idx:0/Node-2
> Repair c0c8af20-1d9c-11e6-9d63-b717b380ffdd
>     /Node-3
>         Receiving 11 files, 13896288 bytes total. Already received 11 files, 13896288
bytes total
>             /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-79186-big-Data.db
1598874/1598874 bytes(100%) received from idx:0/Node-3
>             /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-79196-big-Data.db
736365/736365 bytes(100%) received from idx:0/Node-3
>             /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-79197-big-Data.db
326558/326558 bytes(100%) received from idx:0/Node-3
>             /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-79187-big-Data.db
1484827/1484827 bytes(100%) received from idx:0/Node-3
>             /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-79180-big-Data.db
393636/393636 bytes(100%) received from idx:0/Node-3
>             /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-79184-big-Data.db
825459/825459 bytes(100%) received from idx:0/Node-3
>             /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-79188-big-Data.db
3568782/3568782 bytes(100%) received from idx:0/Node-3
>             /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-79182-big-Data.db
271222/271222 bytes(100%) received from idx:0/Node-3
>             /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-79193-big-Data.db
4315497/4315497 bytes(100%) received from idx:0/Node-3
>             /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-79183-big-Data.db
19775/19775 bytes(100%) received from idx:0/Node-3
>             /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-79192-big-Data.db
355293/355293 bytes(100%) received from idx:0/Node-3
>         Sending 5 files, 9444101 bytes total. Already sent 5 files, 9444101 bytes total
>             /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/la-73168-big-Data.db
1796825/1796825 bytes(100%) sent to idx:0/Node-3
>             /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/la-72604-big-Data.db
4549996/4549996 bytes(100%) sent to idx:0/Node-3
>             /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/la-73147-big-Data.db
1658881/1658881 bytes(100%) sent to idx:0/Node-3
>             /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/la-72682-big-Data.db
1418335/1418335 bytes(100%) sent to idx:0/Node-3
>             /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/la-73173-big-Data.db
20064/20064 bytes(100%) sent to idx:0/Node-3
> Read Repair Statistics:
> Attempted: 1142
> Mismatch (Blocking): 0
> Mismatch (Background): 0
> Pool Name                    Active   Pending      Completed
> Large messages                  n/a         0            779
> Small messages                  n/a         0       14756609
> Gossip messages                 n/a         0         119647
> The last three fields, "Large messages", "Small messages" and "Gossip messages", keep
changing: "Large messages" has incremented by 2 in the last 5 hours, and the other two change more
frequently.
> I am unable to figure out whether the repair is still running or stuck. If it's stuck, what should
be my course of action if I want to get that table repaired?
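
One way to read the netstats output quoted above is to check whether any file is still short of its total size. The sketch below is a hypothetical helper, not part of Cassandra. It assumes the per-file progress format shown in the report ("done/total bytes(N%) received from ..." or "... sent to ...") and takes the text of the "nodetool netstats" output as input.

```python
import re

# Matches per-file progress such as
# "399475/399475 bytes(100%) received from idx:0/Node-2".
PROGRESS = re.compile(r"(\d+)/(\d+) bytes\(\d+%\) (received from|sent to)")


def incomplete_transfers(netstats_text):
    """Return (done, total, direction) for every file not fully transferred."""
    pending = []
    for done, total, direction in PROGRESS.findall(netstats_text):
        if int(done) < int(total):
            pending.append((int(done), int(total), direction))
    return pending
```

In the output quoted above every file is at 100%, so this helper would return an empty list for it.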



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
