cassandra-commits mailing list archives

From "vin01 (JIRA)" <j...@apache.org>
Subject [jira] [Created] (CASSANDRA-11845) Hanging repair in cassandra 2.2.4
Date Thu, 19 May 2016 15:56:12 GMT
vin01 created CASSANDRA-11845:
---------------------------------

             Summary: Hanging repair in cassandra 2.2.4
                 Key: CASSANDRA-11845
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11845
             Project: Cassandra
          Issue Type: Bug
          Components: Streaming and Messaging
         Environment: Centos 6
            Reporter: vin01
            Priority: Minor


After increasing the streaming_timeout_in_ms value to 3 hours, I was able to avoid the
socketTimeout errors I was getting earlier (https://issues.apache.org/jira/browse/CASSANDRA-11826),
but now the repair just stays stuck.

Current status:

[2016-05-19 05:52:50,835] Repair session a0e590e1-1d99-11e6-9d63-b717b380ffdd for range (-3309358208555432808,-3279958773585646585] finished (progress: 54%)
[2016-05-19 05:53:09,446] Repair session a0e590e3-1d99-11e6-9d63-b717b380ffdd for range (8149151263857514385,8181801084802729407] finished (progress: 55%)
[2016-05-19 05:53:13,808] Repair session a0e5b7f1-1d99-11e6-9d63-b717b380ffdd for range (3372779397996730299,3381236471688156773] finished (progress: 55%)
[2016-05-19 05:53:27,543] Repair session a0e5b7f3-1d99-11e6-9d63-b717b380ffdd for range (-4182952858113330342,-4157904914928848809] finished (progress: 55%)
[2016-05-19 05:53:41,128] Repair session a0e5df00-1d99-11e6-9d63-b717b380ffdd for range (6499366179019889198,6523760493740195344] finished (progress: 55%)


It is now 10:46:25, almost 5 hours since the repair got stuck at that point.
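The stall duration can be measured from the repair output itself instead of by eye. The following is only a minimal sketch: it assumes each progress entry sits on a single line in the format quoted above, and the names `LINE_RE`, `last_progress` and `stalled_for` are made up for this illustration (they are not part of Cassandra or nodetool).

```python
import re
from datetime import datetime

# Matches the "Repair session ... finished (progress: N%)" lines quoted above.
# Assumption: each entry is on one line (not wrapped by the mail client).
LINE_RE = re.compile(
    r"\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d{3}\] "
    r"Repair session (?P<session>[0-9a-f-]+) for range "
    r"\((?P<start>-?\d+),(?P<end>-?\d+)\] finished \(progress: (?P<pct>\d+)%\)"
)


def last_progress(lines):
    """Return (timestamp, percent) of the most recent 'finished' line, or None."""
    latest = None
    for line in lines:
        m = LINE_RE.search(line)
        if m is None:
            continue
        ts = datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S")
        if latest is None or ts > latest[0]:
            latest = (ts, int(m.group("pct")))
    return latest


def stalled_for(lines, now):
    """Return the time elapsed since the last progress line, or None if none found."""
    latest = last_progress(lines)
    return None if latest is None else now - latest[0]


sample = [
    "[2016-05-19 05:53:27,543] Repair session a0e5b7f3-1d99-11e6-9d63-b717b380ffdd "
    "for range (-4182952858113330342,-4157904914928848809] finished (progress: 55%)",
    "[2016-05-19 05:53:41,128] Repair session a0e5df00-1d99-11e6-9d63-b717b380ffdd "
    "for range (6499366179019889198,6523760493740195344] finished (progress: 55%)",
]

# "Now" is the wall-clock time mentioned in this report.
print(stalled_for(sample, datetime(2016, 5, 19, 10, 46, 25)))  # 4:52:44
```

With the last two entries above and the current time of 10:46:25, this gives 4 hours 52 minutes, which matches the "almost 5 hours" estimate.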

Earlier I could see the repair session progressing in system.log, but no repair logs are coming
in right now; all I get are the regular index summary redistribution logs.


The last repair entries I saw in the logs:

INFO  [RepairJobTask:5] 2016-05-19 05:53:41,125 RepairJob.java:152 - [repair #a0e5df00-1d99-11e6-9d63-b717b380ffdd] TABLE_NAME is fully synced
INFO  [RepairJobTask:5] 2016-05-19 05:53:41,126 RepairSession.java:279 - [repair #a0e5df00-1d99-11e6-9d63-b717b380ffdd] Session completed successfully
INFO  [RepairJobTask:5] 2016-05-19 05:53:41,126 RepairRunnable.java:232 - Repair session a0e5df00-1d99-11e6-9d63-b717b380ffdd for range (6499366179019889198,6523760493740195344] finished

It's an incremental repair, and in the "nodetool netstats" output I can see entries like:



Repair e3055fb0-1d9d-11e6-9d63-b717b380ffdd
    /192.168.100.138
        Receiving 8 files, 1093461 bytes total. Already received 8 files, 1093461 bytes total
            /data/cassandra/data/eviveportal/memberinfo-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80872-big-Data.db 399475/399475 bytes(100%) received from idx:0/192.168.100.138
            /data/cassandra/data/eviveportal/memberinfo-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80879-big-Data.db 53809/53809 bytes(100%) received from idx:0/192.168.100.138
            /data/cassandra/data/eviveportal/memberinfo-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80878-big-Data.db 89955/89955 bytes(100%) received from idx:0/192.168.100.138
            /data/cassandra/data/eviveportal/memberinfo-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80881-big-Data.db 168790/168790 bytes(100%) received from idx:0/192.168.100.138
            /data/cassandra/data/eviveportal/memberinfo-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80886-big-Data.db 107785/107785 bytes(100%) received from idx:0/192.168.100.138
            /data/cassandra/data/eviveportal/memberinfo-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80880-big-Data.db 52889/52889 bytes(100%) received from idx:0/192.168.100.138
            /data/cassandra/data/eviveportal/memberinfo-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80884-big-Data.db 148882/148882 bytes(100%) received from idx:0/192.168.100.138
            /data/cassandra/data/eviveportal/memberinfo-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80883-big-Data.db 71876/71876 bytes(100%) received from idx:0/192.168.100.138
        Sending 5 files, 863321 bytes total. Already sent 5 files, 863321 bytes total
            /data/cassandra/data/eviveportal/memberinfo-01ad9750723e11e4bfe0d3887930a87c/la-73168-big-Data.db 161895/161895 bytes(100%) sent to idx:0/192.168.100.138
            /data/cassandra/data/eviveportal/memberinfo-01ad9750723e11e4bfe0d3887930a87c/la-72604-big-Data.db 399865/399865 bytes(100%) sent to idx:0/192.168.100.138
            /data/cassandra/data/eviveportal/memberinfo-01ad9750723e11e4bfe0d3887930a87c/la-73147-big-Data.db 149066/149066 bytes(100%) sent to idx:0/192.168.100.138
            /data/cassandra/data/eviveportal/memberinfo-01ad9750723e11e4bfe0d3887930a87c/la-72682-big-Data.db 126000/126000 bytes(100%) sent to idx:0/192.168.100.138
            /data/cassandra/data/eviveportal/memberinfo-01ad9750723e11e4bfe0d3887930a87c/la-73173-big-Data.db 26495/26495 bytes(100%) sent to idx:0/192.168.100.138
Repair c0c8af20-1d9c-11e6-9d63-b717b380ffdd
    /192.168.100.147
        Receiving 11 files, 13896288 bytes total. Already received 11 files, 13896288 bytes total
            /data/cassandra/data/eviveportal/memberinfo-01ad9750723e11e4bfe0d3887930a87c/tmp-la-79186-big-Data.db 1598874/1598874 bytes(100%) received from idx:0/192.168.100.147
            /data/cassandra/data/eviveportal/memberinfo-01ad9750723e11e4bfe0d3887930a87c/tmp-la-79196-big-Data.db 736365/736365 bytes(100%) received from idx:0/192.168.100.147
            /data/cassandra/data/eviveportal/memberinfo-01ad9750723e11e4bfe0d3887930a87c/tmp-la-79197-big-Data.db 326558/326558 bytes(100%) received from idx:0/192.168.100.147
            /data/cassandra/data/eviveportal/memberinfo-01ad9750723e11e4bfe0d3887930a87c/tmp-la-79187-big-Data.db 1484827/1484827 bytes(100%) received from idx:0/192.168.100.147
            /data/cassandra/data/eviveportal/memberinfo-01ad9750723e11e4bfe0d3887930a87c/tmp-la-79180-big-Data.db 393636/393636 bytes(100%) received from idx:0/192.168.100.147
            /data/cassandra/data/eviveportal/memberinfo-01ad9750723e11e4bfe0d3887930a87c/tmp-la-79184-big-Data.db 825459/825459 bytes(100%) received from idx:0/192.168.100.147
            /data/cassandra/data/eviveportal/memberinfo-01ad9750723e11e4bfe0d3887930a87c/tmp-la-79188-big-Data.db 3568782/3568782 bytes(100%) received from idx:0/192.168.100.147
            /data/cassandra/data/eviveportal/memberinfo-01ad9750723e11e4bfe0d3887930a87c/tmp-la-79182-big-Data.db 271222/271222 bytes(100%) received from idx:0/192.168.100.147
            /data/cassandra/data/eviveportal/memberinfo-01ad9750723e11e4bfe0d3887930a87c/tmp-la-79193-big-Data.db 4315497/4315497 bytes(100%) received from idx:0/192.168.100.147
            /data/cassandra/data/eviveportal/memberinfo-01ad9750723e11e4bfe0d3887930a87c/tmp-la-79183-big-Data.db 19775/19775 bytes(100%) received from idx:0/192.168.100.147
            /data/cassandra/data/eviveportal/memberinfo-01ad9750723e11e4bfe0d3887930a87c/tmp-la-79192-big-Data.db 355293/355293 bytes(100%) received from idx:0/192.168.100.147
        Sending 5 files, 9444101 bytes total. Already sent 5 files, 9444101 bytes total
            /data/cassandra/data/eviveportal/memberinfo-01ad9750723e11e4bfe0d3887930a87c/la-73168-big-Data.db 1796825/1796825 bytes(100%) sent to idx:0/192.168.100.147
            /data/cassandra/data/eviveportal/memberinfo-01ad9750723e11e4bfe0d3887930a87c/la-72604-big-Data.db 4549996/4549996 bytes(100%) sent to idx:0/192.168.100.147
            /data/cassandra/data/eviveportal/memberinfo-01ad9750723e11e4bfe0d3887930a87c/la-73147-big-Data.db 1658881/1658881 bytes(100%) sent to idx:0/192.168.100.147
            /data/cassandra/data/eviveportal/memberinfo-01ad9750723e11e4bfe0d3887930a87c/la-72682-big-Data.db 1418335/1418335 bytes(100%) sent to idx:0/192.168.100.147
            /data/cassandra/data/eviveportal/memberinfo-01ad9750723e11e4bfe0d3887930a87c/la-73173-big-Data.db 20064/20064 bytes(100%) sent to idx:0/192.168.100.147
Read Repair Statistics:
Attempted: 1142
Mismatch (Blocking): 0
Mismatch (Background): 0
Pool Name                    Active   Pending      Completed
Large messages                  n/a         0            779
Small messages                  n/a         0       14756609
Gossip messages                 n/a         0         119647

The last three fields, "Large messages", "Small messages" and "Gossip messages", keep changing:
"Large messages" has incremented by 2 in the last 5 hours, while the other two change more frequently.
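To quantify how much these counters move, two netstats snapshots taken some time apart can be diffed. This is a rough sketch, not an official tool: `parse_pool_counters` and `counter_deltas` are hypothetical helper names, the "before" values are the ones shown above, and the "after" values are invented for illustration (only the +2 on "Large messages" comes from this report).

```python
def parse_pool_counters(text):
    """Extract {pool name: completed count} from the 'Pool Name' table of netstats."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        # Data rows look like: "<name words> <active> <pending> <completed>".
        # The header row and the read repair lines fail this check and are skipped.
        if len(parts) >= 4 and parts[-1].isdigit() and parts[-2].isdigit():
            counters[" ".join(parts[:-3])] = int(parts[-1])
    return counters


def counter_deltas(before, after):
    """Return how much each counter present in both snapshots has grown."""
    return {name: after[name] - before[name] for name in after if name in before}


before = parse_pool_counters("""\
Pool Name                    Active   Pending      Completed
Large messages                  n/a         0            779
Small messages                  n/a         0       14756609
Gossip messages                 n/a         0         119647
""")

# Hypothetical later snapshot; only the "Large messages" change of 2 is from the report.
after = parse_pool_counters("""\
Pool Name                    Active   Pending      Completed
Large messages                  n/a         0            781
Small messages                  n/a         0       14760000
Gossip messages                 n/a         0         120000
""")

print(counter_deltas(before, after))
```

A "Large messages" delta that stays near zero over several hours, while the other two keep growing, would be in line with what is described above. Whether that by itself proves the repair is stuck is a separate question.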

I am unable to figure out whether the repair is still running or stuck. If it is stuck, what
should my course of action be to get that table repaired?
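One check that can be automated is whether every stream listed by "nodetool netstats" has already transferred all of its files and bytes. The sketch below assumes the summary-line wording shown in the output above; `SUMMARY_RE`, `stream_summaries` and `all_streams_complete` are hypothetical names used only here. It tells you whether streaming is still moving data, not whether the repair as a whole is healthy.

```python
import re

# Matches the per-peer summary lines from the netstats output quoted above.
SUMMARY_RE = re.compile(
    r"(?P<direction>Receiving|Sending) (?P<files>\d+) files, (?P<bytes>\d+) bytes total\. "
    r"Already (?:received|sent) (?P<done_files>\d+) files, (?P<done_bytes>\d+) bytes total"
)


def stream_summaries(text):
    """Return one dict per Receiving/Sending summary found in the netstats text."""
    # Collapse all whitespace so summaries wrapped across lines still match.
    flat = " ".join(text.split())
    return [
        {
            "direction": m.group("direction"),
            "files": int(m.group("files")),
            "bytes": int(m.group("bytes")),
            "done_files": int(m.group("done_files")),
            "done_bytes": int(m.group("done_bytes")),
        }
        for m in SUMMARY_RE.finditer(flat)
    ]


def all_streams_complete(text):
    """True if at least one stream is listed and every listed stream is fully transferred."""
    summaries = stream_summaries(text)
    return bool(summaries) and all(
        s["done_files"] == s["files"] and s["done_bytes"] == s["bytes"]
        for s in summaries
    )


sample = """\
        Receiving 8 files, 1093461 bytes total. Already received 8 files, 1093461 bytes total
        Sending 5 files, 863321 bytes total. Already sent 5 files, 863321 bytes total
        Receiving 11 files, 13896288 bytes total. Already received 11 files, 13896288 bytes
total
        Sending 5 files, 9444101 bytes total. Already sent 5 files, 9444101 bytes total
"""

print(all_streams_complete(sample))  # True
```

For the four summaries in this report the check returns True: all listed streams are at 100%, so no data appears to be in flight even though the sessions are still listed.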



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
